Some background:
Now a 32-bit HVM SMP Windows guest with the PV drivers hangs randomly. Sometimes the problem occurs while the drivers are loading, and sometimes when the guest is destroyed. Eventually Xen0 hangs as well. We are debugging this issue.
With the great help of Kevin Tian, we have finally found two deadlock issues with HVM SMP guests. The deadlocks are described below. Suppose we have two vcpus.
1) One vcpu is holding the BIGLOCK and wants to acquire the shadow_lock. At the same time, the other vcpu is holding the shadow_lock and wants to walk the P2M table. The faulting pfn is near the 4G boundary, for example 0xfee00, and of course the va for that P2M table entry has never been mapped. So when this vcpu tries to walk the P2M table, a page fault in the Xen address area occurs. The current do_page_fault() calls spurious_page_fault() to test whether the fault is real or spurious, but spurious_page_fault() first tries to acquire the BIGLOCK. So: deadlock.
2) When the guest is destroyed, Xen calls domain_shutdown_finalise(). That function first acquires the BIGLOCK and then calls vcpu_sleep_sync(), which waits for the other vcpu to change state. But that vcpu is currently inside spurious_page_fault(), and spurious_page_fault() is trying to acquire the BIGLOCK. So this is another deadlock.
Is there anything wrong with this description? If we're right, does spurious_page_fault() really need to hold the BIGLOCK? We have an ugly workaround that reduces how often the spurious page fault occurs: when the P2M table is allocated, we map the entire 4G P2M table area and fill it with INVALID_MFN. With this workaround, the 32-bit HVM SMP Windows guest with PV drivers runs much more smoothly and can be destroyed successfully. But we have no elegant solution yet. :-(
Does anyone have good suggestions? Any comments are welcome.
Thanks
Xioahui