Re: [Xen-devel] [PATCH]: Fix deadlock in mm_pin

To:	Chris Lalancette <clalance@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] [PATCH]: Fix deadlock in mm_pin
From:	Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date:	Thu, 20 Nov 2008 14:46:58 +0000

On 20/11/08 10:31, "Chris Lalancette" <clalance@xxxxxxxxxx> wrote:

> it applies to the 2.6.18 tree as well; the deadlock scenario is below.
> 
> "After running an arbitrary workload involving network traffic for some time
> (1-2 days), a xen guest running the 2.6.9-67 x86_64 xenU kernel locks up with
> both vcpus spinning at 100%.
> 
> The problem is due to a race between the scheduler and network interrupts.  On
> one vcpu, the scheduler takes the runqueue spinlock of the other vcpu to
> schedule a process, and attempts to lock mm_unpinned_lock.  On the other vcpu,
> another process is holding mm_unpinned_lock (because it is starting or
> exiting), and is interrupted by a network interrupt.  The network interrupt
> handler attempts to wake up the same process that the first vcpu is trying to
> schedule, and will try to get the runqueue spinlock that the first vcpu is
> already holding."
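
For readers following the thread, the scenario quoted above can be modelled as a wait-for graph. This is a toy sketch in Python, not kernel code. The vCPU and lock names follow the report; the helper names (scenario, find_cycle) are hypothetical, and the irq_safe flag is an assumption about the kind of fix under discussion (taking mm_unpinned_lock with interrupts disabled, so the network interrupt cannot run while the lock is held). It only shows that the reported lock ordering forms a cycle. It says nothing about whether that call chain can occur in a given kernel tree, and it does not model mm->page_table_lock.

```python
# Toy model of the reported lockup. Names follow the bug report; the helper
# functions and the irq_safe flag are illustrative assumptions, not kernel API.

def scenario(irq_safe):
    """Build the lock state described in the report.

    Returns (holds, waits):
      holds maps a lock name to the vCPU currently holding it,
      waits maps a vCPU to the lock it is spinning on.
    """
    holds = {
        "rq_lock[vcpu1]": "vcpu0",    # scheduler on vcpu0 took vcpu1's runqueue lock
        "mm_unpinned_lock": "vcpu1",  # starting/exiting process on vcpu1
    }
    waits = {"vcpu0": "mm_unpinned_lock"}
    if not irq_safe:
        # The network interrupt fires on vcpu1 while mm_unpinned_lock is held
        # and tries to wake the process, which needs the runqueue lock.
        waits["vcpu1"] = "rq_lock[vcpu1]"
    # With irq_safe=True the interrupt is deferred until the lock is dropped,
    # so vcpu1 is not waiting on anything.
    return holds, waits


def find_cycle(holds, waits):
    """Return the vCPUs forming a wait-for cycle, or [] if there is none."""
    for start in waits:
        path = [start]
        cur = start
        while cur in waits and waits[cur] in holds:
            cur = holds[waits[cur]]   # who owns the lock we are spinning on?
            if cur == start:
                return path           # back where we began: deadlock
            if cur in path:
                break                 # cycle that does not include start
            path.append(cur)
    return []


if __name__ == "__main__":
    print(find_cycle(*scenario(irq_safe=False)))
    print(find_cycle(*scenario(irq_safe=True)))
```

In this model the non-IRQ-safe case yields a two-vCPU cycle, and the cycle disappears once the interrupt can no longer run while mm_unpinned_lock is held. Any other non-IRQ-safe lock taken on the same path would need the same treatment.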

I don't believe that mm_unpinned_lock can ever be taken while a runqueue
lock is already held in 2.6.18. If you can provide a call chain then I'll
consider the patch -- but I think you'd still be screwed by the
mm->page_table_lock (also acquired in mm_pin() code, also not IRQ safe, but
less easy for you to go convert all the users of that lock).

You might have some backporting from 2.6.18 to do...

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
