Re: [Xen-devel] [PATCH]: Fix deadlock in mm_pin

To:	Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] [PATCH]: Fix deadlock in mm_pin
From:	Chris Lalancette <clalance@xxxxxxxxxx>
Date:	Thu, 20 Nov 2008 19:37:01 +0100
Cc:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to:	<C54B28E2.29419%keir.fraser@xxxxxxxxxxxxx>
References:	<C54B28E2.29419%keir.fraser@xxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	Thunderbird 2.0.0.16 (X11/20080723)

Keir Fraser wrote:
> On 20/11/08 10:31, "Chris Lalancette" <clalance@xxxxxxxxxx> wrote:
> 
>> it applies to the 2.6.18 tree as well; the deadlock scenario is below.
>>
>> "After running an arbitrary workload involving network traffic for some time
>> (1-2 days), a xen guest running the 2.6.9-67 x86_64 xenU kernel locks up with
>> both vcpus spinning at 100%.
>>
>> The problem is due to a race between the scheduler and network interrupts.  On
>> one vcpu, the scheduler takes the runqueue spinlock of the other vcpu to
>> schedule a process, and attempts to lock mm_unpinned_lock.  On the other vcpu,
>> another process is holding mm_unpinned_lock (because it is starting or
>> exiting), and is interrupted by a network interrupt.  The network interrupt
>> handler attempts to wake up the same process that the first vcpu is trying to
>> schedule, and will try to get the runqueue spinlock that the first vcpu is
>> already holding."
> 
> I don't believe that mm_unpinned_lock can ever be taken while a runqueue
> lock is already held in 2.6.18. If you can provide a call chain then I'll
> consider the patch -- but I think you'd still be screwed by the
> mm->page_table_lock (also acquired in mm_pin() code, also not IRQ safe, but
> less easy for you to go convert all the users of that lock).
> 
> You might have some backporting from 2.6.18 to do...

Arg.  I think I see what you mean.  In c/s 10343, mm_pin is moved from switch_mm
into activate_mm, which I *think* means that it is no longer called with the
runqueue lock held.  Indeed, the comment on that c/s says it removes a deadlock,
which may be the one the RHEL-4 kernel is running into.  OK, thanks for the
feedback, I'll look at backporting that code.

Chris Lalancette
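
The lock ordering discussed in this thread can be sketched as a small
userspace model. This is only an illustration under stated assumptions, not
kernel code: the names vcpu_state, is_deadlocked, pin_under_rq_lock and
pin_outside_rq_lock are hypothetical, and the second scenario simply assumes,
as the reply above tentatively suggests, that the pinning path is no longer
entered with the runqueue lock held.

```c
#include <stdbool.h>

/* Hypothetical model of the two locks named in the report. */
enum lock_id { RQ_LOCK, MM_UNPINNED_LOCK, NUM_LOCKS };

/* What one vcpu currently holds and which lock it is spinning on. */
struct vcpu_state {
    bool holds[NUM_LOCKS];
    int  wants;             /* lock being spun on, or -1 for none */
};

/* Two-vcpu cycle: each vcpu spins on a lock that the other one holds. */
static bool is_deadlocked(const struct vcpu_state *a,
                          const struct vcpu_state *b)
{
    return a->wants >= 0 && b->wants >= 0 &&
           b->holds[a->wants] && a->holds[b->wants];
}

/* Reported scenario: the scheduler path holds the runqueue lock and then
 * tries to take mm_unpinned_lock.  The other vcpu holds mm_unpinned_lock
 * with interrupts enabled, and the network interrupt's wakeup spins on
 * the runqueue lock. */
static bool pin_under_rq_lock(void)
{
    struct vcpu_state sched = { .holds = { [RQ_LOCK] = true },
                                .wants = MM_UNPINNED_LOCK };
    struct vcpu_state irq   = { .holds = { [MM_UNPINNED_LOCK] = true },
                                .wants = RQ_LOCK };
    return is_deadlocked(&sched, &irq);
}

/* Assumed post-fix ordering: mm_unpinned_lock is requested without the
 * runqueue lock held, so the interrupted vcpu can take the runqueue lock,
 * finish, and eventually release mm_unpinned_lock. */
static bool pin_outside_rq_lock(void)
{
    struct vcpu_state sched = { .holds = { false },
                                .wants = MM_UNPINNED_LOCK };
    struct vcpu_state irq   = { .holds = { [MM_UNPINNED_LOCK] = true },
                                .wants = RQ_LOCK };
    return is_deadlocked(&sched, &irq);
}
```

In this model pin_under_rq_lock() reports a cycle and pin_outside_rq_lock()
does not. It deliberately ignores mm->page_table_lock, which Keir points out
is also taken in the mm_pin() code and is also not IRQ safe.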
