[Xen-devel] [PATCH]: Fix deadlock in mm_pin

To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] [PATCH]: Fix deadlock in mm_pin
From: Chris Lalancette <clalance@xxxxxxxxxx>
Date: Thu, 20 Nov 2008 11:31:54 +0100
All,
    Below is a patch to fix a deadlock that can occur in the Xen kernel.  The
patch comes from Oracle; the bug was originally reported against the RHEL-4 PV
kernel, but the fix applies to the 2.6.18 tree as well.  The deadlock scenario
is below.

"After running an arbitrary workload involving network traffic for some time
(1-2 days), a xen guest running the 2.6.9-67 x86_64 xenU kernel locks up with
both vcpu's spinning at 100%.

The problem is due to a race between the scheduler and network interrupts.  On
one vcpu, the scheduler takes the runqueue spinlock of the other vcpu to
schedule a process, and attempts to lock mm_unpinned_lock.  On the other vcpu,
another process is holding mm_unpinned_lock (because it is starting or
exiting), and is interrupted by a network interrupt.  The network interrupt
handler attempts to wake up the same process that the first vcpu is trying to
schedule, and will try to get the runqueue spinlock that the first vcpu is
already holding."
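
To make the inversion concrete, here is a minimal userspace sketch of the same
AB-BA pattern.  Pthread mutexes stand in for the kernel spinlocks, the
"network interrupt" is modelled as straight-line code in the second thread,
and all names are illustrative rather than taken from the kernel:

/* Userspace analogue of the deadlock (illustrative only).  Thread A
 * plays the scheduler on vcpu 0; thread B plays a task inside
 * init_new_context() on vcpu 1, with its interrupt-driven wakeup path
 * modelled inline.  Both threads block on each other forever. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t runqueue_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mm_unpinned_lock = PTHREAD_MUTEX_INITIALIZER;

static void *vcpu0_scheduler(void *unused)
{
        (void)unused;
        pthread_mutex_lock(&runqueue_lock);     /* rq lock of the other vcpu */
        sleep(1);                               /* widen the race window */
        pthread_mutex_lock(&mm_unpinned_lock);  /* blocks: vcpu1 holds it */
        pthread_mutex_unlock(&mm_unpinned_lock);
        pthread_mutex_unlock(&runqueue_lock);
        return NULL;
}

static void *vcpu1_fork_path(void *unused)
{
        (void)unused;
        pthread_mutex_lock(&mm_unpinned_lock);  /* as in init_new_context() */
        sleep(1);
        /* "network interrupt" fires here and tries to wake a task: */
        pthread_mutex_lock(&runqueue_lock);     /* blocks: vcpu0 holds it */
        pthread_mutex_unlock(&runqueue_lock);
        pthread_mutex_unlock(&mm_unpinned_lock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, vcpu0_scheduler, NULL);
        pthread_create(&b, NULL, vcpu1_fork_path, NULL);
        pthread_join(a, NULL);          /* never returns: AB-BA deadlock */
        pthread_join(b, NULL);
        puts("unreachable");
        return 0;
}

Build with gcc demo.c -pthread; both threads wedge within a second or two,
each spinning (here, sleeping) on the lock the other holds.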

The fix is fairly simple: take mm_unpinned_lock with spin_lock_irqsave() so
that we can't be interrupted on this vcpu until after we leave the critical
section.
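
In isolation, the pattern (taken from the patch below, with explanatory
comments added) looks like this.  The _irqsave variant is the conservative
choice over plain spin_lock_irq(): it saves and later restores the caller's
interrupt state instead of unconditionally re-enabling interrupts on unlock,
which matters if any caller already runs with interrupts disabled:

        unsigned long flags;

        /* Disable interrupts on this CPU and take the lock.  While we
         * hold mm_unpinned_lock, no interrupt handler can run here and
         * attempt to grab a runqueue lock, so the AB-BA cycle with the
         * scheduler on the other vcpu can no longer form. */
        spin_lock_irqsave(&mm_unpinned_lock, flags);
        list_del(&mm->context.unpinned);        /* the critical section */
        spin_unlock_irqrestore(&mm_unpinned_lock, flags); /* restore IRQ state */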

Signed-off-by: Herbert van den Bergh <herbert.van.den.bergh@xxxxxxxxxx>
Signed-off-by: Chris Lalancette <clalance@xxxxxxxxxx>
--- linux-2.6.18.noarch/arch/x86_64/kernel/ldt-xen.c.orig       2008-11-06 10:18:21.000000000 -0500
+++ linux-2.6.18.noarch/arch/x86_64/kernel/ldt-xen.c    2008-11-06 10:19:48.000000000 -0500
@@ -109,6 +109,8 @@ static inline int copy_ldt(mm_context_t 
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
+       unsigned long flags;
+
        struct mm_struct * old_mm;
        int retval = 0;
 
@@ -121,9 +123,9 @@ int init_new_context(struct task_struct 
                up(&old_mm->context.sem);
        }
        if (retval == 0) {
-               spin_lock(&mm_unpinned_lock);
+               spin_lock_irqsave(&mm_unpinned_lock, flags);
                list_add(&mm->context.unpinned, &mm_unpinned);
-               spin_unlock(&mm_unpinned_lock);
+               spin_unlock_irqrestore(&mm_unpinned_lock, flags);
        }
        return retval;
 }
@@ -134,6 +136,8 @@ int init_new_context(struct task_struct 
  */
 void destroy_context(struct mm_struct *mm)
 {
+       unsigned long flags;
+
        if (mm->context.size) {
                if (mm == current->active_mm)
                        clear_LDT();
@@ -148,9 +152,9 @@ void destroy_context(struct mm_struct *m
                mm->context.size = 0;
        }
        if (!mm->context.pinned) {
-               spin_lock(&mm_unpinned_lock);
+               spin_lock_irqsave(&mm_unpinned_lock, flags);
                list_del(&mm->context.unpinned);
-               spin_unlock(&mm_unpinned_lock);
+               spin_unlock_irqrestore(&mm_unpinned_lock, flags);
        }
 }
 
--- linux-2.6.18.noarch/arch/x86_64/mm/pageattr-xen.c.orig      2008-11-06 10:16:01.000000000 -0500
+++ linux-2.6.18.noarch/arch/x86_64/mm/pageattr-xen.c   2008-11-06 10:18:10.000000000 -0500
@@ -70,6 +70,8 @@ static void mm_walk(struct mm_struct *mm
 
 void mm_pin(struct mm_struct *mm)
 {
+       unsigned long flags;
+
        if (xen_feature(XENFEAT_writable_page_tables))
                return;
 
@@ -87,15 +89,17 @@ void mm_pin(struct mm_struct *mm)
        xen_pgd_pin(__pa(mm->pgd)); /* kernel */
        xen_pgd_pin(__pa(__user_pgd(mm->pgd))); /* user */
        mm->context.pinned = 1;
-       spin_lock(&mm_unpinned_lock);
+       spin_lock_irqsave(&mm_unpinned_lock, flags);
        list_del(&mm->context.unpinned);
-       spin_unlock(&mm_unpinned_lock);
+       spin_unlock_irqrestore(&mm_unpinned_lock, flags);
 
        spin_unlock(&mm->page_table_lock);
 }
 
 void mm_unpin(struct mm_struct *mm)
 {
+       unsigned long flags;
+
        if (xen_feature(XENFEAT_writable_page_tables))
                return;
 
@@ -112,9 +116,9 @@ void mm_unpin(struct mm_struct *mm)
        mm_walk(mm, PAGE_KERNEL);
        xen_tlb_flush();
        mm->context.pinned = 0;
-       spin_lock(&mm_unpinned_lock);
+       spin_lock_irqsave(&mm_unpinned_lock, flags);
        list_add(&mm->context.unpinned, &mm_unpinned);
-       spin_unlock(&mm_unpinned_lock);
+       spin_unlock_irqrestore(&mm_unpinned_lock, flags);
 
        spin_unlock(&mm->page_table_lock);
 }
