

[Xen-devel] [PATCH] Fix OOS on domain crash.

To: haicheng.li@xxxxxxxxx, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: [Xen-devel] [PATCH] Fix OOS on domain crash.
From: Gianluca Guida <gianluca.guida@xxxxxxxxxxxxx>
Date: Wed, 13 Aug 2008 19:29:26 +0100
Cc: Tim Deegan <Tim.Deegan@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 13 Aug 2008 11:30:05 -0700

I couldn't reproduce the Nevada crash on my testbox, but this should fix the first Xen crash that was seen in the Nevada HVM (bugzilla #1322).

What I think most probably happened there is that the set_l2e call in shadow_get_and_create_l1e() tried to resync a page, but somehow we weren't able to remove the shadow (the real bug we should actually look into). sh_resync() then removes the page from the OOS hash, and later in the page fault path we find that gw.l1mfn is still OOS, so we try to update the snapshot and the bug happens.

The attached patch should fix this and other unlikely cases (such as sh_unsync() failing to remove the current gw.l1mfn because of a hash collision).
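
To make the suspected sequence easier to follow, here is a small standalone model of it. This is only a sketch of the hypothesis described above, not Xen code: every name prefixed with toy_ or TOY_, the fixed-size hash, and the return codes are invented for illustration. It shows a failed resync leaving a page marked OOS but absent from the hash, and an early check on is_shutting_down keeping the fault path away from the snapshot update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these types or helpers exist in Xen. */
#define TOY_OOS_SLOTS   3
#define TOY_INVALID_MFN (-1L)

enum toy_result {
    TOY_INCONSISTENT = -1, /* page is OOS but missing from the hash */
    TOY_BAILED       = 0,  /* fault path gave up early */
    TOY_UPDATED      = 1   /* snapshot update could proceed */
};

struct toy_page {
    long mfn;
    bool oos;                       /* page is flagged out of sync */
};

struct toy_domain {
    bool is_shutting_down;          /* set when the domain is crashed */
    long oos_hash[TOY_OOS_SLOTS];   /* mfns currently tracked as OOS */
};

static bool toy_hash_contains(const struct toy_domain *d, long mfn)
{
    for (size_t i = 0; i < TOY_OOS_SLOTS; i++)
        if (d->oos_hash[i] == mfn)
            return true;
    return false;
}

/* Toy resync: the page always leaves the hash. If removing its shadows
 * fails, the domain is crashed and the page keeps its OOS flag. */
static void toy_resync(struct toy_domain *d, struct toy_page *pg,
                       bool remove_shadows_ok)
{
    assert(d != NULL && pg != NULL);
    for (size_t i = 0; i < TOY_OOS_SLOTS; i++)
        if (d->oos_hash[i] == pg->mfn)
            d->oos_hash[i] = TOY_INVALID_MFN;
    if (!remove_shadows_ok) {
        d->is_shutting_down = true;
        return;
    }
    pg->oos = false;
}

/* Toy fault path: with the guard enabled, a crashed domain bails out
 * before the snapshot of an OOS page is looked up. */
static enum toy_result toy_fault_path(const struct toy_domain *d,
                                      const struct toy_page *l1,
                                      bool guard)
{
    if (guard && d->is_shutting_down)
        return TOY_BAILED;
    if (l1->oos && !toy_hash_contains(d, l1->mfn))
        return TOY_INCONSISTENT;    /* the state the patch avoids */
    return TOY_UPDATED;
}
```

In this model, calling toy_resync() with remove_shadows_ok set to false and then toy_fault_path() without the guard lands in the inconsistent state, while the guarded call returns early, which is the behaviour the patch aims for.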


Signed-off-by: Gianluca Guida <gianluca.guida@xxxxxxxxxxxxx>
diff -r b75f0b3e2a7e xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c    Wed Aug 13 11:09:46 2008 +0100
+++ b/xen/arch/x86/mm/shadow/multi.c    Wed Aug 13 14:05:57 2008 -0400
@@ -3290,6 +3290,16 @@ static int sh_page_fault(struct vcpu *v,
     if ( sh_mfn_is_a_page_table(gmfn)
          && ft == ft_demand_write )
         sh_unsync(v, gmfn);
+    if ( unlikely(d->is_shutting_down) )
+    {
+        /* We might end up with a crashed domain here if
+         * sh_remove_shadows() in a previous sh_resync() call has
+         * failed. We cannot safely continue since some page is still
+         * OOS but not in the hash table anymore. */
+        shadow_unlock(d);
+        return 0;
+    }
 #endif /* OOS */
     /* Calculate the shadow entry and write it */