xen-devel

[Xen-devel] x86_64 live migration problems: help needed with shadow page table

To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] x86_64 live migration problems: help needed with shadow page table
From: John Byrne <john.l.byrne@xxxxxx>
Date: Thu, 22 Jun 2006 11:58:31 -0700
Delivery-date: Thu, 22 Jun 2006 11:59:03 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 1.5.0.4 (X11/20060516)

Hi,

I've been doing most of my testing with Novell's SLES10 3.0.2 9742c; however, I did verify the problem existed in xen-unstable as of last week.

x86_64 live migration is unreliable: if the domain is under stress (I used a kernel make), the domU frequently oopses or the compile segfaults afterwards.

The primary issue seems to be that L1 and L2 page table pages are not getting marked dirty in the shadow_dirty_bitmap. (The domain is running 4-level page tables, and I have yet to see a verified problem with the L3 and L4 page tables.)

I have shown this to my satisfaction by adding code in xc_linux_save.c to mark all L1 and L2 pages for fixups on the last iteration. With that change I've managed to migrate 10 times without obvious problems so far. (I also found that the clear_bit() and set_bit() routines were broken on x86_64; a patch against the latest xen-unstable is attached.)
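
To illustrate why the unsuffixed constant matters: the literal 1 has type int, so 1 << BITMAP_SHIFT(nr) is evaluated in 32 bits even though each bitmap entry is an unsigned long, which is 64 bits wide on x86_64 Linux. Shifts of 32 or more are undefined for int, and a shift of 31 typically yields a negative value that sign-extends when widened, so set_bit() can set the upper 32 bits as well and clear_bit() can clear them. The following is a minimal sketch of the corrected helpers. The macro definitions are assumptions modeled on the names used in the patch, not copied from xc_linux_save.c, and check_bitmap_ops() is a hypothetical self-check added only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* ASSUMPTION: these macros mirror the names used in the patch; the real
 * definitions in xc_linux_save.c may differ in detail. */
#define BITS_PER_LONG          (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_ENTRY(nr, bmap) \
    (((volatile unsigned long *)(bmap))[(nr) / BITS_PER_LONG])
#define BITMAP_SHIFT(nr)       ((nr) % BITS_PER_LONG)

static inline int test_bit(int nr, volatile void *addr)
{
    return (int)((BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1UL);
}

static inline void clear_bit(int nr, volatile void *addr)
{
    /* 1UL makes the mask as wide as the bitmap entry it is applied to. */
    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
}

static inline void set_bit(int nr, volatile void *addr)
{
    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
}

/* Hypothetical self-check: exercises the bit positions that an int-typed
 * mask would get wrong. Returns 1 if every check passes. */
static int check_bitmap_ops(void)
{
    unsigned long bitmap[2] = { 0UL, 0UL };
    int top = (int)BITS_PER_LONG - 1;

    /* Setting the highest bit of a word must touch only that bit. */
    set_bit(top, bitmap);
    assert(bitmap[0] == (1UL << top));
    assert(bitmap[1] == 0UL);

    /* Clearing bit 31 must leave every other bit of the word intact. */
    bitmap[0] = ~0UL;
    clear_bit(31, bitmap);
    assert(bitmap[0] == (~0UL & ~(1UL << 31)));
    assert(test_bit(31, bitmap) == 0);
    assert(test_bit(top, bitmap) == (top == 31 ? 0 : 1));

    /* Bits past the first word land in the second word. */
    set_bit((int)BITS_PER_LONG + 3, bitmap);
    assert(bitmap[1] == (1UL << 3));

    return 1;
}
```

Calling check_bitmap_ops() from a test program should pass on both 32-bit and 64-bit builds. With the original int-typed mask, the bit-31 and top-bit checks are the ones expected to fail on x86_64.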

I cannot call what I've done a fix, since it causes the transfer of 2000 extra, mostly unnecessary, pages on the last iteration. So what I'd like help with is where to fix the shadow page table code so that the L1 and L2 pages get marked dirty properly. It is not immediately obvious to me where this needs to be done, and I'm hoping someone can save me a lot of time.

Thanks,

John Byrne





diff -r 411a3c01bb40 tools/libxc/xc_linux_save.c
--- a/tools/libxc/xc_linux_save.c       Tue Jun 20 18:51:46 2006 +0100
+++ b/tools/libxc/xc_linux_save.c       Thu Jun 22 11:45:56 2006 -0700
@@ -91,12 +91,12 @@ static inline int test_bit (int nr, vola
 
 static inline void clear_bit (int nr, volatile void * addr)
 {
-    BITMAP_ENTRY(nr, addr) &= ~(1 << BITMAP_SHIFT(nr));
+    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
 }
 
 static inline void set_bit ( int nr, volatile void * addr)
 {
-    BITMAP_ENTRY(nr, addr) |= (1 << BITMAP_SHIFT(nr));
+    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
 }
 
 /* Returns the hamming weight (i.e. the number of bits set) in a N-bit word */
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel