Brendan:
Hi, my name is Yoshi Tamura, and I work for NTT Labs in Japan.
I tried your patches, and I liked your new feature to checkpoint a running
domain.
I also tried your patches for live migration, but xc_linux_restore() on the
remote machine failed.
I tracked down the problem and fixed it by modifying __xen_checkpoint() in
machine_reboot.c. Take a look at the following patch.
As far as I have tested, it works for both xm save -c and xm migrate --live.
Let me know if you have any comments or better ideas.
Regards,
Yoshi Tamura
Signed-off-by: Yoshi Tamura <tamura.yoshiaki@xxxxxxxxxxxxx>
diff -r 3bde632518a4 linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Wed Dec 20 16:21:43 2006 +0900
@@ -171,8 +171,6 @@ int __xen_suspend(void)
pre_suspend();
- gnttab_checkpoint();
-
/*
* We'll stop somewhere inside this hypercall. When it returns,
* we'll start resuming after the restore.
@@ -223,6 +221,8 @@ int __xen_checkpoint(void)
xenbus_lock();
+ gnttab_suspend();
+
preempt_disable();
mm_pin_all();
@@ -257,6 +257,8 @@ int __xen_checkpoint(void)
} else {
post_checkpoint();
+ gnttab_resume();
+
local_irq_enable();
xenbus_unlock();
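For readers following the patch, the ordering it sets up on the path where the checkpoint returns in the original domain can be sketched as below. This is only an illustration, not the kernel code: the function names are taken from the diff, but every body is a placeholder stub that records its own name, and checkpoint_hypercall_placeholder() and sketch_checkpoint_parent_path() are hypothetical names standing in for whatever the real function does between mm_pin_all() and post_checkpoint(), which the diff context does not show.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical call log; the real functions do kernel work instead. */
#define MAX_CALLS 16
static const char *call_log[MAX_CALLS];
static int n_calls;

static void record(const char *name)
{
    assert(n_calls < MAX_CALLS);
    call_log[n_calls++] = name;
}

/* Stubs named after the calls visible in the diff. Bodies are placeholders. */
static void xenbus_lock(void)      { record("xenbus_lock"); }
static void gnttab_suspend(void)   { record("gnttab_suspend"); }
static void preempt_disable(void)  { record("preempt_disable"); }
static void mm_pin_all(void)       { record("mm_pin_all"); }
static void post_checkpoint(void)  { record("post_checkpoint"); }
static void gnttab_resume(void)    { record("gnttab_resume"); }
static void local_irq_enable(void) { record("local_irq_enable"); }
static void xenbus_unlock(void)    { record("xenbus_unlock"); }

/* Assumption: stands in for everything the diff context elides. */
static void checkpoint_hypercall_placeholder(void)
{
    record("checkpoint_hypercall_placeholder");
}

/* Order of calls on the path where the checkpoint returns in the parent. */
static void sketch_checkpoint_parent_path(void)
{
    n_calls = 0;
    xenbus_lock();
    gnttab_suspend();                   /* added by the patch */
    preempt_disable();
    mm_pin_all();
    checkpoint_hypercall_placeholder(); /* elided in the diff context */
    post_checkpoint();
    gnttab_resume();                    /* added by the patch */
    local_irq_enable();
    xenbus_unlock();
}
```

Calling sketch_checkpoint_parent_path() and reading call_log shows gnttab_suspend() running before the checkpoint is taken and gnttab_resume() running after it. That is presumably what keeps the grant table mappings out of the saved image, in line with the discussion quoted below.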
Brendan Cully wrote:
I think maybe I forgot to mention that I have successfully
checkpointed domains and restored them from checkpoints (with
file-system activity between checkpoints). It seems to work pretty
well. I'll try to put together a demo of this next week.
Regarding full device disconnection, my understanding is that guest
domains are already prepared to deal with back-end driver crashes (by
maintaining shadows of the ring etc), so a forced reconnect on resume
should be able to recover even if there wasn't an orderly shutdown
before the suspend. I thought when I looked over the code that the
reconnect path did a paranoid forced disconnect first anyway (eg
checking for existing event channels and resetting them).
On the other hand, if checkpoints are taken more frequently than they
are restored, it seems odd to be constantly detaching and reattaching
back-ends in the parent.
But if this is unsafe, it should be fairly easy to make the code do a
full disconnect before suspend. It might be as easy as changing xm
save to write 'suspend' to control/shutdown instead of 'checkpoint'.
On Friday, 15 December 2006 at 08:07, Steven Hand wrote:
I'm not too sure about the last couple of patches in this
series. Because the checkpointing domain doesn't disconnect before
calling suspend, it retains a few references to pages it doesn't
own. These trigger a PT race detector in xc_linux_save, which causes
it to abort. So the last couple of patches explicitly identify the
references I've found so far (shared_info and some grant table shared
pages) and simply zero those PTEs during save, since they'll be
recreated on restore. Finding the grant table pages is a bit fragile -
I walk the page table loaded in CR3 at the time of suspend looking for
the virtual address I've stowed in the suspend record. I've only got
code for two-level page tables at the moment, since I'm not convinced
this is the right approach. Under what circumstances would a non-live
save have an unsafe PTE race?
Pretty much any PT race in a non-live save/migrate is a bug; the
domain is (in theory) suspended at this point, and all of the
devices are disconnected. Since you've chosen not to 'disconnect'
the devices, you'll get random updates occurring to any shared
pages (shared via grants or directly shared with Xen).
Maybe it's fine to simply zero these ptes without checking them.
I'd think not.
To clarify, the pages that have caused races in my experiments are
always the same 5: shared_info and four grant table shared pages. The
reason these don't cause races in plain save is simply that they are
unmapped before suspend is called. Since I've adjusted the kernel to
recreate these specific pages on restore (but not in the parent when
checkpoint returns), my patches do just zero out the PTEs (simulating
in the save code what had previously been done in the guest).
Finding the guest grant table pages is a little annoying though. I
ended up having the guest put the virtual address of its mapping into
an unused field in the suspend record, then walking the page table to
find the MFN. I was thinking it might be better to either get Xen to
export a list of pages that the guest has references to, or to assume
that any unowned MFNs in the page tables are pages that will be
recreated on restore anyway, and just zero them out. In short, I wonder
how often that PT race code has stopped a non-live save. If the answer
is 'never', then zeroing out the PTEs might be fine. Especially since
the original domain is still intact after the checkpoint.
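The two ideas above (walking a two-level page table from a stowed virtual address to find the MFN, and zeroing PTEs that point at frames the domain does not own) can be sketched in a toy model. This is not the xc_linux_save code. It assumes the classic 32-bit non-PAE layout (10-bit directory index, 10-bit table index, 12-bit offset, present bit in bit 0), and to keep it runnable in user space the directory entries hold indexes into a small array of tables rather than machine addresses. struct toy_mm, va_to_mfn() and zero_unowned() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES     1024u   /* entries per level, 32-bit non-PAE x86 */
#define PTE_PRESENT 0x1u
#define FRAME_SHIFT 12

typedef uint32_t pte_t;

/* Toy address space: pgd entries index into 'tables' (an assumption). */
struct toy_mm {
    pte_t pgd[ENTRIES];
    pte_t (*tables)[ENTRIES];
    size_t n_tables;
};

/* Return a pointer to the PTE mapping 'va', or NULL if no table is present. */
static pte_t *walk(struct toy_mm *mm, uint32_t va)
{
    pte_t pde = mm->pgd[va >> 22];
    if (!(pde & PTE_PRESENT))
        return NULL;
    size_t t = pde >> FRAME_SHIFT;
    assert(t < mm->n_tables);
    return &mm->tables[t][(va >> FRAME_SHIFT) & (ENTRIES - 1)];
}

/* Look up the frame number mapped at 'va'. Returns 0 on success, -1 if unmapped. */
static int va_to_mfn(struct toy_mm *mm, uint32_t va, uint32_t *mfn)
{
    pte_t *pte = walk(mm, va);
    if (pte == NULL || !(*pte & PTE_PRESENT))
        return -1;
    *mfn = *pte >> FRAME_SHIFT;
    return 0;
}

static int is_owned(uint32_t mfn, const uint32_t *owned, size_t n_owned)
{
    for (size_t i = 0; i < n_owned; i++)
        if (owned[i] == mfn)
            return 1;
    return 0;
}

/* Zero every present PTE whose frame is not in 'owned'. Returns how many were zeroed. */
static size_t zero_unowned(struct toy_mm *mm, const uint32_t *owned, size_t n_owned)
{
    size_t zeroed = 0;
    for (size_t d = 0; d < ENTRIES; d++) {
        if (!(mm->pgd[d] & PTE_PRESENT))
            continue;
        size_t t = mm->pgd[d] >> FRAME_SHIFT;
        assert(t < mm->n_tables);
        for (size_t i = 0; i < ENTRIES; i++) {
            pte_t *pte = &mm->tables[t][i];
            if ((*pte & PTE_PRESENT) &&
                !is_owned(*pte >> FRAME_SHIFT, owned, n_owned)) {
                *pte = 0;
                zeroed++;
            }
        }
    }
    return zeroed;
}
```

In this model va_to_mfn() corresponds to the lookup from the address in the suspend record, while zero_unowned() corresponds to the blanket alternative of clearing every unowned mapping without identifying each page first.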
Thanks again for looking this over.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
--
TAMURA, Yoshiaki
NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: tamura.yoshiaki@xxxxxxxxxxxxx
TEL: (046)-859-2771
FAX: (046)-855-1152
Address: 1-1 Hikarinooka, Yokosuka
Kanagawa 239-0847 JAPAN