Hi,
I'm running into a strange problem with DomU clocks after saving/restoring the
domain across a reboot of Dom0. After saving DomU, rebooting Dom0, and
restoring DomU, DomU's clock jumps into the future by an amount equal to the
previous uptime of Dom0, then freezes until the same amount of time has passed,
after which it starts running normally again. This is on Xen 4.0.1, with Dom0
running Linux 2.6.32.24-xen-179eca5 (the pvops stable-2.6.32.x tree from a few
days ago), and a DomU running a vanilla paravirtualised 2.6.32.24 kernel.
Here is an example:
[root@mrhankey:~]# xm create drdoctor
Using config file "/etc/xen/drdoctor".
Started domain drdoctor (id=4)
[root@mrhankey:~]# uptime
18:47pm up 1:41, 1 user, load average: 1.04, 1.01, 1.00
[root@mrhankey:~]# ssh drdoctor date
Mon Oct 11 18:47:59 CEST 2010
Now we reboot Dom0 (which saves and restores "drdoctor"). After this the clock
in "drdoctor" is stuck in the future:
[root@mrhankey:~]# uptime
18:53pm up 0:01, 1 user, load average: 0.40, 0.15, 0.05
[root@mrhankey:~]# date
Mon Oct 11 18:53:49 CEST 2010
[root@mrhankey:~]# ssh drdoctor date
Mon Oct 11 20:33:21 CEST 2010
(wait a while...)
[root@mrhankey:~]# ssh drdoctor date
Mon Oct 11 20:33:21 CEST 2010
Note that the DomU clock has jumped roughly 1:40 into the future, which was
Dom0's uptime prior to its reboot. The clock in DomU stays stuck at 20:33:21
until Dom0's clock reaches 20:33:21, after which it starts ticking again.
During this time, the machine is basically unusable because any time-dependent
function (such as sleep()) remains stuck.
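
To make the arithmetic explicit, here is a small sketch that computes the
offset from the two "date" outputs above and compares it with the Dom0 uptime
reported before the reboot. The helper name parse_date and the five-minute
tolerance are assumptions for illustration only; the "1:41" figure is the
uptime sampled at 18:47, so the real uptime at save time was slightly higher.

```python
from datetime import datetime, timedelta

# Format of `date` output once the timezone token has been removed.
FMT = "%a %b %d %H:%M:%S %Y"

def parse_date(line):
    """Parse `date` output such as 'Mon Oct 11 18:53:49 CEST 2010'.

    Hypothetical helper: the timezone token is dropped because both
    samples are in the same zone and %Z parsing is platform-dependent.
    """
    parts = line.split()
    parts.pop(4)  # remove "CEST"
    return datetime.strptime(" ".join(parts), FMT)

# Values copied from the transcript after the Dom0 reboot.
dom0_now = parse_date("Mon Oct 11 18:53:49 CEST 2010")
domu_now = parse_date("Mon Oct 11 20:33:21 CEST 2010")

# How far DomU's clock is ahead of Dom0's.
jump = domu_now - dom0_now

# Dom0 uptime as reported by `uptime` shortly before the reboot.
prior_uptime = timedelta(hours=1, minutes=41)

print(jump)  # 1:39:32
# Assumed tolerance of five minutes for "roughly equal".
print(abs(jump - prior_uptime) < timedelta(minutes=5))  # True
```

The offset of 1:39:32 is within a couple of minutes of the previous Dom0
uptime, which is what suggests the two are related.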
The problem does not occur when DomU is saved and restored without a Dom0 reboot
in between. Whether NTP is running on Dom0 or DomU doesn't matter. I tried
"tsc_mode=1" (force RDTSC emulation) but it didn't have an effect. Neither did
changing the clocksource in DomU from "xen" to "tsc", or changing the date with
"date -s" on Dom0 or DomU.
The following messages in /var/log/xen/xend.log might be relevant:
(during save...)
[2010-10-11 16:48:10 2000] INFO (XendCheckpoint:423) xc_save: failed to get the
suspend evtchn port
...
(during restore...)
[2010-10-11 16:53:29 2066] INFO (XendCheckpoint:423) Reloading memory pages:
0%
[2010-10-11 16:53:34 2066] INFO (XendCheckpoint:423) ERROR Internal error: Error
when reading batch size
[2010-10-11 16:53:34 2066] INFO (XendCheckpoint:423) ERROR Internal error: error
when buffering batch, finishing
...
[2010-10-11 16:53:35 2066] INFO (XendCheckpoint:423) Restore exit with rc=0
And another time:
[2010-10-11 14:20:03 2044] INFO (XendCheckpoint:423) ERROR Internal error: Max
batch size exceeded (1970103633). Giving up.
[2010-10-11 14:20:03 2044] INFO (XendCheckpoint:423) ERROR Internal error: error
when buffering batch, finishing
These seem to suggest that the save is incomplete or corrupt. However, in all
cases the restore completes successfully, apart from the clock issue.
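
Since xend logs these errors at INFO level, filtering the log by level would
miss them. Below is a minimal sketch for pulling them out by message content
instead. The regular expression is inferred from the excerpts above (where
long lines have been wrapped by the mail client), and the function name
checkpoint_errors is hypothetical; it assumes each entry is a single line in
the actual log file.

```python
import re

# Line layout inferred from the xend.log excerpts:
# [timestamp pid] LEVEL (Source:lineno) message
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>\w+) \((?P<src>\w+):(?P<line>\d+)\) (?P<msg>.*)"
)

def checkpoint_errors(lines):
    """Return (timestamp, message) pairs for XendCheckpoint entries
    whose message starts with 'ERROR', regardless of log level."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("src") == "XendCheckpoint" \
                and m.group("msg").startswith("ERROR"):
            found.append((m.group("ts"), m.group("msg")))
    return found

# Sample entries taken from the restore log above, unwrapped.
sample = [
    "[2010-10-11 16:53:29 2066] INFO (XendCheckpoint:423) "
    "Reloading memory pages: 0%",
    "[2010-10-11 16:53:34 2066] INFO (XendCheckpoint:423) "
    "ERROR Internal error: Error when reading batch size",
    "[2010-10-11 16:53:35 2066] INFO (XendCheckpoint:423) "
    "Restore exit with rc=0",
]

errors = checkpoint_errors(sample)
for ts, msg in errors:
    print(ts, msg)
```

To run it against the real file, something like
checkpoint_errors(open("/var/log/xen/xend.log")) should work under the same
assumption about the line layout.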
Anybody have an idea what might be the cause? BTW, I'm packaging Xen for NixOS
(http://nixos.org/nixos/), which stores packages under non-standard prefixes
(i.e. not /usr), but I don't think this is an issue here.
--
Eelco Dolstra | http://www.st.ewi.tudelft.nl/~dolstra/
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel