[Xen-devel] Probable Xen bug triggered by localhost migration
Once again I have had a test fail during "10 migrations of a PV domain
to localhost", with an apparent Xen or dom0 lockup or other serious
problem.
Failure modes include:
* dom0 reporting soft lockup BUGs (showing xl stuck in a privcmd
ioctl, apparently in a hypercall)
* dom0 disk controller failure due to apparent lost/stuck
interrupt (dom0 decides disk not working, tries unsuccessfully to
reset)
* apparent dom0 lockup or networking failure
Problems occur with both XCP 2.6.27 and pvops 2.6.32 kernels.
The problems seem to happen only with xl, but that is probably because
the underlying bug is a race: xl and xend make their calls in different
orders and with different timing.
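For anyone who wants to try reproducing this without the test harness, the repeated migration can be sketched roughly as below. This is only an illustration, not the harness's actual code: the function name, the timeout value and the injectable `run` parameter are hypothetical, and it assumes it is run directly in dom0 (the real harness invokes "xl migrate" over ssh).

```python
import subprocess


def migrate_to_localhost(domain, count=10, timeout=600, run=subprocess.run):
    """Run `xl migrate <domain> localhost` up to `count` times.

    Returns the 1-based number of the first failing attempt, or None if
    every migration succeeded. A migration that exceeds `timeout` seconds
    is counted as a failure, since a hang is one of the symptoms.
    """
    for attempt in range(1, count + 1):
        cmd = ["xl", "migrate", domain, "localhost"]
        try:
            result = run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return attempt
        if result.returncode != 0:
            return attempt
    return None
```

Since the failure looks like a race, a run that completes all ten migrations does not show that the bug is absent.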
Having added some machinery to request Xen debug keys, I now have some
more information:
http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/info.html
The most relevant files there are these:
http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/14.ts-guest-localmigrate.log
That shows the failure. The test harness ssh's to the dom0 to run "xl
migrate" and gets "No route to host", which typically means the dom0 has
stopped responding to ARP requests. In this particular case the
failure happened after an apparently-successful previous migration,
but the more common failure mode is that "xl migrate" prints the 0%
progress message and then nothing else gets through.
http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/serial-woodlouse.log
Serial log. Scroll to around "Feb 4 03:30:35" (timestamps, and the
messages about clients connecting and disconnecting, are from the
serial concentrator).
You'll see a series of debug key outputs, which you can correlate with
the test harness's requests, listed with timestamps here:
http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/15.ts-logs-capture.log
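To help with that correlation, here is a rough sketch that picks out the serial log lines close to a given harness timestamp. It assumes that concentrator-stamped lines begin with a syslog-style stamp such as "Feb 4 03:30:35" and that unstamped lines belong to the most recent stamp. The function names are hypothetical, and the year has to be supplied by the caller because the stamps do not carry one.

```python
import re
from datetime import datetime, timedelta

# Syslog-style prefix, e.g. "Feb  4 03:30:35" (assumed format).
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\b")


def parse_stamp(line, year):
    """Return the datetime at the start of `line`, or None if there is none."""
    m = STAMP.match(line)
    if m is None:
        return None
    month, day, clock = m.groups()
    try:
        return datetime.strptime(f"{year} {month} {day} {clock}",
                                 "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_near(log_lines, when, year, window_seconds=5):
    """Return the lines whose (possibly inherited) stamp is within the window."""
    window = timedelta(seconds=window_seconds)
    selected = []
    current = None
    for line in log_lines:
        stamp = parse_stamp(line, year)
        if stamp is not None:
            current = stamp
        if current is not None and abs(current - when) <= window:
            selected.append(line)
    return selected
```

The window needs to be generous enough to cover any clock skew between the harness and the serial concentrator.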
After the Xen debug keys have been run through, the test harness sends
the "q" guest debug key, which also produces the output you can see in
the serial log.
Then the test harness switches the serial back to dom0 and sends RET
and we can see dom0 produce a new login prompt. So dom0 is not
entirely dead.
However, later entries in the "ts-logs-capture" log show that dom0 still
isn't responding on the network, and eventually the test harness
decides to power cycle the host and collect what remains from the dom0
filesystem. So that's why you see a pile of boot messages at the end
of the test log - these should be disregarded.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
- [Xen-devel] Probable Xen bug triggered by localhost migration,
Ian Jackson <=