Keir Fraser <keir.fraser@xxxxxxxxxxxxx> writes:
> On 29/05/2009 22:53, "Pasi Kärkkäinen" <pasik@xxxxxx> wrote:
>
>> I've seen the exact same bug/problem with Xen in RHEL5/CentOS (5.0, 5.1,
>> 5.2).
>> I believe it's also in 5.3.
>>
>> I reported the problem to xen-devel, but I couldn't provide the needed
>> strace/backtrace to figure out the reason _why_ that happens.. (I had
>> already restarted xenconsoled..)
>>
>> I think developers would need more information to figure out what the
>> actual bug is.
>
> Yes, I think any kind of xenconsoled hang can eventually result in guests
> spinning waiting for their console buffers to be emptied. It might be
> interesting to build xenconsoled with debug symbols (-g compile option) and
> attach gdb when it gets in this state. Without that kind of info it'll be
> hard to track down.

I haven't had the opportunity to run xenconsoled with debugging
enabled yet, but disaster struck again while I was on holiday. My
co-workers restarted some stuck domains but left a couple of them around.
Attaching strace to xenconsoled showed a rather large timeout on select:
select(43, [6 8 9 11 12 14 15 18 20 21 24 26 27 29 30 32 33 35 36 38 39 41 42], [9 12 21 24], NULL, {4144869, 572000} <unfinished ...>
which may or may not be a clue. The lsof output seemed reasonable:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
xenconsol 4566 root cwd DIR 253,4 4096 128 /
xenconsol 4566 root rtd DIR 253,4 4096 128 /
xenconsol 4566 root txt REG 253,2 21296 577488 /usr/lib/xen-3.2-1/bin/xenconsoled
xenconsol 4566 root mem REG 0,3 2147483647 /proc/xen/privcmd (path inode=4026533301)
xenconsol 4566 root mem REG 253,4 116414 3175190 /lib/i686/cmov/libpthread-2.7.so
xenconsol 4566 root mem REG 253,4 1413540 3170117 /lib/i686/cmov/libc-2.7.so
xenconsol 4566 root mem REG 253,2 15300 2621918 /usr/lib/libxenstore.so.3.0.0
xenconsol 4566 root mem REG 253,2 71684 3217152 /usr/lib/xen-3.2-1/lib/libxenctrl.so
xenconsol 4566 root mem REG 253,4 9684 3175197 /lib/i686/cmov/libutil-2.7.so
xenconsol 4566 root mem REG 253,4 113248 1050535 /lib/ld-2.7.so
xenconsol 4566 root 0u CHR 1,3 936 /dev/null
xenconsol 4566 root 1u CHR 1,3 936 /dev/null
xenconsol 4566 root 2u CHR 1,3 936 /dev/null
xenconsol 4566 root 3uW REG 253,3 5 1573306 /var/run/xenconsoled.pid
xenconsol 4566 root 4u unix 0xcfb47180 10030 socket
xenconsol 4566 root 5u REG 0,3 0 4026533301 /proc/xen/privcmd
xenconsol 4566 root 6r FIFO 0,6 10032 pipe
xenconsol 4566 root 7w FIFO 0,6 10032 pipe
xenconsol 4566 root 8u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 9u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 10u CHR 136,1 3 /dev/pts/1
xenconsol 4566 root 11u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 12u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 13u CHR 136,2 4 /dev/pts/2
xenconsol 4566 root 14u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 15u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 16u CHR 136,3 5 /dev/pts/3
xenconsol 4566 root 17u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 18u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 19u CHR 136,4 6 /dev/pts/4
xenconsol 4566 root 20u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 21u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 22u CHR 136,5 7 /dev/pts/5
xenconsol 4566 root 23u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 24u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 25u CHR 136,6 8 /dev/pts/6
xenconsol 4566 root 26u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 27u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 28u CHR 136,7 9 /dev/pts/7
xenconsol 4566 root 29u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 30u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 31u CHR 136,8 10 /dev/pts/8
xenconsol 4566 root 32u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 33u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 34u CHR 136,9 11 /dev/pts/9
xenconsol 4566 root 35u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 36u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 37u CHR 136,10 12 /dev/pts/10
xenconsol 4566 root 38u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 39u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 40u CHR 136,11 13 /dev/pts/11
xenconsol 4566 root 41u CHR 10,63 1491 /dev/xen/evtchn
xenconsol 4566 root 42u CHR 5,2 1538 /dev/ptmx
xenconsol 4566 root 43u CHR 136,12 14 /dev/pts/12
After restarting xenconsoled, the stuck domain said:
[1052088.070488] BUG: soft lockup - CPU#0 stuck for 136469s! [nscd:1796]
which is pretty much what I expected. I still plan to investigate this,
but I'm sending it now in case it rings a bell for someone...
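For scale, here is a quick conversion of the two durations quoted above: the select() timeout {4144869, 572000} is a (seconds, microseconds) pair, and the lockup message reports seconds. This is only arithmetic on the numbers in the traces, a minimal sketch; it does not explain where xenconsoled got such a large timeout from, and the helper names are made up for illustration.

```python
from datetime import timedelta

# Values copied from the strace and kernel messages above.
SELECT_TIMEOUT = (4144869, 572000)  # select() timeout as {tv_sec, tv_usec}
SOFT_LOCKUP_SECS = 136469           # "CPU#0 stuck for 136469s!"


def timeval_to_timedelta(tv_sec: int, tv_usec: int) -> timedelta:
    """Convert a (seconds, microseconds) pair into a timedelta."""
    return timedelta(seconds=tv_sec, microseconds=tv_usec)


def describe(td: timedelta) -> str:
    """Render a duration as 'Nd HHh MMm SS.sss s' for easier reading."""
    hours, rem = divmod(td.seconds, 3600)
    minutes, secs = divmod(rem, 60)
    frac_secs = secs + td.microseconds / 1e6
    return f"{td.days}d {hours:02d}h {minutes:02d}m {frac_secs:06.3f}s"


if __name__ == "__main__":
    print("select() timeout:", describe(timeval_to_timedelta(*SELECT_TIMEOUT)))
    print("soft lockup:     ", describe(timedelta(seconds=SOFT_LOCKUP_SECS)))
```

So select() was asked to sleep for almost 48 days, while the guest reported being stuck for roughly 38 hours.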
--
Regards,
Feri.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel