On Fri, May 29, 2009 at 08:26:33PM +0200, Ferenc Wagner wrote:
> Hi,
>
> There's a problem I've been struggling with for quite some time in our
> Xen hosting environment. Basically, after a couple of months of smooth
> running, most virtual machines suddenly get stuck in r state
> and stop responding to anything, including xm console and xm sysrq.
> It happens rather regularly, but I can't reproduce it by taxing the
> domUs or the dom0 with disk I/O, CPU or console I/O.
>
> However, a couple of days ago it turned out that this situation can be
> cured by restarting xenconsoled! After that, xm console spit out the
> previous random typing, sysrq help strings and whatnot for the domUs
> which weren't stuck in r state, and the stuck ones also started to
> respond and run normally (spending most of their time in b state) again.
>
> The whole phenomenon looked as if xenconsoled had stopped emptying the
> domU console buffers, and those domUs which were constantly writing to
> their consoles quickly filled them up and started busy-looping trying to
> put more characters onto their consoles, not even caring to respond to
> ping. But those domUs which didn't write to their consoles
> stayed functional until the desperate operator forced them to create
> enough console output to fill up their buffers as well, and then they
> got stuck in r state just like the others. After restarting xenconsoled
> all were able to recover successfully.
>
> Of course the above is just guesswork; I don't know the details of Xen
> console handling. But I wonder if it rings any bells here, or maybe
> this issue is known and fixed already. Oh, I experience this under
> Xen 3.2 with pv-ops guests (2.6.26+patches).
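
To make the hypothesis above concrete, here is a toy simulation of it. It is only a sketch, not Xen code: ConsoleRing, guest_write and the 2048-byte ring size are names and values assumed for illustration, and max_spins exists only so the simulated busy-loop terminates (a stuck guest would spin forever).

```python
from collections import deque


class ConsoleRing:
    """Hypothetical fixed-size console buffer: the guest produces, the daemon consumes."""

    def __init__(self, size):
        self.size = size
        self.buf = deque()

    def write(self, data):
        """Copy as many bytes as fit; return how many were accepted."""
        n = min(len(data), self.size - len(self.buf))
        self.buf.extend(data[:n])
        return n

    def drain(self):
        """What a healthy console daemon does: empty the ring."""
        out = bytes(self.buf)
        self.buf.clear()
        return out


def guest_write(ring, data, max_spins):
    """Guest side: retry until all data is queued, spinning while the ring is full.

    Returns (bytes_sent, spins). max_spins bounds the loop for the simulation only.
    """
    sent = 0
    spins = 0
    while sent < len(data) and spins < max_spins:
        n = ring.write(data[sent:])
        if n == 0:
            spins += 1  # ring full and nobody is draining it: busy-loop
        sent += n
    return sent, spins


if __name__ == "__main__":
    ring = ConsoleRing(2048)  # assumed size, for illustration
    # Daemon stalled: a chatty guest fills the ring, then only spins.
    print(guest_write(ring, b"x" * 3000, max_spins=1000))
    # "Restarting the daemon" drains the ring and the guest can write again.
    ring.drain()
    print(guest_write(ring, b"y" * 952, max_spins=1000))
```

A guest that writes little never reaches the spinning branch, which would match the quiet domUs staying responsive until they were forced to produce output.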

I've seen exactly the same problem with Xen in RHEL5/CentOS (5.0, 5.1, 5.2).
I believe it's also in 5.3.
I reported the problem to xen-devel, but I couldn't provide the
strace/backtrace needed to figure out _why_ it happens (I had
already restarted xenconsoled by then).
I think the developers would need more information to figure out what the
actual bug is.
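
For the next occurrence, something along these lines could collect that information before xenconsoled is restarted. This is a rough sketch under assumptions: it presumes strace and gdb are installed in dom0 and that it runs as root; the function names and output file names are made up for illustration, and the daemon's PID has to be supplied by the operator.

```python
import signal
import subprocess
from pathlib import Path


def diagnostic_commands(pid, outdir):
    """Build the strace and gdb command lines for the given PID (no side effects)."""
    out = Path(outdir)
    return {
        # Follow threads/children, timestamp each syscall, write to a file.
        "strace": ["strace", "-f", "-tt", "-p", str(pid),
                   "-o", str(out / "xenconsoled.strace")],
        # Attach, dump a backtrace of every thread, then exit.
        "gdb": ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
    }


def capture(pid, outdir, strace_seconds=10):
    """Save a backtrace, then trace syscalls for a few seconds."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    cmds = diagnostic_commands(pid, outdir)

    bt = subprocess.run(cmds["gdb"], capture_output=True, text=True)
    (Path(outdir) / "xenconsoled.backtrace").write_text(bt.stdout + bt.stderr)

    tracer = subprocess.Popen(cmds["strace"])
    try:
        tracer.wait(timeout=strace_seconds)
    except subprocess.TimeoutExpired:
        tracer.send_signal(signal.SIGINT)  # ask strace to detach and exit
        tracer.wait()
```

Calling capture() with the PID of the hung xenconsoled and an output directory, and only then restarting the daemon, should leave both files behind to attach to a report.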
-- Pasi
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel