[Xen-devel] Re: Known console(d) bug?

To:	xen-devel@xxxxxxxxxxxxxxxxxxx
Subject:	[Xen-devel] Re: Known console(d) bug?
From:	Ferenc Wagner <wferi@xxxxxxx>
Date:	Sat, 30 May 2009 01:06:38 +0200
Delivery-date:	Fri, 29 May 2009 16:07:05 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<20090529215301.GU24960@xxxxxxxxxxxxxxx> (Pasi Kärkkäinen's message of "Sat, 30 May 2009 00:53:01 +0300")
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<87eiu74vfq.fsf@xxxxxxxxxxxxx> <20090529215301.GU24960@xxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)

Pasi Kärkkäinen <pasik@xxxxxx> writes:

> On Fri, May 29, 2009 at 08:26:33PM +0200, Ferenc Wagner wrote:
> 
>> There's a problem I've been struggling with for quite some time in our Xen
>> hosting environment.  Basically, after a couple of months of smooth
>> running, suddenly most virtual machines get stuck in the r state
>> and stop responding to anything, including xm console and xm sysrq.
>> It happens rather regularly, but I can't reproduce it by taxing the
>> domUs or the dom0 with disk I/O, CPU or console I/O.
>> 
>> However, a couple of days ago it turned out that this situation can be
>> cured by restarting xenconsoled!  After that, xm console spit out the
>> previous random typing, sysrq help strings and whatnot for the domUs
>> which weren't stuck in r state, and the stuck ones also started to
>> respond and run normally (spending most of their time in b state) again.
>> 
>> The whole phenomenon looked as if xenconsoled had stopped emptying the domU
>> console buffers, and those domUs which were constantly writing to
>> their consoles quickly filled them up and started busy-looping, trying to
>> put more characters onto their consoles, not even caring to respond to
>> ping.  But those domUs which didn't write to their consoles
>> stayed functional until the desperate operator forced them to create
>> enough console output to fill up their buffers as well, and then they
>> got stuck in the r state just like the others.  After restarting xenconsoled,
>> all were able to recover successfully.
>> 
>> Of course the above is just guessing, I don't know the details of Xen
>> console handling.  But I wonder if it rings any bells here, or maybe
>> this issue is known and fixed already.  Oh, I experience this under
>> Xen 3.2 and pv-ops guests (2.6.26+patches).
>
> I've seen the exact same bug/problem with Xen in RHEL5/CentOS (5.0, 5.1,
> 5.2).  I believe it's also in 5.3.
>
> I reported the problem to xen-devel, but I couldn't provide the needed
> strace/backtrace to figure out the reason _why_ that happens.. (I had
> already restarted xenconsoled..)
>
> I think developers would need more information to figure out what the
> actual bug is. 
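
Since the missing piece last time was the strace/backtrace, a small helper along these lines could collect them before xenconsoled is restarted.  This is only a sketch: the function names, the output directory and the 30-second limit are made up for illustration, and it assumes strace, gdb and the coreutils timeout command are installed in dom0.  By default it only prints the commands.

```python
import shlex
import subprocess


def build_capture_commands(pid, outdir="/tmp"):
    """Return the command lines to run against a (possibly stuck) daemon."""
    return [
        # Attach for at most 30 seconds and log system calls with timestamps.
        ["timeout", "30", "strace", "-f", "-tt", "-p", str(pid),
         "-o", f"{outdir}/xenconsoled.strace"],
        # Print a backtrace of every thread, then detach.
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
        # List the open file descriptors of the process.
        ["ls", "-l", f"/proc/{pid}/fd"],
    ]


def capture(pid, outdir="/tmp", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_capture_commands(pid, outdir):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)
```

Calling capture(pid) with the PID of xenconsoled shows what would be run; capture(pid, dry_run=False) runs it (as root).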

Indeed, I have found your report now.  This means you have been running
for almost a year without experiencing this!  I get it much more often,
but still pretty rarely.  I also noticed that the more or less regular

WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 200 ms (> 50 ms) (GSource: 0x811bf80)

messages from heartbeat came 50 times more often while xenstored was
stuck (at least it didn't take any significant CPU).  However, four
domUs constantly in r state surely sucked up all the CPU power of the
4-way host machine.
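
To make the guess from my original mail concrete, here is a toy model of it in Python.  It is not Xen code: ConsoleRing, guest_write, the ring size and the spin limit are all hypothetical, and the real console handling may well differ.  It only shows how a writer that retries on a full ring makes no progress (and would burn CPU) until somebody drains the ring.

```python
class ConsoleRing:
    """Toy producer/consumer ring with free-running indices."""

    def __init__(self, size=2048):
        self.size = size
        self.buf = bytearray(size)
        self.prod = 0  # advanced by the guest (writer)
        self.cons = 0  # advanced by the daemon (reader)

    def write(self, data):
        """Copy as much of data as fits; return the number of bytes accepted."""
        free = self.size - (self.prod - self.cons)
        n = min(free, len(data))
        for i in range(n):
            self.buf[(self.prod + i) % self.size] = data[i]
        self.prod += n
        return n

    def drain(self):
        """Consume everything pending, as a healthy daemon would."""
        out = bytes(self.buf[i % self.size] for i in range(self.cons, self.prod))
        self.cons = self.prod
        return out


def guest_write(ring, data, max_spins=1000):
    """Retry until data is accepted; return (bytes sent, fruitless retries)."""
    sent = 0
    spins = 0
    while sent < len(data) and spins < max_spins:
        n = ring.write(data[sent:])
        if n == 0:
            spins += 1  # ring full: a real guest would busy-loop here
        sent += n
    return sent, spins
```

With a 16-byte ring and nobody draining it, guest_write(ring, b"x" * 40, max_spins=5) gets 16 bytes in and then spins until the limit; after a ring.drain() the next write goes through without spinning, which matches what restarting xenconsoled appeared to do.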

And this phenomenon is always triggered by some extra load, typically
by tiger starting an md5sum check of the installed packages at the
same time on a couple of domUs.  (Btw., isn't there some randomized
crond to help with this in general?)
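
Independently of any particular cron implementation, one generic way to spread such jobs would be to sleep for a per-host delay before the job starts.  A minimal sketch, assuming a wrapper script around the job; stagger_delay, the job label and the one-hour window are invented for illustration:

```python
import hashlib


def stagger_delay(hostname, job="md5sum-check", window=3600):
    """Map (hostname, job) to a stable delay in [0, window) seconds."""
    digest = hashlib.sha256(f"{hostname}:{job}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % window
```

A wrapper would call time.sleep(stagger_delay(socket.gethostname())) and then run the check.  Hashing the host name instead of drawing a random number keeps the start time of each domU the same from run to run, while different domUs most likely end up at different offsets.
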
-- 
Cheers,
Feri.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
