WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

Re: [Xen-devel] High Net and Disk Use == stuck domain

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] High Net and Disk Use == stuck domain
From: "Christopher S. Aker" <caker@xxxxxxxxxxxx>
Date: Mon, 01 Dec 2008 10:05:40 -0500
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 01 Dec 2008 07:06:11 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4926E7DD.8040603@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4926E7DD.8040603@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.18 (Macintosh/20081105)

Christopher S. Aker wrote:
For the past year or so we've been seeing a bug whereby a domU's CPU would spin up to a steady 100, 200, 300 or 400% (4 vcpus), console would freeze, and some or all of the network-facing services within the domU would connect but block without any output. Disk IO would flatline. The domU would never recover and required rebooting.

Since pv_ops hasn't always been around, we had previously seen this behavior only with xen-patched domUs (2.6.18.x), but now we're seeing it with pv_ops. Identical symptoms. And, I have a user that is able to reliably reproduce it on 2.6.27.4!

His recipe is downloading an ISO from a very fast and close-by news server using nzbget. The trigger appears to be a combination of high network use and high disk use (such as downloading from a very fast mirror) -- we weren't able to reproduce the problem when saving to a tmpfs mount, which takes the disk out of the picture.

I was able to grab the output of sysrq t while it was in the bad state:

http://theshore.net/~caker/xen/BUGS/D-state/console.log

The number of processes in D state (39) is quite suspicious.
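
For anyone trying to confirm the same symptom from inside a guest that still has a responsive shell, here is a rough sketch of counting D-state processes by reading /proc/<pid>/stat. It is only an illustration, not how the log above was produced: the helper names task_state and count_d_state are made up for this example, and it counts processes rather than every task that sysrq t would list, so the numbers may not match exactly.

```python
import os


def task_state(stat_text):
    """Return the one-letter state field from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_d_state(proc_root="/proc"):
    """Count processes currently in uninterruptible sleep (state D)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if task_state(f.read()) == "D":
                    count += 1
        except OSError:
            # The process exited between listdir() and open().
            continue
    return count


if __name__ == "__main__":
    print(count_d_state())
```

A handful of D-state processes that come and go is normal under heavy IO; a count that stays high and never drains is the pattern described here.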

Let me know if there's anything else I can provide.

-Chris

Jeremy,

Did this one slip by you? I figured a reproducible bug would be just too tantalizing to resist.

What's the correct venue for these issues that overlap xen-devel, lkml, and virtualization/pv_ops stuff -- should I be blasting these to everybody?

-Chris


