xen-devel

Re: [Xen-devel] High Net and Disk Use == stuck domain

To: "Christopher S. Aker" <caker@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] High Net and Disk Use == stuck domain
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Mon, 01 Dec 2008 12:19:50 -0800
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 01 Dec 2008 12:20:18 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4933FD44.7050101@xxxxxxxxxxxx>
References: <4926E7DD.8040603@xxxxxxxxxxxx> <4933FD44.7050101@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.18 (X11/20081119)
Christopher S. Aker wrote:
Christopher S. Aker wrote:
For the past year or so we've been seeing a bug whereby a domU's CPU would spin up to a steady 100, 200, 300 or 400% (4 vcpus), console would freeze, and some or all of the network-facing services within the domU would connect but block without any output. Disk IO would flatline. The domU would never recover and required rebooting.

Since pv_ops hasn't always been around, we previously had only seen this behavior with xen-patched domUs (2.6.18.x), but now we're seeing it with pv_ops. Identical symptoms. And I have a user who is able to reliably reproduce it on 2.6.27.4!

His recipe is downloading an ISO from a very fast and close-by news server using nzbget. The trigger appears to be a combination of high network use and high disk use (like downloading from a very fast mirror), because we weren't able to reproduce the problem when saving to a tmpfs mount.

I was able to grab the output of sysrq t while it was in the bad state:

http://theshore.net/~caker/xen/BUGS/D-state/console.log

The number of processes in D state (39) is quite suspicious.
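
For anyone who wants to watch for the same symptom from inside a domU that is still responsive, here is a minimal sketch (not part of the original report) that counts processes in uninterruptible sleep by reading the state field of /proc/<pid>/stat. The function names and the proc_root parameter are made up for illustration. It counts thread-group leaders only, so the number may be lower than what a sysrq-t task dump shows.

```python
import glob


def state_from_stat(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split on the LAST ')' before reading
    the state field that follows it.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_d_state(proc_root="/proc"):
    """Count processes currently in state 'D' (uninterruptible sleep).

    proc_root is a hypothetical parameter that exists only so the
    function can be pointed at a fake directory tree when testing.
    """
    count = 0
    for path in glob.glob(f"{proc_root}/[0-9]*/stat"):
        try:
            with open(path) as f:
                if state_from_stat(f.read()) == "D":
                    count += 1
        except (OSError, IndexError):
            # The process exited while we were scanning, or the file
            # was not in the expected format; skip it.
            continue
    return count


if __name__ == "__main__":
    print(count_d_state())
```

Running it in a loop during a heavy download would show whether the D-state count climbs before the domain locks up. Once the console has frozen, the sysrq-t dump remains the only option.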

Let me know if there's anything else I can provide.

-Chris

Jeremy,

Did this one slip by you? I figured a reproducible bug would be just too tantalizing to resist.

Hoping it would go away by itself? ;)

I'm trying to repro it now, copying ISOs at 25 Mbytes/sec. How long does it take to happen?

What's the correct venue for these issues that overlap xen-devel, lkml, and virtualization/pv_ops stuff -- should I be blasting these to everybody?

Me and xen-devel are a good start, and posting in a bugzilla cc:ing me if it looks like it's been dropped on the floor.


   J

