WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

[Xen-devel] High Net and Disk Use == stuck domain

To: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] High Net and Disk Use == stuck domain
From: "Christopher S. Aker" <caker@xxxxxxxxxxxx>
Date: Fri, 21 Nov 2008 11:54:53 -0500
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Delivery-date: Fri, 21 Nov 2008 08:55:19 -0800
User-agent: Thunderbird 2.0.0.18 (Macintosh/20081105)
For the past year or so we've been seeing a bug where a domU's CPU usage spins up to a steady 100, 200, 300 or 400% (4 vcpus), the console freezes, and some or all of the network-facing services within the domU accept connections but block without producing any output. Disk IO flatlines. The domU never recovers and has to be rebooted.

Since pv_ops hasn't always been around, we had previously only seen this behavior with xen-patched domUs (2.6.18.x), but now we're seeing it with pv_ops. Identical symptoms. And, I have a user who is able to reliably reproduce it on 2.6.27.4!

His recipe is downloading an ISO from a very fast and close-by news server using nzbget. The trigger appears to be a combination of high network use and high disk use (like downloading from a very fast mirror), because we weren't able to reproduce the problem when saving to a tmpfs mount.

I was able to grab the output of sysrq t while it was in the bad state:

http://theshore.net/~caker/xen/BUGS/D-state/console.log

The number of processes in D state (39) is quite suspicious.
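
For anyone who wants to repeat that count on their own capture, here is a minimal Python sketch that tallies task states in a saved sysrq-t dump. It is not the method used to get the figure above. The line shape it matches (optional printk timestamp, task name, one-letter state, then "running" or a hex address) is an assumption about the dump format and may need adjusting for a given kernel. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of a task line in a sysrq-t dump (may differ per kernel):
#   [optional "[  123.456]" timestamp] <name> <state letter> <"running" | hex address> ...
# Task names containing spaces are not handled by this sketch.
TASK_LINE = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s*)?(?P<name>\S+)\s+(?P<state>[RSDTtZXW])\s+"
    r"(?:running\b|[0-9a-fA-F]{8,16}\b)"
)

def count_task_states(lines):
    """Return a Counter mapping state letter -> number of tasks in that state."""
    counts = Counter()
    for line in lines:
        match = TASK_LINE.match(line)
        if match:
            counts[match.group("state")] += 1
    return counts

def blocked_tasks(lines):
    """Return the names of tasks in uninterruptible sleep (state D), in dump order."""
    names = []
    for line in lines:
        match = TASK_LINE.match(line)
        if match and match.group("state") == "D":
            names.append(match.group("name"))
    return names
```

With the dump saved locally as console.log, `count_task_states(open("console.log"))["D"]` gives the number of tasks the pattern recognizes as being in D state, and `blocked_tasks(...)` lists which ones they are.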

Let me know if there's anything else I can provide.

-Chris


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
