xen-users

[Xen-users] Questions about pass through block devices

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Questions about pass through block devices
From: Tom Mornini <tmornini@xxxxxxxxxxxxxx>
Date: Sat, 10 Mar 2007 14:15:34 -0500

What happens to the block layer if the block request queue gets very, very long?

Imagine there are many DomUs and they are all working against VERY SLOW disks.

I wish I understood the entire chain better, and I'm sorry to ask such a vague question.

I understand this is a weird question, because the performance implications of this scenario are horrific. The reason I ask is that I have a gut feeling that the people who have experienced Dom0 crashes due to heavy disk I/O on Xen 3.0.2-3.0.4 (which includes me!) may be hitting a corner case that most users would never encounter.

In our case, we have Coraid AoE SAN storage that is accessed via drivers in the Dom0s of our cluster, and CLVM LVs are passed through into the DomUs.
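
For concreteness, each DomU gets its LVs via ordinary phy: disk lines in its config file, roughly like this (the VG/LV and device names here are made up, but the shape is accurate):

  disk = [ 'phy:/dev/vg_coraid/domu1-root,xvda,w',
           'phy:/dev/vg_coraid/domu1-gfs,xvdb,w' ]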

Early on, we made a terrible disk-layout decision that left us with very poor disk I/O performance. That was exacerbated by the fact that nearly every DomU has two LVs attached, one of which holds a GFS filesystem, which adds another layer of performance degradation due to DLM locking.

We used to crash a lot. Now that we've re-organized the disks, we're crashing far less often. The change was made to improve performance, and we never in a million years imagined it might be related to this crashing behavior.

It suddenly occurred to me that perhaps we're crashing less often because the block request delivery times have improved dramatically, and the queues are likely growing shorter.
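
I haven't instrumented this properly, but the sort of check I have in mind is just watching the in-flight request counts from Dom0, e.g. a quick Python sketch along these lines (nothing here is specific to our setup):

import time

def inflight():
    # /proc/diskstats: major, minor, name, then 11 stat fields; the 9th
    # stat field is "I/Os currently in progress".
    counts = {}
    for line in open('/proc/diskstats'):
        fields = line.split()
        if len(fields) < 14:        # partition lines carry fewer fields
            continue
        counts[fields[2]] = int(fields[11])
    return counts

while True:
    busy = [(name, n) for name, n in inflight().items() if n > 0]
    print("%s %s" % (time.strftime('%H:%M:%S'), busy))
    time.sleep(5)

If the theory holds, the in-progress counts on the devices backing the LVs should spike shortly before a crash.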

The only direct evidence that I and one other list member have seen that could be related to this issue is that reading /proc/slabinfo on a regular basis in the Dom0 appears to greatly increase the frequency of crashes. That leads me to believe the issue, at its core, is somehow related to slab corruption in Dom0.
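
To be clear, "reading /proc/slabinfo on a regular basis" means nothing more exotic than a monitoring loop that opens the file periodically, something like this (the cache-name filter is just an example; names vary by kernel):

import time

while True:
    for line in open('/proc/slabinfo'):
        name = line.split()[0]
        # watch the block-I/O related caches
        if name.startswith('bio') or name.startswith('blkdev'):
            print(line.rstrip())
    time.sleep(60)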

Another list member has also recently reported that this issue appears to be fixed in the unstable line. Have changes been made in unstable that would shed some light on any of this?

Thanks, and sorry for the novelette.

--
-- Tom Mornini, CTO
-- Engine Yard, Ruby on Rails Hosting
-- Reliability, Ease of Use, Scalability
-- (866) 518-YARD (9273)


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
