WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

Re: [Xen-devel] megasas stops I/O when running kernel as dom0 under xen4

To: Andreas Olsowski <andreas.olsowski@xxxxxxxxxxx>
Subject: Re: [Xen-devel] megasas stops I/O when running kernel as dom0 under xen4.1/4.2
From: Pasi Kärkkäinen <pasik@xxxxxx>
Date: Fri, 12 Aug 2011 19:25:51 +0300
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4E43E04B.8010401@xxxxxxxxxxx>
References: <4E43E04B.8010401@xxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
On Thu, Aug 11, 2011 at 03:59:39PM +0200, Andreas Olsowski wrote:
> Hello xen-devel,
>

Hello,

> As one of the people using Dell servers, I am aware that the LSI megaraid
> drivers are quite old in the current 2.6.32 pvops tree,
> but it seems that, once again, I have run into problems that are
> rarer than the usual "can't find disk" issues (of which I have had none, ever).
>

Btw, did you see this thread about LSI drivers and 2.6.32?
http://lists.xensource.com/archives/html/xen-devel/2010-11/msg00250.html

I've been successfully using the version 4.3x megaraid_sas drivers
(the latest available from LSI's support site).

-- Pasi

>
> The situation:
> --------------
> I have 2 dom0 kernels, 2.6.32.44 and 3.0.1, that work fine when booted
> bare-metal. I can run stress -m 40 -d 4 -i 1 for hours on end without
> any error occurring.
> The 2.6.32.44 kernels use version 00.00.05.30 megasas modules.
>
> When I boot that kernel on my R610 servers under Xen (4.1 and 4.2), the
> kernels work fine too. I create 10 virtual machines, each running 4
> "stress -m 40", and can do as much disk I/O on my local storage as I
> want to.
>
> But on my Dell R710 system things don't look so good.
> Booted bare-metal, both kernels work fine.
> When I boot them as dom0 under Xen, everything seems to be okay at first.
> Then I create my 10 virtual machines that put some load on the memory.
> As soon as I do I/O to the local disk (even an "ls /usr/src/" can
> suffice), I/O freezes and the system stops responding to anything that
> requires disk access.
> After a while the kernel starts spewing out error messages:
>
> #### lots of these
> sd 0:2:0:0: [sda] megasas: RESET -83318 cmd=2a retries=0
> megaraid_sas: HBA reset handler invoked without an internal reset condition.
> megasas: [ 0]waiting for 16 commands to complete
> megaraid_sas: no more pending commands remain after reset handling.
> megasas: reset successful
> ###
>
> ### then some of these
> sd 0:2:0:0: Device offlined - not ready after error recovery
> ###
>
> ### goes on to
> sd 0:2:0:0: [sda] Unhandled error code
> sd 0:2:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> sd 0:2:0:0: [sda] CDB: Write(10): 2a 00 08 45 6f 00 00 01 88 00
> end_request: I/O error, dev sda, sector 138768128
> Buffer I/O error on device sda2, logical block 5138912
> lost page write due to I/O error on sda2
> Buffer I/O error on device sda2, logical block 5138913
> ###
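
A side note on the "CDB: Write(10)" line above: the ten bytes can be decoded by hand to check that the failed command is the same one the block layer reports. The sketch below assumes the standard WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8); the function name decode_write10 is made up for illustration and is not part of any driver or tool.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "opcode": cdb[0],                          # 0x2a is WRITE(10)
        "lba": int.from_bytes(cdb[2:6], "big"),    # starting logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"), # transfer length in blocks
    }


# The CDB exactly as printed in the kernel log quoted above.
fields = decode_write10("2a 00 08 45 6f 00 00 01 88 00")
print(hex(fields["opcode"]), fields["lba"], fields["blocks"])
```

The decoded LBA is 138768128, which matches the sector in the "end_request: I/O error" line, and the opcode 0x2a matches the "cmd=2a" in the RESET messages, so these all appear to describe the same timed-out write.
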
>
> ### and finally these, as often as one tries to access the disk
> sd 0:2:0:0: rejecting I/O to offline device
> sd 0:2:0:0: rejecting I/O to offline device
> sd 0:2:0:0: rejecting I/O to offline device
>
>
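
When comparing such logs between the R610 and R710, it may help to count how many messages fall into each of the four stages quoted above. This is only a rough sketch: the regular expressions are copied from the messages in this mail, while the stage names and the functions classify and summarize are my own labels, not anything the driver defines.

```python
import re
from collections import Counter

# (stage name, pattern) pairs; patterns come from the quoted log lines,
# the stage names are illustrative only.
STAGES = [
    ("reset", re.compile(r"megasas: RESET|megasas: reset successful|HBA reset handler")),
    ("offlined", re.compile(r"Device offlined")),
    ("io_error", re.compile(r"I/O error|DRIVER_TIMEOUT")),
    ("rejected", re.compile(r"rejecting I/O to offline device")),
]


def classify(line):
    """Return the stage name for a kernel log line, or None if it matches nothing."""
    for name, pattern in STAGES:
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how many lines fall into each stage."""
    return Counter(stage for stage in map(classify, lines) if stage)


sample = [
    "sd 0:2:0:0: [sda] megasas: RESET -83318 cmd=2a retries=0",
    "megasas: reset successful",
    "sd 0:2:0:0: Device offlined - not ready after error recovery",
    "end_request: I/O error, dev sda, sector 138768128",
    "sd 0:2:0:0: rejecting I/O to offline device",
    "sd 0:2:0:0: rejecting I/O to offline device",
]
counts = summarize(sample)
print(dict(counts))
```

Feeding it the lines of a saved dmesg or serial console capture instead of the sample list would give a quick picture of how far the failure progressed on each host.
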
> If a kernel works fine on one set of servers (Dell R610 with LSI Logic /
> Symbios Logic LSI MegaSAS 9260 (rev 05) raid controllers) and crashes on
> another server (Dell R710 with an LSI Logic / Symbios Logic MegaRAID SAS
> 1078 (rev 04) raid controller),
> it would seem logical to assume that the kernel does not support the
> hardware properly.
> But when run bare-metal, no errors occur.
>
> I, for one, have run out of things to try. The R710 worked fine before I
> upgraded its firmware to the most current versions and went from
> xen4.0.1 to xen4.1/4.2.
>
> So I put it to you, fine sirs of xen-devel:
> is it:
> A.) a hardware problem, because the software works on different hardware
> or
> B.) a Xen problem, because the hardware runs fine in a non-virtualized
> scenario with the same kernel
>
> Or is it something else entirely?
>
> Help, input, questions and suggestions are, as always, greatly appreciated.
>
>
> With best regards
>
> -- 
> Andreas Olsowski
> Leuphana Universität Lüneburg
> Rechen- und Medienzentrum
> Scharnhorststraße 1, C7.015
> 21335 Lüneburg
>
> Tel: ++49 4131 677 1309
>



> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

