On Thu, Aug 11, 2011 at 03:59:39PM +0200, Andreas Olsowski wrote:
> Hello xen-devel,
>
Hello,
> As one of the people using Dell servers, I am aware that the LSI megaraid
> drivers are quite old in the current 2.6.32 pvops tree,
> but it seems that, once again, I have run into problems that are
> rarer than the usual "can't find disk" issues (of which I have had none, ever).
>
Btw, did you see this thread about LSI drivers and 2.6.32?
http://lists.xensource.com/archives/html/xen-devel/2010-11/msg00250.html
I've been successfully using the version 4.3x megaraid_sas drivers
(the latest available from LSI's support site).
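
In case it helps to confirm which megaraid_sas version is actually loaded in dom0, here is a minimal Python sketch. It assumes the module exports its version string under /sys/module/megaraid_sas/version (modules that declare a version normally do); the function name and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def loaded_module_version(module, sysfs_root="/sys/module"):
    """Return the version string exported by a loaded kernel module, or None.

    Assumption: the module exposes <sysfs_root>/<module>/version.
    """
    version_file = Path(sysfs_root) / module / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        # Module not loaded, or it does not export a version attribute.
        return None


if __name__ == "__main__":
    print(loaded_module_version("megaraid_sas"))
```

Running this on both the R610 and the R710 dom0 would show whether the two boxes really load the same driver version.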
-- Pasi
>
> The situation:
> --------------
> I have 2 dom0 kernels, 2.6.32.44 and 3.0.1, that work fine when booted
> bare-metal. I can run stress -m 40 -d 4 -i 1 for hours on end without
> any error occurring.
> The 2.6.32.44 kernels use version 00.00.05.30 megasas modules.
>
> When I boot that kernel on my R610 servers under Xen (4.1 and 4.2), the
> kernels work fine too. I create 10 virtual machines, each running 4
> "stress -m 40", and can do disk I/O on my local storage as much as I want
> to.
>
> But on my Dell R710 system things don't look so good.
> Booted bare-metal, both kernels work fine.
> When I boot them as dom0 under Xen, everything seems to be okay at first.
> Then I create my 10 virtual machines that put some load on the memory.
> And as soon as I do I/O to the local disk (even an "ls /usr/src/" can
> suffice), I/O freezes and the system stops responding to anything that
> requires disk access.
> After a while the kernel starts spewing out error messages:
>
> #### lots of these
> sd 0:2:0:0: [sda] megasas: RESET -83318 cmd=2a retries=0
> megaraid_sas: HBA reset handler invoked without an internal reset condition.
> megasas: [ 0]waiting for 16 commands to complete
> megaraid_sas: no more pending commands remain after reset handling.
> megasas: reset successful
> ###
>
> ### then some of these
> sd 0:2:0:0: Device offlined - not ready after error recovery
> ###
>
> ### goes on to
> sd 0:2:0:0: [sda] Unhandled error code
> sd 0:2:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> sd 0:2:0:0: [sda] CDB: Write(10): 2a 00 08 45 6f 00 00 01 88 00
> end_request: I/O error, dev sda, sector 138768128
> Buffer I/O error on device sda2, logical block 5138912
> lost page write due to I/O error on sda2
> Buffer I/O error on device sda2, logical block 5138913
> ###
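
As a sanity check on the log above, the CDB in the "Write(10)" line can be decoded by hand. The following is only a sketch based on the standard 10-byte READ(10)/WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte big-endian transfer length, control); the function name is hypothetical.

```python
import struct


def decode_rw10_cdb(cdb_hex):
    """Decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Big-endian fields: opcode, flags, LBA, group number, length, control.
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    names = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if opcode not in names:
        raise ValueError("not a READ(10)/WRITE(10) opcode: %#04x" % opcode)
    return {"op": names[opcode], "lba": lba, "blocks": blocks}


info = decode_rw10_cdb("2a 00 08 45 6f 00 00 01 88 00")
print(info)  # {'op': 'WRITE(10)', 'lba': 138768128, 'blocks': 392}
```

The decoded LBA (138768128) matches the sector reported in the end_request line, so the failing request is a 392-block write that timed out in the driver.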
>
> ### and finally these, as often as one tries to access the disk
> sd 0:2:0:0: rejecting I/O to offline device
> sd 0:2:0:0: rejecting I/O to offline device
> sd 0:2:0:0: rejecting I/O to offline device
>
>
> If a kernel works fine on one set of servers (Dell R610 with LSI Logic /
> Symbios Logic LSI MegaSAS 9260 (rev 05) raid controllers) and crashes on
> another server (Dell R710 with a LSI Logic / Symbios Logic MegaRAID SAS
> 1078 (rev 04) raid controller),
> it would seem logical to assume that the kernel does not support the
> hardware properly.
> But when run bare-metal, no errors occur.
>
> I for one have run out of things to try. The R710 worked fine before I
> upgraded its firmware to the most current versions and went from
> Xen 4.0.1 to Xen 4.1/4.2.
>
> So I put it to you, fine sirs of xen-devel:
> is it:
> A.) a hardware problem, because the software works on different hardware
> or
> B.) a xen problem, because the hardware runs fine in a non-virtualized
> scenario with the same kernel
>
> Or is it something else entirely?
>
> Help, input, questions and suggestions are, as always, greatly appreciated.
>
>
> With best regards
>
> --
> Andreas Olsowski
> Leuphana Universität Lüneburg
> Rechen- und Medienzentrum
> Scharnhorststraße 1, C7.015
> 21335 Lüneburg
>
> Tel: ++49 4131 677 1309
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel