WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] disk io errors possibly caused by high network load?

To: Moritz Möller <m.moeller@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] disk io errors possibly caused by high network load?
From: "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxxx>
Date: Fri, 19 Sep 2008 13:44:09 +0100
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>
Delivery-date: Fri, 19 Sep 2008 05:44:54 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <122AA196D7CE4E4DBB92911EF4AB5AB3846045@xxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <122AA196D7CE4E4DBB92911EF4AB5AB3846045@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AckaTGDzATGLlbxxRXaBZvbEdrbgFgACImAQ
Thread-topic: [Xen-devel] disk io errors possibly caused by high network load?
> We had a very strange situation yesterday. Within one second, 13 of 25 Xen
> boxes died with disk errors (domU and dom0, something like end_request:
> I/O error dev hda sector ...), but worked well again after a reboot.
> 
> A few minutes earlier, a technician had plugged in the wrong cable, creating
> a network loop, so the errors may have been caused by high network I/O load.
> The disks are okay, and the errors occurred with both SCSI RAID
> controllers and plain SATA disks.

This is quite remarkable -- I don't think anyone has reported anything similar
before, despite there being many large Xen deployments.

Are you saying that I/O errors were reported from both dom0 and the domUs?

Did you actually track down the specific device major/minor that was reporting 
the error?
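
One way to approach the major/minor question is sketched below in Python. It is only an illustration, not something taken from this thread: it assumes the kernel log lines look roughly like the "end_request: I/O error dev hda sector ..." message quoted above (commas are treated as optional, since the exact wording is not given), and it resolves device names through text in the /proc/partitions layout (major, minor, #blocks, name). The function names are hypothetical.

```python
import re

# Matches lines such as "end_request: I/O error, dev hda, sector 12345".
# Commas are optional because the exact wording is an assumption here.
IO_ERROR_RE = re.compile(
    r"end_request: I/O error,?\s+dev\s+(?P<dev>[\w/!-]+?),?\s+sector\s+(?P<sector>\d+)"
)


def parse_io_errors(log_text):
    """Return a list of (device_name, sector) tuples found in kernel log text."""
    return [
        (m.group("dev"), int(m.group("sector")))
        for m in IO_ERROR_RE.finditer(log_text)
    ]


def parse_partitions(partitions_text):
    """Map device name -> (major, minor) from /proc/partitions-style text."""
    table = {}
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        # The header row is skipped because its first fields are not digits.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            table[fields[3]] = (int(fields[0]), int(fields[1]))
    return table


def failing_devices(log_text, partitions_text):
    """Return {device_name: (major, minor) or None} for devices with I/O errors."""
    numbers = parse_partitions(partitions_text)
    return {dev: numbers.get(dev) for dev, _sector in parse_io_errors(log_text)}
```

On a live host the two inputs would be the output of dmesg and the contents of /proc/partitions. A device that maps to None is not a local block device known to that kernel, which would itself be worth following up.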

Is there any network storage (e.g. iSCSI, AoE) in your setup?

Ian 

> Here is some info from a host that crashed:
> 
> root/mmoeller@srv002050:/root$ xm info
> host                   : srv002050
> release                : 2.6.21-2950.fc8xen
> version                : #1 SMP Tue Oct 23 12:23:33 EDT 2007
> machine                : x86_64
> nr_cpus                : 8
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 1866
> hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
> total_memory           : 8190
> free_memory            : 12
> node_to_cpu            : node0:0-7
> xen_major              : 3
> xen_minor              : 2
> xen_extra              : .0
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          : unavailable
> cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> cc_compile_by          : root
> cc_compile_domain      : office.bigpoint.net
> cc_compile_date        : Tue Mar 11 13:57:28 CET 2008
> xend_config_format     : 4
> root/mmoeller@srv002050:/root$ uname -r
> 2.6.21-2950.fc8xen
> 
> And here is the same from a host that did not crash:
> 
> root/mmoeller@srv006215:/root$ xm info
> host                   : srv006215
> release                : 2.6.21-2950.fc8xen
> version                : #1 SMP Tue Oct 23 12:23:33 EDT 2007
> machine                : x86_64
> nr_cpus                : 4
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 2394
> hw_caps                : bfebfbff:20100800:00000000:00000140:0000e3bd:00000000:00000001
> total_memory           : 8190
> free_memory            : 10
> node_to_cpu            : node0:0-3
> xen_major              : 3
> xen_minor              : 2
> xen_extra              : .0
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          : unavailable
> cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> cc_compile_by          : root
> cc_compile_domain      : office.bigpoint.net
> cc_compile_date        : Tue Mar 11 13:57:28 CET 2008
> xend_config_format     : 4
> root/mmoeller@srv006215:/root$ uname -r
> 2.6.21-2950.fc8xen
> 
> Does anyone have an idea how this could happen?
> 
> 
> Thanks,
> 
> 
> Moritz
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
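
When comparing a host that crashed against one that did not, it can help to diff the two `xm info` listings mechanically instead of by eye. The Python sketch below is an illustration only, with hypothetical function names. It assumes the input is just the "key : value" body of the `xm info` output (shell prompts and other commands removed), strips any leading "> " quote markers, and folds any value that a mail client has wrapped back onto its key.

```python
import re

# A key line is a word, at least one space, a colon, then the value.
# Wrapped values such as "bfebfbff:2010..." have no space before the colon,
# so they do not match and are treated as continuations.
KEY_RE = re.compile(r"^(\w+)\s+:\s*(.*)$")


def parse_xm_info(text):
    """Parse `xm info`-style 'key : value' lines into a dict of strings."""
    info = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()  # drop quote markers and padding
        if not line:
            continue
        match = KEY_RE.match(line)
        if match:
            last_key = match.group(1)
            info[last_key] = match.group(2)
        elif last_key is not None:
            # Continuation of a wrapped value: append to the previous key.
            info[last_key] = (info[last_key] + " " + line).strip()
    return info


def diff_info(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Applied to the two listings quoted in this message, the differing keys would be the hardware-related ones (host, nr_cpus, cpu_mhz, hw_caps, free_memory, node_to_cpu), while the kernel release, Xen version and build fields are identical on both hosts.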
