   
 
 
 
   
 

xen-users

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] random reboots on Debian Squeeze 6.0.2.1 + Xen 4.02 on top of OCFS2 1.4.4-3
From: Benjamin Weaver <benjamin.weaver@xxxxxxxxxxxxx>
Date: Thu, 04 Aug 2011 14:50:08 +0100
This might be related to a posting a couple of days ago on random reboots, but the problem arises from a different environment and situation.

We are running a two-node cluster. Both nodes run Debian Squeeze 6.0.2.1 + Xen 4.02 on top of OCFS2 1.4.4-3. The kernel is 2.6.32-5-xen-amd64. Both nodes store and run VMs on the OCFS2 partition, which the two boxes access via iSCSI. We run a network stress test in which the two VMs pass a large file between them. One VM has an NFS share with the file in it, and the other VM copies this file (arbitrarily, a large 4.6 GB debian.iso file) back and forth between the NFS share and its own local directory. The network configuration is currently giving us no problems: no lost packets, collisions, etc.
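
For reference, a copy loop of the kind described above could look roughly like the sketch below. This is not the script actually used for the test; the function name stress_copy and all paths passed to it are hypothetical. It copies the file from the share to a local directory and back, compares contents after each leg, and prints a timestamp per round so the last completed round can be matched against the time of a reboot.

```shell
# Hypothetical helper, not the original test script.
# Usage: stress_copy SHARE_FILE LOCAL_DIR ROUNDS
stress_copy() {
    share_file=$1   # large file on the NFS share (placeholder path)
    local_dir=$2    # local directory on the copying VM (placeholder path)
    rounds=$3       # number of round trips to attempt
    name=$(basename "$share_file")
    i=1
    while [ "$i" -le "$rounds" ]; do
        # Leg 1: share -> local, then verify the copy byte for byte.
        cp "$share_file" "$local_dir/$name" || return 1
        cmp -s "$share_file" "$local_dir/$name" || return 2
        # Leg 2: local -> share (under a temporary name), then verify.
        cp "$local_dir/$name" "${share_file}.back" || return 1
        cmp -s "$share_file" "${share_file}.back" || return 2
        rm -f "${share_file}.back"
        echo "$(date '+%Y-%m-%d %H:%M:%S') round $i ok"
        i=$((i + 1))
    done
}
```

Redirecting the output to a file that is not on the shared storage keeps the record of the last good round available after a crash.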

The VMs are Lucid instances (Ubuntu 10.04) created with the following command:

sudo xen-create-image --hostname lucidxentest --ip 163.1.86.9 --pygrub
plus these xen-tools.conf parameters: size = 8 Gb, image = full, memory = 512, swap = 512


The stress test runs successfully for anywhere from 1 to 12 hours, then the system reboots. At that point the file copy has been interrupted and the VMs have crashed, with one of the nodes rebooted.

I have noticed occasional reports of a kernel error (linux/mm/slub.c 2969!), similar to Debian bug #634047. But I find no firm correlation, since the kern.log and messages logs usually do not report this kernel error.
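
One way to check that correlation is to pull every log line that mentions mm/slub.c and compare the timestamps with the reboot times (for example from `last reboot`). The sketch below is an assumption about how such a check might be done, not something from the original setup; the function name slub_hits is hypothetical, and the fixed string it searches for is taken only from the error quoted above.

```shell
# Hypothetical helper: print every line mentioning mm/slub.c in the
# given log files, prefixed with the file name.
# Usage: slub_hits /var/log/kern.log /var/log/messages
slub_hits() {
    # -F treats the pattern as a fixed string; adding /dev/null forces
    # grep to print file names even when only one log file is given.
    grep -F 'mm/slub.c' "$@" /dev/null
}
```

The function returns a non-zero status when no line matches, which makes it easy to tell apart reboots with and without the error in the logs.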

Some things I have tried:

a basic reinstall of all the components of the system (squeeze + xen + ocfs2)

a memtest on both nodes (no problems).

changing the default Debian I/O scheduler in combination with ocfs2: cfq, deadline, anticipatory, noop.

I am currently looking into, but have not yet tried, adjusting: (1) the halt state set in the BIOS; (2) the cpufreq=dom0-kernel setting (frequency scaling).
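
For the I/O scheduler changes listed above, the scheduler can be inspected and switched per block device through sysfs. The following is a minimal sketch under stated assumptions: the function names and the SYSFS_BLOCK variable are hypothetical, the device name (e.g. sda) is a placeholder, and writing to the file requires root. The kernel marks the active scheduler with square brackets in the scheduler file.

```shell
# Base directory for block devices; overridable for dry runs (assumption).
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Print the active scheduler, i.e. the bracketed entry in the sysfs file.
# Usage: active_scheduler sda
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$SYSFS_BLOCK/$1/queue/scheduler"
}

# Select a scheduler for one device (needs root on a real system).
# Usage: set_scheduler sda deadline
set_scheduler() {
    echo "$2" > "$SYSFS_BLOCK/$1/queue/scheduler"
}
```

A change made this way lasts only until the next boot, so it has to be reapplied after each of the reboots described above.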


Any suggestions are welcome!

Ben Weaver

 







    

