WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
   
 

xen-users

Re: [Xen-users] dom0 freezing

To: Bernhard Schmidt <berni@xxxxxxxxxxxxx>
Subject: Re: [Xen-users] dom0 freezing
From: Stephan Seitz <s.seitz@xxxxxxxxxxxx>
Date: Sun, 08 Jun 2008 18:20:50 +0200
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Sun, 08 Jun 2008 09:19:54 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <g2gtvf$8l6$1@xxxxxxxxxxxxx>
Organization: netz-haut e.K.
References: <g2gtvf$8l6$1@xxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.14 (X11/20080505)

Looks like headaches ;)

Anyway, I found the Ubuntu Gutsy and Ubuntu Hardy (x86_64) kernels to be
bleeding edge, but NOT stable for production use...

If your hoster really has replaced all relevant parts, I assume some
obscure BIOS settings got "changed" along the way. In general, I would
suggest moving from gaming hardware to server hardware ;)

Based on XenSource's 2.6.18.8, I've built our "primary" kernel with added
Areca RAID and fixed 3ware RAID support. I have found it rock-solid on a
bunch of different Xeon- and Opteron-driven boards, though it lacks
disklabel support.

Feel free to take it (and get a higher MTBF for debugging...)
http://boreas.netz-haut.net/pub/kernelpack-2.6.18.8-xen-2008.tar.gz

Cheers,

Stephan



Bernhard Schmidt wrote:
Hello everyone,

I have an extremely annoying freeze problem with Xen that I can't get
fixed or at least debugged. It's a bit of a long story.

I ordered an x86_64-based colo server in the middle of last year to run
Xen and a couple of personal domUs on it. The box kept freezing all the
time; I tried a lot of things to debug it but could not get a handle on
it. The setup is described in
http://thread.gmane.org/gmane.comp.emulators.xen.user/25347/focus=25500
and http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1007 .

Shortly after those mails (mid-July), after my hoster had swapped each
and every part in this box, they finally replaced the previous VIA-based
board with one with an AMD/ATI chipset, and suddenly the box was rock
stable. During the last 10 months I did not have a single crash. It ran
a self-compiled 3.1.0 first, was then changed to a Debian lenny userland
and hypervisor, got a self-compiled dom0 kernel based on Ubuntu Gutsy in
January, and got the fresh Debian 3.2.1 hypervisor at the end of May.
No problems whatsoever.

A few days ago the box crashed and did not come back online, even after
issuing a hardware reset command. The IP-KVM my hoster connected showed
that the box was waiting for a keypress in the BIOS, saying that the
previous POST had been interrupted, which might be caused by
overclocking (definitely not in use). When a key was pressed the box
booted fine but crashed within minutes, again dying in the BIOS.
Definitely a hardware defect. After almost all parts were replaced (CPU,
RAM, power supply, fans) the box did not crash in the BIOS anymore, but
suddenly started to experience the dom0 hangs again. The software setup
had not been changed since January (the Gutsy kernel installation) and
the box had been rebooted a couple of times after that for maintenance,
so it should definitely be fine.

I thought that maybe the board was faulty and got it changed to another
one, an nForce 560 based MSI-K9N NEO-F V3. Still the same crashes.
Except for the hard disk, the hardware has been completely replaced.

I tried changing the dom0 kernel to the Ubuntu Hardy 2.6.24-18-xen
distribution kernel, and I tried numerous boot options for the
hypervisor (noacpi, nolapic, watchdog) and the dom0 kernel (swiotlb, now
trying acpi=off and noapic). The problem is always the same: after some
hours the box freezes. There are no error messages in the log or on the
console, nothing. I still cannot send the 3*Ctrl-a to the box using the
IP-KVM, so I can't tell whether dom0 or the hypervisor crashed, but I
can tell that nothing whatsoever responds anymore.

Does anyone have any idea how to debug this further? Any options I might
try to at least better understand this issue?
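
Since nothing shows up in the log or on the console, one small thing that
might help is pinning down when each freeze happens. The following is only
a minimal sketch, not something taken from this thread: a heartbeat writer
that appends a timestamp to a file and forces it to disk, so the last line
written before a hang gives an upper bound on the freeze time. The function
names, the log path and the 10-second interval are all assumptions chosen
for illustration.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line to `path` and force it onto disk."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    with open(path, "a") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        # fsync so the line survives a hard freeze shortly afterwards
        os.fsync(fh.fileno())
    return stamp


def last_heartbeat(path):
    """Return the last timestamp line in `path`, or None if there is none."""
    try:
        with open(path) as fh:
            lines = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run(path, interval_s=10.0):
    """Write a heartbeat every `interval_s` seconds until interrupted."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Calling run("/var/log/heartbeat.log") (a hypothetical path) in dom0, and
the same in a domU, would leave two files to compare after the next
freeze. This cannot tell dom0 and hypervisor crashes apart on its own; it
only narrows down the time of the hang.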

svr01:~# dpkg -l | grep xen
ii  libxenstore3.0                       3.2.1-1
ii  linux-image-2.6.24-18-xen            2.6.24-18.32            Linux
ii  xen-hypervisor-3.2-1-amd64           3.2.1-1                 The Xen
ii  xen-tools                            3.9-3                   Tools
ii  xen-utils-3.2-1                      3.2.1-1                 XEN
ii  xen-utils-common                     3.2.0-2                 XEN
ii  xenstore-utils                       3.2.1-1
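
To make the installed versions in the listing above easier to compare at a
glance, they can be grouped by version string. This is a rough sketch that
only parses the text shown in this mail; the helper names are made up for
illustration. Differing versions across separately packaged components are
not by themselves evidence of a fault.

```python
from collections import defaultdict

# The package listing quoted in this mail, copied verbatim.
DPKG_OUTPUT = """\
ii  libxenstore3.0                       3.2.1-1
ii  linux-image-2.6.24-18-xen            2.6.24-18.32            Linux
ii  xen-hypervisor-3.2-1-amd64           3.2.1-1                 The Xen
ii  xen-tools                            3.9-3                   Tools
ii  xen-utils-3.2-1                      3.2.1-1                 XEN
ii  xen-utils-common                     3.2.0-2                 XEN
ii  xenstore-utils                       3.2.1-1
"""


def parse_dpkg(text):
    """Return {package: version} for lines marked installed ('ii')."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            packages[fields[1]] = fields[2]
    return packages


def group_by_version(packages):
    """Return {version: [package, ...]} with package names sorted."""
    groups = defaultdict(list)
    for name, version in sorted(packages.items()):
        groups[version].append(name)
    return dict(groups)


if __name__ == "__main__":
    for version, names in group_by_version(parse_dpkg(DPKG_OUTPUT)).items():
        print(version, ", ".join(names))
```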

Bernhard


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


--
Stephan Seitz
Senior System Administrator

*netz-haut* e.K.
multimediale kommunikation

zweierweg 22
97074 würzburg

fon: +49 931 2876247
fax: +49 931 2876248

web: www.netz-haut.de <http://www.netz-haut.de/>

registriergericht: amtsgericht würzburg, hra 5054

