xen-users

To: tim.post@xxxxxxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] xen 2.0.6, on_crash = 'restart' not restarting after crash
From: Steve Wray <steve.wray@xxxxxxxxx>
Date: Tue, 01 May 2007 08:04:37 +1200
Delivery-date: Mon, 30 Apr 2007 13:03:21 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <1177904365.27119.302.camel@xxxxxxxxxxxxxxxxxxxxx>
References: <46350F92.1010609@xxxxxxxxx> <1177904365.27119.302.camel@xxxxxxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 1.5.0.5 (X11/20060813)
Tim Post wrote:
> On Mon, 2007-04-30 at 09:35 +1200, Steve Wray wrote:
>> Hi all,
>>
>> We have a Xen instance (under Xen 2.0.6) that's pretty unreliable; the
>> domU crashes fairly regularly.
> 
> If you must use Xen v2, try 2.0.7 (or the last 2.0-testing Mercurial).
> 2.0.7 isn't the most feature-packed release, but it is extremely stable.
>
> I'd really recommend upgrading to 3.0.4-testing or 3.0.5-testing (I
> think it's at rc4 now) unless you depend on an older kernel version. I
> have some that have to stay at 2.0.7 until I find a better fit for PV
> OpenSSI clusters.

Unfortunately, for operational reasons, it's a little difficult to change
the Xen version at this time. Moving to v3 is definitely not possible, but
in the coming month I should be able to try 2.0.7.



> This really depends on Xen's ability to see the domU as 'crashed'.
> Typical 'crashes' on older kernels don't look much different to Xen than
> a running or blocked state.
> 
> For example, if it's unresponsive and shown as running, the guest
> is most likely just spinning out of control.
> 
> If it's unresponsive and blocked, any number of things could be going
> wrong, but Xen doesn't see it. Unless it's a full-blown kernel panic,
> Xen 2 most likely won't see your guest's crash.
> 
> Can you give more details of the crash?

Not really; there are no log entries on either the domU or the dom0
that give any idea of what happened.

Symptoms are that the domU is no longer running. The Xen log says just
what I included; that the domain had a 'crash' and that it 'died'. The
domU does not show up in 'xm list'. There appears to be no unusual load
spike or any other unusual activity prior to the 'crash'.

I'm a little surprised that when the log entry shows:

xend.domain.exit ['domUhostname', '14', 'crash']

Xen does not treat this as a 'crash' for the purposes of 'on_crash'.

:-/
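
For anyone tracking the same symptom, here is a minimal sketch of pulling these exit events out of the xend log so they can be counted or alerted on. It assumes the log lines look like the excerpt quoted in this thread; the function name find_exits and the regular expression are my own illustration and not part of xend.

```python
import re

# Matches xend events shaped like the one quoted in this thread:
#   [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT> xend.domain.exit ['domUhostname', '14', 'crash']
# The \s+ after the event name also tolerates a line wrap before the list.
EXIT_EVENT = re.compile(
    r"\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) xend\].*?"
    r"xend\.domain\.exit\s+"
    r"\['(?P<name>[^']+)',\s*'(?P<domid>\d+)',\s*'(?P<reason>[^']+)'\]"
)


def find_exits(log_text, reason="crash"):
    """Return (timestamp, domain name, domain id) for each exit with the given reason."""
    return [
        (m.group("time"), m.group("name"), int(m.group("domid")))
        for m in EXIT_EVENT.finditer(log_text)
        if m.group("reason") == reason
    ]
```

Feeding it the contents of the xend log returns one tuple per crash event, which makes it easy to see how often the domU goes down and whether a restart followed.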


>> The domain has indeed crashed since this was implemented and did not
>> appear to recover, at least not for the 6 minutes we gave it to restart
>> the domain:
>>
>> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT> xend.domain.exit
>> ['domUhostname', '14', 'crash']
>> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT>
>> xend.domain.destroy ['domUhostname', '14']
>> [2007-04-30 09:06:20 xend] INFO (XendRoot:112) EVENT> xend.domain.died
>> ['domUhostname', '14']
>> [2007-04-30 09:12:03 xend] DEBUG (XendDomainInfo:720) init_domain>
>> Created domain=15 name=domUhostname memory=1200
>> [2007-04-30 09:12:03 xend] INFO (console:94) Created console id=14
>> domain=15 port=9615


>> And are there any other things we can do to restart a domain after a crash?
> 
> Many people favor some kind of key pairing so that a centralized
> monitor can restart guests in the event of failure, even with
> newer versions of Xen, or using the API.
> 
> If you aren't depending on a very specific older patched kernel, I'd
> just move up to 3.0.4-testing. 3.0.5-testing has been pretty stable too.

Sadly, we are. There is a project underway to upgrade but that could be
months away.
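
In the meantime, an external watchdog along the lines Tim suggests could be sketched roughly as below. It relies on the symptom described above (the crashed domU disappears from 'xm list') and recreates it with 'xm create'. Assumptions that are mine and not from this thread: a Python 3.7+ interpreter is available where the script runs (the dom0 of a Xen 2 host may only have an older Python, in which case the subprocess calls would need replacing), 'xm list' prints a header row with the domain name in the first column, and the config path in the usage note is hypothetical.

```python
import subprocess


def running_domains(xm_list_output):
    """Return the set of domain names found in the text printed by `xm list`."""
    lines = xm_list_output.strip().splitlines()
    # Skip the column header; the first whitespace-separated field is the name.
    return {line.split()[0] for line in lines[1:] if line.split()}


def domains_to_restart(xm_list_output, expected):
    """Return the expected domain names that are missing from `xm list`."""
    present = running_domains(xm_list_output)
    return [name for name in expected if name not in present]


def restart_missing(expected_configs):
    """Recreate every expected domain that is not listed.

    expected_configs maps a domain name to its config file path.
    """
    listing = subprocess.run(
        ["xm", "list"], capture_output=True, text=True, check=True
    ).stdout
    for name in domains_to_restart(listing, expected_configs):
        subprocess.run(["xm", "create", expected_configs[name]], check=True)
```

Run from cron every minute or so with something like restart_missing({"domUhostname": "/etc/xen/domUhostname"}), this would bring the domain back without depending on 'on_crash' being honoured. It does not detect a guest that is hung but still listed.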

