xen-devel

Re: [Xen-devel] Prepping a bugfix push

To:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Brendan Cully <brendan@xxxxxxxxx>
Subject:	Re: [Xen-devel] Prepping a bugfix push
From:	Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date:	Fri, 4 Dec 2009 15:50:26 +0000
Cc:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Organization:	Citrix Systems, Inc.
References:	<4B1810DF.40309@xxxxxxxx> <20091203193540.GB4228@xxxxxxxxxxxxxxxxx> <4B184830.7070107@xxxxxxxx> <20091204002406.GB5897@xxxxxxxxxxxxxxxxx> <4B186192.8000201@xxxxxxxx> <1259912810.31045.175.camel@xxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
> I've been doing regular suspend/resumes not checkpoint ones as Brendan
> is doing, I did try a couple of checkpointed ones yesterday and they
> failed, IIRC with a similar softlockup to this one. 

So what is happening is that the device event channels are getting torn
down by the resume handler and never completely reinstated in the
cancelled suspend (aka checkpoint) case.

Taking blkfront as an example (although I expect netfront has a similar
issue), what I see is: blkfront_resume calls blkif_free, which calls
unbind_from_irqhandler, which issues EVTCHNOP_close and unbinds the
event channel from its IRQ (my patch "don't leak IRQs over
suspend/resume" did not change this behaviour in the cancelled suspend
case). Then blkfront_resume calls talk_to_backend, which calls
setup_blkring, which allocates a new event channel and binds it to a new
IRQ. So far so good. However, because the domain never really goes away
in the checkpoint case, the backend never notices that it needs to
rebind to this new event channel.
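
To make the sequence above concrete, here is a toy model in C. It is a
sketch only: the types and functions (struct evtchn_link, guest_close,
guest_alloc, backend_rebind, resume_and_check) are hypothetical names
made up for illustration and are not the Xen or Linux interfaces. It
assumes that closing the guest end leaves the dom0 end waiting, and that
only a backend reconnect brings both ends back to connected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: hypothetical types, not the real Xen event channel API. */
enum chan_state { CHAN_CLOSED, CHAN_WAITING, CHAN_CONNECTED };

struct evtchn_link {
    enum chan_state guest;  /* state of the frontend (guest) end */
    enum chan_state dom0;   /* state of the backend (dom0) end */
};

/* Models blkif_free -> unbind_from_irqhandler -> EVTCHNOP_close. */
static void guest_close(struct evtchn_link *l)
{
    assert(l != NULL);
    l->guest = CHAN_CLOSED;
    if (l->dom0 == CHAN_CONNECTED)
        l->dom0 = CHAN_WAITING;  /* the peer is left waiting for a connection */
}

/* Models talk_to_backend -> setup_blkring allocating a new channel. */
static void guest_alloc(struct evtchn_link *l)
{
    assert(l != NULL);
    l->guest = CHAN_WAITING;
}

/* Models the backend noticing the new channel and binding to it. */
static void backend_rebind(struct evtchn_link *l)
{
    assert(l != NULL);
    if (l->guest == CHAN_WAITING)
        l->guest = l->dom0 = CHAN_CONNECTED;
}

/*
 * Runs the resume path and reports whether both ends are connected
 * afterwards. backend_notices is false in the checkpoint case, where
 * the domain never goes away and nothing prompts the backend to rebind.
 */
bool resume_and_check(bool backend_notices)
{
    struct evtchn_link l = { CHAN_CONNECTED, CHAN_CONNECTED };

    guest_close(&l);
    guest_alloc(&l);
    if (backend_notices)
        backend_rebind(&l);

    return l.guest == CHAN_CONNECTED && l.dom0 == CHAN_CONNECTED;
}
```

In this model resume_and_check(true) corresponds to a regular resume,
and resume_and_check(false) to the checkpoint case, where both ends are
left waiting for a connection.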

So, using lsevtchn on domain 0, I see:
        --- BEFORE.0    2009-12-04 14:47:58.000000000 +0000
        +++ AFTER.0     2009-12-04 14:48:55.000000000 +0000
        @@ -39,5 +39,5 @@
           39: VCPU 0: Virtual IRQ 3
           40: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 1
           41: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 2
        -  42: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 13
        -  43: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 14
        +  42: VCPU 0: Interdomain (Waiting connection) - Remote Domain 1
        +  43: VCPU 0: Interdomain (Waiting connection) - Remote Domain 1
and on the guest I see:
        --- BEFORE.U    2009-12-04 14:48:02.000000000 +0000
        +++ AFTER.U     2009-12-04 14:48:53.000000000 +0000
        @@ -10,5 +10,4 @@
           10: VCPU 1: IPI
           11: VCPU 1: Virtual IRQ 1
           12: VCPU 1: IPI
        -  13: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 42
        -  14: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 43
        +  13: VCPU 0: Interdomain (Waiting connection) - Remote Domain 0
(guest event channel 13 happens to be blkfront in both cases)

In 2.6.18 there was a separate ->suspend_cancel() callback for each
driver, called instead of the ->resume() callback in exactly these
circumstances. The cancel callback doesn't do any of the teardown; in
fact, for blkfront it doesn't even exist.
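
The 2.6.18 arrangement described above could be sketched roughly as
follows. This is an illustration under assumptions, not the actual
2.6.18 code: struct frontend_ops, finish_suspend, counting_resume and
example_blkfront_ops are hypothetical names. The point is only that a
cancelled suspend dispatches to the cancel callback (or does nothing if
the driver has none) and never runs the teardown in resume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver callback table, for illustration only. */
struct frontend_ops {
    int (*resume)(void *dev);          /* full teardown and reconnect */
    int (*suspend_cancel)(void *dev);  /* may be NULL: nothing to undo */
};

/* Calls the callback that matches how the suspend ended. */
int finish_suspend(const struct frontend_ops *ops, void *dev, int cancelled)
{
    assert(ops != NULL);
    if (cancelled)
        return ops->suspend_cancel ? ops->suspend_cancel(dev) : 0;
    return ops->resume ? ops->resume(dev) : 0;
}

/* Example resume handler: counts how often the teardown path runs. */
static int counting_resume(void *dev)
{
    ++*(int *)dev;
    return 0;
}

/* Mirrors the blkfront case described above: no cancel handler at all. */
const struct frontend_ops example_blkfront_ops = {
    .resume = counting_resume,
    .suspend_cancel = NULL,
};
```

With this table, finish_suspend(&example_blkfront_ops, &count, 1) leaves
count untouched, while passing 0 for cancelled runs the resume handler
once.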

(As a proof of concept, commenting out the entire contents of
blkfront_resume and netfront_resume makes checkpointing work OK for me,
at the cost of breaking regular resume, of course)

pv-ops uses the generic power management infrastructure, which does not
have a concept of cancelling a suspend. Perhaps it should? Otherwise a
different solution will be required; I'm not sure what that might be
yet.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
