Re: [Xen-devel] Prepping a bugfix push

from [Jeremy Fitzhardinge]

To:	Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject:	Re: [Xen-devel] Prepping a bugfix push
From:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date:	Fri, 04 Dec 2009 16:05:48 -0800
Cc:	Brendan Cully <brendan@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

On 12/04/09 08:56, Ian Campbell wrote:

On Fri, 2009-12-04 at 16:37 +0000, Jeremy Fitzhardinge wrote:

On 12/04/09 07:50, Ian Campbell wrote:

On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:

I've been doing regular suspend/resumes not checkpoint ones as Brendan
is doing, I did try a couple of checkpointed ones yesterday and they
failed, IIRC with a similar softlockup to this one.

So what is happening is that the device event channels are getting torn
down by the resume handler and never completely reinstated in the
cancelled suspend (aka checkpoint) case.

Hm.

In 2.6.18 there was a separate ->suspend_cancel() callback for each
driver, called instead of the ->resume() callback in exactly these
circumstances. The cancel callback doesn't do any of the teardown, in
fact for blkfront it doesn't even exist.

(As a proof of concept, commenting out the entire contents of
blkfront_resume and netfront_resume makes checkpointing work OK for me,
at the cost of breaking regular resume, of course)

pv-ops uses the generic power management infrastructure which does not
have a concept of cancelling a suspend. Perhaps it should? Otherwise a
different solution will be required; I'm not sure what that might be
yet.

Well, the obvious one is to treat it as a full suspend followed by
immediate resume.  That is, just remove all the special case handling
for checkpoint, and let it do the normal resume stuff when the hypercall
returns.

I'm not sure how much that will help, some of the resume stuff relies on
the domain actually changing underneath, i.e. the backends are torn down
and resetup by the tools and therefore expect a fresh reconnection, the
hypervisor side of event channels is implicitly reset (the kernel just
resets its own state) etc. None of these things happen during a
checkpoint. Presumably those who are interested in checkpointing would
prefer them not to happen in order to remain fast.


Yes, that's certainly all possible with some biggish performance hit...

How about this: if it's a checkpoint, then don't bother calling all the
resume functions. We may need to call the device model resume just to
keep everyone sane and happy, but at the xenbus nodes, filter out the
calls to the drivers.

I think the PM core can fail to suspend; it just resumes anything that
has been suspended so far.

An optional separate hook for that case (called in preference to
->resume) might be acceptable upstream? Adding a parameter to the
->resume handler itself might also be acceptable but would involve more
churn.
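
To make the "optional separate hook" idea concrete, here is a minimal
user-space sketch in C. It is not kernel code and not the real PM or
xenbus API: every name in it (demo_driver_ops, demo_device,
demo_bus_resume, the suspend_cancel member) is hypothetical and chosen
only for illustration. It assumes the bus layer knows whether the
suspend was cancelled, calls the cancel hook in preference to ->resume
when a driver provides one, and falls back to the full ->resume
otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver callbacks; not the kernel's real structures. */
struct demo_driver_ops {
    int (*resume)(void *dev);          /* full teardown and reconnect */
    int (*suspend_cancel)(void *dev);  /* optional, may be NULL */
};

/* Hypothetical device state, just enough to observe what happened. */
struct demo_device {
    const struct demo_driver_ops *ops;
    int evtchn_bound;  /* 1 while the event channel is usable */
    int reconnects;    /* number of full reconnects performed */
};

/* Full resume: drop the old event channel, bind a new one, reconnect. */
static int demo_full_resume(void *p)
{
    struct demo_device *dev = p;

    dev->evtchn_bound = 0;  /* tear down the stale channel */
    dev->evtchn_bound = 1;  /* rebind against the (assumed new) backend */
    dev->reconnects++;
    return 0;
}

/* Cancelled suspend: nothing changed underneath, so leave state alone. */
static int demo_cancel(void *p)
{
    (void)p;
    return 0;
}

/* A driver that provides the optional hook. */
const struct demo_driver_ops demo_ops_with_cancel = {
    .resume = demo_full_resume,
    .suspend_cancel = demo_cancel,
};

/* A driver that only knows about full resume. */
const struct demo_driver_ops demo_ops_resume_only = {
    .resume = demo_full_resume,
    .suspend_cancel = NULL,
};

/*
 * Bus-level dispatch: on a cancelled suspend, prefer the cancel hook
 * if the driver has one; otherwise fall back to the full resume.
 */
int demo_bus_resume(struct demo_device *dev, int cancelled)
{
    assert(dev != NULL && dev->ops != NULL);

    if (cancelled && dev->ops->suspend_cancel)
        return dev->ops->suspend_cancel(dev);
    if (dev->ops->resume)
        return dev->ops->resume(dev);
    return 0;
}
```

With this shape, a device using demo_ops_with_cancel keeps its event
channel untouched when demo_bus_resume(dev, 1) is called, while a
device using demo_ops_resume_only still goes through a full reconnect.
Whether a missing hook should fall back to ->resume or do nothing at
all (as the 2.6.18 blkfront case described above suggests) is a design
choice this sketch does not settle.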

The whole area is so fragile and fraught, I don't really want to get
into it.


    J
