
Re: [Xen-devel] Prepping a bugfix push

On Friday, 04 December 2009 at 08:37, Jeremy Fitzhardinge wrote:
> On 12/04/09 07:50, Ian Campbell wrote:
> >On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
> >>I've been doing regular suspend/resumes, not checkpoint ones as Brendan
> >>is doing. I did try a couple of checkpointed ones yesterday and they
> >>failed, IIRC with a softlockup similar to this one.
> >So what is happening is that the device event channels are getting torn
> >down by the resume handler and never completely reinstated in the
> >cancelled suspend (aka checkpoint) case.
> 
> Hm.
> 
> >In 2.6.18 there was a separate ->suspend_cancel() callback for each
> >driver, called instead of the ->resume() callback in exactly these
> >circumstances. The cancel callback doesn't do any of the teardown; in
> >fact, for blkfront it doesn't even exist.
> >
> >(As a proof of concept, commenting out the entire contents of
> >blkfront_resume and netfront_resume makes checkpointing work OK for me,
> >at the cost of breaking regular resume, of course.)
> >
> >pv-ops uses the generic power management infrastructure, which does not
> >have a concept of cancelling a suspend. Perhaps it should? Otherwise a
> >different solution will be required; I'm not sure what that might be
> >yet.
> 
> Well, the obvious one is to treat it as a full suspend followed by
> immediate resume.  That is, just remove all the special case handling
> for checkpoint, and let it do the normal resume stuff when the
> hypercall returns.
> 
> I think the PM core can cope with a suspend failing partway; it just
> resumes anything that has been suspended so far.
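To make the approaches in the quoted messages concrete, here is a minimal C model. It is a sketch under stated assumptions, not xenbus or Linux PM code: struct frontend, fe_resume, after_hypercall_cancel_aware, after_hypercall_always_resume, suspend_all and the port allocator are all hypothetical, and evtchn merely stands in for the event channel binding a real frontend owns. after_hypercall_cancel_aware mirrors the 2.6.18 behaviour Ian describes (call ->suspend_cancel() when the suspend was cancelled, and do nothing if the driver has none, as for blkfront); after_hypercall_always_resume is Jeremy's suggestion of always running the normal resume; suspend_all models a PM core that resumes whatever was already suspended when a later device fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PV frontend. "evtchn" stands in for the event
 * channel binding a real driver owns; 0 means "not bound". */
struct frontend {
    int evtchn;
    bool suspended;
    int (*suspend)(struct frontend *fe);
    void (*resume)(struct frontend *fe);         /* teardown + rebind */
    void (*suspend_cancel)(struct frontend *fe); /* may be NULL (blkfront) */
};

static int next_port = 100; /* hypothetical port allocator */

static int fe_suspend(struct frontend *fe)
{
    fe->suspended = true;
    return 0;
}

/* Simulates a driver that refuses to suspend. */
static int fe_suspend_fails(struct frontend *fe)
{
    (void)fe;
    return -1;
}

/* Normal resume: drop the old binding and bind a fresh one. This model
 * assumes the backend always cooperates with the rebind. */
static void fe_resume(struct frontend *fe)
{
    fe->evtchn = 0;           /* tear down the old binding */
    fe->evtchn = next_port++; /* bind a fresh one */
    fe->suspended = false;
}

/* 2.6.18-style: a cancelled suspend calls ->suspend_cancel() if present
 * instead of ->resume(), so the existing binding is left alone. */
static void after_hypercall_cancel_aware(struct frontend *fe, bool cancelled)
{
    if (!cancelled) {
        fe->resume(fe);
        return;
    }
    if (fe->suspend_cancel)
        fe->suspend_cancel(fe);
    fe->suspended = false;
}

/* Suggested alternative: ignore the cancelled flag and always take the
 * normal resume path when the hypercall returns. */
static void after_hypercall_always_resume(struct frontend *fe, bool cancelled)
{
    (void)cancelled;
    fe->resume(fe);
}

/* PM-core-style suspend: if a device fails, resume the ones already
 * suspended, most recent first, and report the error. */
static int suspend_all(struct frontend *devs[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i]->suspend(devs[i]) != 0) {
            while (i-- > 0)
                devs[i]->resume(devs[i]);
            return -1;
        }
    }
    return 0;
}
```

The model assumes fe_resume can always rebind. Ian's report is that, in the cancelled case, the real resume handlers tear down state that is never completely reinstated, which is the gap the always-resume approach would have to close.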

Hmm. I just tried changing the SUSPEND_CANCEL elfnote to 0 in pvops,
and now save -c takes a very long time. From the xend log:

[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:3025) XendDomainInfo.resumeDomain(19)
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2319) Destroying device model
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2326) Releasing devices
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing console/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2009-12-04 08:57:58 4917] INFO (XendDomainInfo:3260) Dev 51713 still active, looping...

That last line repeats for a very long time until the loop eventually
gives up. The domain is still broken when save completes.
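The log is consistent with a bounded poll: check whether the device is still present, log, wait, and give up after some limit. The C sketch below models only that pattern. It is not xend's code (xend is written in Python), and wait_for_device_gone, still_active_fn, max_polls and the test double active_for_n_polls are hypothetical; nothing in the log says what the real limit or interval is.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical predicate: true while the device is still reported present. */
typedef bool (*still_active_fn)(int devid, void *ctx);

enum wait_result { DEV_GONE, DEV_TIMED_OUT };

/* Poll until the device disappears or max_polls checks have been made. */
static enum wait_result wait_for_device_gone(int devid, still_active_fn active,
                                             void *ctx, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!active(devid, ctx))
            return DEV_GONE;
        fprintf(stderr, "Dev %d still active, looping...\n", devid);
        /* A real implementation would sleep for some interval here. */
    }
    return DEV_TIMED_OUT;
}

/* Test double: reports the device active for the first *ctx polls. */
static bool active_for_n_polls(int devid, void *ctx)
{
    int *remaining = ctx;
    (void)devid;
    if (*remaining > 0) {
        (*remaining)--;
        return true;
    }
    return false;
}
```

With a real sleep between polls, the total wait is roughly max_polls times the interval, so a device that never goes away makes save -c appear to hang for that long before the loop gives up.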

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
