[Xen-devel] Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Subject: [Xen-devel] Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages
From: Steven Smith <steven.smith@xxxxxxxxxx>
Date: Thu, 20 Aug 2009 11:39:21 +0100
Cc: Gianluca Guida <Gianluca.Guida@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Paul Durrant <Paul.Durrant@xxxxxxxxxx>, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>, Steven Smith <Steven.Smith@xxxxxxxxxxxxx>
In-reply-to: <de76405a0908200318h520b1e52q6fb6e0219448b607@xxxxxxxxxxxxxx>
References: <de76405a0908200318h520b1e52q6fb6e0219448b607@xxxxxxxxxxxxxx>
> Paul recently pointed out that a side-effect of having the balloon
> driver replace guest p2m memory with empty space is that when Windows
> does a crash dump (perhaps Linux too), when it reaches the pages in
> the balloon, it will cause a page fault, which can cause cascading
> crashes and prevent any useful information from reaching the dump
> file.
Well, not quite.  During a crash dump, the only thing Windows does
with the page is write it out.  If you're using PV drivers, that means
you create grant references for the ballooned-out PFNs and pass them
off to the backend, which tries to map them, fails, and passes an
error back to the frontend.  If the frontend then passes those errors
back to Windows then it'll retry a couple of times, then give up and
crash.  It wouldn't be particularly difficult to avoid this by just
masking the error from the frontend, claiming to have written the data
even though the backend gave us an error.  That'd mean you'd have
garbage in the dump file for ballooned-out pages, but those pages
probably aren't very interesting, and the rest of the dump file would
be fine.
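
A rough sketch of that masking idea, as a toy Python model rather than real driver code. Every name here (Status, WriteRequest, backend_write, frontend_complete) is made up for illustration and does not correspond to any actual frontend or backend interface; the only behaviour taken from the description above is that the backend fails on ballooned-out PFNs and the frontend reports success anyway while dumping.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    ERROR = "error"


@dataclass(frozen=True)
class WriteRequest:
    pfn: int  # guest page frame whose contents are being written out


def backend_write(req: WriteRequest, ballooned: set) -> Status:
    """Toy backend: mapping the grant fails for ballooned-out PFNs."""
    return Status.ERROR if req.pfn in ballooned else Status.OK


def frontend_complete(req: WriteRequest, backend_status: Status,
                      ballooned: set, dumping: bool) -> Status:
    """Status the frontend reports back to the OS.

    Assumption: the frontend knows which PFNs are ballooned out and
    whether a crash dump is in progress. Only in that combination is
    a backend error masked; all other errors are passed through.
    """
    if (backend_status is Status.ERROR
            and dumping
            and req.pfn in ballooned):
        return Status.OK  # claim success; the dump holds garbage here
    return backend_status


def dump_pages(pfns, ballooned: set, dumping: bool = True):
    """Write each page and collect the status the OS would see."""
    results = {}
    for pfn in pfns:
        req = WriteRequest(pfn)
        status = backend_write(req, ballooned)
        results[pfn] = frontend_complete(req, status, ballooned, dumping)
    return results
```

In this model, dump_pages([1, 2, 3], ballooned={2}) reports OK for all three pages, so the OS never sees an error to retry on. Outside a dump (dumping=False) the error for page 2 is still passed through, which keeps ordinary I/O failures visible.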

This might be relevant for hibernation files, though, because Windows
compresses those before writing them out, and hence has to touch them
through a virtual address.  At the moment, the Citrix drivers deal
with this by just blocking hibernation whenever the balloon driver's
active.  Making ballooned-out pages implicitly all-zeroes would let us
turn that back on, which'd be kind of nice.  I'm not sure how valuable
that actually is in the real world, though: why would you hibernate a
VM when you could just vm-suspend it?

> After thinking about it for a bit, I wondered if it might be better to
> replace the "populate-on-demand" concept with a
> "shared-zero-populate-on-demand".  Reads to a PoD page would always
> map to a read-only shared zero page (or superpage, as the case may
> be).  We can change the balloon driver behavior to fill the p2m
> entries for the balloon with zPoD entries instead of empy p2m entries.
>  As a side-effect, the balloon driver no longer would need to
> explicitly fill in the p2m entries with ram when deflating the
> balloon; the tools already tell Xen about memory target increases, so
> it can increase the PoD "cache"; the balloon driver would simply need
> to free memory back to the kernel and the balloon will be populated
> on-demand by the guest.
That would make things marginally easier on the drivers, but it's at
the expense of potentially more subtle errors when something goes
wrong.  At the moment, if the balloon driver tries to deflate the
balloon too far, the populate hypercall fails and it's very obvious
what's gone wrong, whereas with an implicit re-populate it'll look
like everything's working fine for some time afterwards, until the
guest touches too many pages and PoD kills it.
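
To make the difference in failure timing concrete, here is a toy Python model. It is only a sketch under simplifying assumptions: ToyDomain, PopulateError, deflate_explicit and deflate_implicit are invented names, the PoD cache is reduced to a single page counter, and none of this reflects Xen's real hypercall interface.

```python
class PopulateError(Exception):
    """Stands in for a failed populate hypercall."""


class ToyDomain:
    def __init__(self, pod_cache_pages: int):
        # Pages the hypervisor can still hand to this guest.
        self.pod_cache_pages = pod_cache_pages
        self.alive = True

    def populate(self, n_pages: int) -> None:
        """Explicit path: refuse up front if the pages are not there."""
        if n_pages > self.pod_cache_pages:
            raise PopulateError(
                f"asked for {n_pages}, only {self.pod_cache_pages} left")
        self.pod_cache_pages -= n_pages

    def touch(self) -> None:
        """Implicit path: first touch of a PoD page takes one page."""
        if not self.alive:
            return
        if self.pod_cache_pages == 0:
            self.alive = False  # nothing left to back the page
            return
        self.pod_cache_pages -= 1


def deflate_explicit(dom: ToyDomain, n_pages: int) -> None:
    """Deflate by populating; an over-deflate fails immediately."""
    dom.populate(n_pages)


def deflate_implicit(dom: ToyDomain, n_pages: int) -> int:
    """Deflate by just freeing pages to the kernel; nothing is checked."""
    return n_pages  # pages now visible to the guest, not yet backed
```

With a cache of 2 pages, deflate_explicit(dom, 3) raises PopulateError straight away and the guest keeps running. deflate_implicit(dom, 3) returns without complaint, the first two touch() calls succeed, and only the third one leaves dom.alive set to False, well after the actual mistake.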

