What seems likely to me is that Xen (setting the PoD target) and the
balloon driver (allocating memory) calculate the amount of guest memory
differently. So the balloon driver thinks it's done handing memory back
to Xen while there are still more outstanding PoD entries than there are
pages in the PoD memory pool. What balloon
driver are you using? Can you let me know max_mem, target, and what the
balloon driver has reached before calling it quits? (Although 13,000
pages is an awful lot to be off by: 54 MB...)
Re what "B" means, below is a rather long-winded explanation that will,
hopefully, be clear. :-)
Hmm, I'm not sure what the guest balloon driver's "Current allocation"
means either. :-) Does it mean, "Size of the current balloon" (i.e.,
starts at 0 and grows as the balloon driver allocates guest pages and
hands them back to Xen)? Or does it mean, "Amount of memory guest
currently has allocated to it" (i.e., starts at static_max and goes down
as the balloon driver allocates guest pages and hands them back to Xen)?
In the comment, B does *not* mean "the size of the balloon" (i.e., the
number of pages allocated from the guest OS by the balloon driver).
Rather, B means "Amount of memory the guest currently thinks it has
allocated to it." B starts at M at boot. The balloon driver will try
to make B=T by inflating the size of the balloon to M-T. Clear as mud?
Let's work through a concrete example. Let's say static max is 400,000K
(100,000 pages).
M=100,000 and doesn't change. Let's say that T is 50,000.
At boot:
B == M == 100,000
P == 0
tot_pages == pod.count == 50,000
entry_count == 100,000
Thus the following hold:
* 0 <= P (0) <= T (50,000) <= B (100,000) <= M (100,000)
* entry_count (100,000) == B (100,000) - P (0)
* tot_pages (50,000) == P (0) + pod.count (50,000)
As the guest boots, pages will be populated from the cache; P increases,
but entry_count and pod.count decrease. Let's say that 25,000 pages get
allocated just before the balloon driver runs:
* 0 <= P (25,000) <= T (50,000) <= B (100,000) <= M (100,000)
* entry_count (75,000) == B (100,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)
Then the balloon driver runs. It should try to allocate 50,000 pages
total (M - T). For simplicity, let's say that the balloon driver only
grabs pages that haven't been populated yet (i.e., pages still backed by
PoD entries). When it's halfway there, having allocated 25,000 pages,
things look like this:
* 0 <= P (25,000) <= T (50,000) <= B (75,000) <= M (100,000)
* entry_count (50,000) == B (75,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)
Eventually the balloon driver should reach its new target of 50,000,
having allocated 50,000 pages:
* 0 <= P (25,000) <= T (50,000) <= B (50,000) <= M (100,000)
* entry_count (25,000) == B (50,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)
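In case it helps, here's a tiny stand-alone C model of the example
above. It's just my own sketch of the bookkeeping, not code from the
tree (the struct and field names are made up); it only encodes the
three relations from the comment and walks them through the stages
described above:

/* Toy model of the PoD accounting in the example above -- not the real
 * hypervisor code.  All figures are in pages (M = 100,000, T = 50,000). */
#include <assert.h>
#include <stdio.h>

struct pod_state {
    long M;           /* static max                                */
    long T;           /* current target                            */
    long B;           /* what the guest thinks it has allocated    */
    long P;           /* pages actually populated by Xen           */
    long entry_count; /* outstanding PoD entries                   */
    long pod_count;   /* pages in the PoD cache                    */
    long tot_pages;   /* pages assigned to the domain              */
};

static void check(const struct pod_state *s, const char *stage)
{
    /* The three relations from the comment must hold at every stage. */
    assert(0 <= s->P && s->P <= s->T && s->T <= s->B && s->B <= s->M);
    assert(s->entry_count == s->B - s->P);
    assert(s->tot_pages == s->P + s->pod_count);
    printf("%-14s B=%ld P=%ld entries=%ld cache=%ld tot=%ld\n",
           stage, s->B, s->P, s->entry_count, s->pod_count, s->tot_pages);
}

int main(void)
{
    struct pod_state s = {
        .M = 100000, .T = 50000, .B = 100000, .P = 0,
        .entry_count = 100000, .pod_count = 50000, .tot_pages = 50000,
    };
    check(&s, "boot");

    /* Guest touches 25,000 pages: they get populated from the cache. */
    s.P += 25000; s.entry_count -= 25000; s.pod_count -= 25000;
    check(&s, "populated");

    /* Balloon driver frees 25,000 not-yet-populated pages: B and the
     * outstanding entry count drop; P and the cache are untouched. */
    s.B -= 25000; s.entry_count -= 25000;
    check(&s, "half-ballooned");

    /* ...and the remaining 25,000, reaching B == T. */
    s.B -= 25000; s.entry_count -= 25000;
    check(&s, "at target");

    return 0;
}

The point the model tries to make explicit is that the balloon driver
handing back unpopulated pages only moves B and entry_count; P and the
cache only move when Xen actually populates pages for the guest.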
The reason for the logic is so that we can do the Right Thing if, after
the balloon driver has ballooned half way (to 75,000 pages), the target
is changed. If you're not changing the target before the balloon driver
has reached its target, that corner case shouldn't come into play.
-George
Jan Beulich wrote:
George,
before diving deeply into the PoD code, I hope you have some idea that
might ease the debugging that's apparently going to be needed.
Following the comment immediately before p2m_pod_set_mem_target(),
there's an apparent inconsistency with the accounting: While the guest
in question properly balloons down to its intended setting (1G, with a
maxmem setting of 2G), the combination of the equations
d->arch.p2m->pod.entry_count == B - P
d->tot_pages == P + d->arch.p2m->pod.count
doesn't hold (provided I interpreted the meaning of B correctly - I
took this from the guest balloon driver's "Current allocation" report,
converted to pages); there's a difference of over 13000 pages.
Obviously, as soon as the guest uses up enough of its memory, it
will get crashed by the PoD code.
In two runs I did, the difference (and hence the number of entries
reported in the eventual crash message) was identical, implying to
me that this is not a simple race, but rather a systematic problem.
Even on the initial dump taken (when the guest was sitting at the
boot manager screen), there already appears to be a difference of
800 pages (it's my understanding that at this point the difference
between entries and cache should equal the difference between
maxmem and mem).
Does this ring any bells? Any hints how to debug this? In any case
I'm attaching the full log in case you want to look at it.
Jan