What seems likely to me is that Xen (setting the PoD target) and the
balloon driver (allocating memory) calculate the amount of guest memory
differently. So the balloon driver thinks it's done handing memory back
to Xen while there are still more outstanding PoD entries than there are
pages in the PoD memory pool. What balloon
driver are you using? Can you let me know max_mem, target, and what the
balloon driver has reached before calling it quits? (Although 13,000
pages is an awful lot to be off by: 54 MB...)
Re what "B" means, below is a rather long-winded explanation that will,
hopefully, be clear. :-)
Hmm, I'm not sure what the guest balloon driver's "Current allocation"
means either. :-) Does it mean, "Size of the current balloon" (i.e.,
starts at 0 and grows as the balloon driver allocates guest pages and
hands them back to Xen)? Or does it mean, "Amount of memory guest
currently has allocated to it" (i.e., starts at static_max and goes down
as the balloon driver allocates guest pages and hands them back to Xen)?
In the comment, B does *not* mean "the size of the balloon" (i.e., the
number of pages allocated from the guest OS by the balloon driver).
Rather, B means "Amount of memory the guest currently thinks it has
allocated to it." B starts at M at boot. The balloon driver will try
to make B=T by inflating the size of the balloon to M-T. Clear as mud?
Let's work through a concrete example. Let's say static max is 400,000K
(100,000 pages).
M=100,000 and doesn't change. Let's say that T is 50,000.
At boot:
B == M == 100,000
P == 0
tot_pages == pod.count == 50,000
entry_count == 100,000
Thus the following hold:
* 0 <= P (0) <= T (50,000) <= B (100,000) <= M (100,000)
* entry_count (100,000) == B (100,000) - P (0)
* tot_pages (50,000) == P (0) + pod.count (50,000)
As the guest boots, pages will be populated from the cache; P increases,
but entry_count and pod.count decrease. Let's say that 25,000 pages get
allocated just before the balloon driver runs:
* 0 <= P (25,000) <= T (50,000) <= B (100,000) <= M (100,000)
* entry_count (75,000) == B (100,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)
Then the balloon driver runs. It should try to allocate 50,000 pages
total (M - T). For simplicity, let's say that the balloon driver only
grabs pages that haven't been populated yet (i.e., pages still backed by
PoD entries). When it's halfway there, having allocated 25,000 pages,
things look like this:
* 0 <= P (25,000) <= T (50,000) <= B (75,000) <= M (100,000)
* entry_count (50,000) == B (75,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)
Eventually the balloon driver should reach its new target of 50,000,
having allocated 50,000 pages:
* 0 <= P (25,000) <= T (50,000) <= B (50,000) <= M (100,000)
* entry_count (25,000) == B (50,000) - P (25,000)
* tot_pages (50,000) == P (25,000) + pod.count (25,000)
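In case it helps, here's a tiny stand-alone C model of the example
above. It's just my own sketch of the bookkeeping, not code from the
tree (the struct and field names are made up); it only encodes the
three relations from the comment and walks them through the stages
described above:

/* Toy model of the PoD accounting in the example above -- not the real
 * hypervisor code.  All figures are in pages (M = 100,000, T = 50,000). */
#include <assert.h>
#include <stdio.h>

struct pod_state {
    long M;           /* static max                                */
    long T;           /* current target                            */
    long B;           /* what the guest thinks it has allocated    */
    long P;           /* pages actually populated by Xen           */
    long entry_count; /* outstanding PoD entries                   */
    long pod_count;   /* pages in the PoD cache                    */
    long tot_pages;   /* pages assigned to the domain              */
};

static void check(const struct pod_state *s, const char *stage)
{
    /* The three relations from the comment must hold at every stage. */
    assert(0 <= s->P && s->P <= s->T && s->T <= s->B && s->B <= s->M);
    assert(s->entry_count == s->B - s->P);
    assert(s->tot_pages == s->P + s->pod_count);
    printf("%-14s B=%ld P=%ld entries=%ld cache=%ld tot=%ld\n",
           stage, s->B, s->P, s->entry_count, s->pod_count, s->tot_pages);
}

int main(void)
{
    struct pod_state s = {
        .M = 100000, .T = 50000, .B = 100000, .P = 0,
        .entry_count = 100000, .pod_count = 50000, .tot_pages = 50000,
    };
    check(&s, "boot");

    /* Guest touches 25,000 pages: they get populated from the cache. */
    s.P += 25000; s.entry_count -= 25000; s.pod_count -= 25000;
    check(&s, "populated");

    /* Balloon driver frees 25,000 not-yet-populated pages: B and the
     * outstanding entry count drop; P and the cache are untouched. */
    s.B -= 25000; s.entry_count -= 25000;
    check(&s, "half-ballooned");

    /* ...and the remaining 25,000, reaching B == T. */
    s.B -= 25000; s.entry_count -= 25000;
    check(&s, "at target");

    return 0;
}

The point the model tries to make explicit is that the balloon driver
handing back unpopulated pages only moves B and entry_count; P and the
cache only move when Xen actually populates pages for the guest.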
The reason for the logic is so that we can do the Right Thing if, after
the balloon driver has ballooned half way (to 75,000 pages), the target
is changed. If you're not changing the target before the balloon driver
has reached its target, that corner case shouldn't come into play.
-George
Jan Beulich wrote:
George,
before diving deeply into the PoD code, I hope you have some idea that
might ease the debugging that's apparently going to be needed.
Following the comment immediately before p2m_pod_set_mem_target(),
there's an apparent inconsistency with the accounting: While the guest
in question properly balloons down to its intended setting (1G, with a
maxmem setting of 2G), the combination of the equations
d->arch.p2m->pod.entry_count == B - P
d->tot_pages == P + d->arch.p2m->pod.count
doesn't hold (provided I interpreted the meaning of B correctly - I
took this from the guest balloon driver's "Current allocation" report,
converted to pages); there's a difference of over 13000 pages.
Obviously, as soon as the guest uses up enough of its memory, it
will get crashed by the PoD code.
In two runs I did, the difference (and hence the number of entries
reported in the eventual crash message) was identical, implying to
me that this is not a simple race, but rather a systematic problem.
Even on the initial dump taken (when the guest was sitting at the
boot manager screen), there already appears to be a difference of
800 pages (it's my understanding that at this point the difference
between entries and cache should equal the difference between
maxmem and mem).
Does this ring any bells? Any hints how to debug this? In any case
I'm attaching the full log in case you want to look at it.
Jan