
Re: [Xen-devel] Design question for PV superpage support

To: Mick.Jordan@xxxxxxx
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 03 Mar 2009 09:23:36 -0800
Cc: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, Dave McCracken <dcm@xxxxxxxx>, Xen Developers List <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 03 Mar 2009 09:24:11 -0800
List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Mick Jordan wrote:

On 03/03/09 06:33, Dan Magenheimer wrote:
In general, I think the guest should assume that large page mappings are merely an optimization that (a) might not be possible on domain start due to machine memory fragmentation and (b) might similarly be unavailable on restore. Given these, it must always be prepared to function with 4K pages, which implies that it would need to preserve enough page table frame memory to be able to revert from large to small pages.
Mick
Do you disagree with my assertion that use of 2MB pages is
almost always an attempt to eke out a performance improvement,
that emulating 2MB pages with fragmented 4KB pages is likely
slower than just using 4KB pages to start with, and thus
that "must always be prepared to function with 4KB pages"
should NOT occur silently (if at all)?
I agree with the first statement. I'm not sure what you mean by "emulate 2MB pages with fragmented 4K pages" unless you assume nested page table support or you just mean falling back to 4K pages. As for whether a change should be silent, I'm less clear on that. I certainly wouldn't consider it a fatal condition requiring domain termination. That position is consistent with the "optimization not correctness" view of using large tables. However, a guest might want to indicate in some way that it has downgraded

The tradeoff is between the performance gain one might get from using large pages vs the intrusiveness of changes to a PV kernel. Given that when paravirtualizing this we're going to be making small changes to the kernel's existing large page support, rather than adding it anew or adding a separate large-page mechanism, we need to make sure that as many of the guest's existing assumptions as possible can be satisfied.

The requirement that a guest be able to come up with enough L1 pagetable pages to be able to map all the shattered 2M mappings at any time definitely doesn't fall into that category. You'd need to:


  1. Have an interface for Xen to tell the guest which pages need to be
     remapped.  Presumably this would be in terms of once-contiguous
     pfn ranges which are now backed with discontiguous mfns.
  2. Get the guest to remap those pfns to the new mfns, which will
     require walking every pagetable of every process searching for
     those pfns, and allocating memory for the new pagetable level.
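
To make the cost of those two steps a bit more concrete, here is a rough sketch of the bookkeeping involved. It is only an illustration: the flat p2m[] array and the names superpage_intact, count_shattered and l1_reserve_bytes are made up for this example and are not an existing Xen or Linux interface. It assumes x86 with 4K base pages, 8-byte pagetable entries and 512 entries per L1 table, so one shattered 2M mapping needs one 4K L1 page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PFNS_PER_SUPERPAGE 512u   /* 2M / 4K on x86 */
#define L1_TABLE_BYTES     4096u  /* one L1 table covers exactly one 2M range */

/*
 * Hypothetical helper: p2m[] is a flat pfn -> mfn table.  A 2M-aligned
 * pfn range can still be mapped with one superpage entry only if its
 * first mfn is 2M-aligned and the following 511 mfns are consecutive.
 */
static bool superpage_intact(const uint64_t *p2m, size_t base_pfn)
{
    uint64_t base_mfn = p2m[base_pfn];

    if (base_pfn % PFNS_PER_SUPERPAGE != 0 ||
        base_mfn % PFNS_PER_SUPERPAGE != 0)
        return false;

    for (size_t i = 1; i < PFNS_PER_SUPERPAGE; i++)
        if (p2m[base_pfn + i] != base_mfn + i)
            return false;

    return true;
}

/* Count the 2M pfn ranges that are no longer machine-contiguous. */
static size_t count_shattered(const uint64_t *p2m, size_t nr_pfns)
{
    size_t shattered = 0;

    for (size_t pfn = 0; pfn + PFNS_PER_SUPERPAGE <= nr_pfns;
         pfn += PFNS_PER_SUPERPAGE)
        if (!superpage_intact(p2m, pfn))
            shattered++;

    return shattered;
}

/*
 * Memory the guest would have to hold in reserve to replace nr_mappings
 * 2M mappings with L1 tables.  A range mapped in several pagetables
 * counts once per mapping, not once per range.
 */
static size_t l1_reserve_bytes(size_t nr_mappings)
{
    return nr_mappings * L1_TABLE_BYTES;
}
```

Note that this only covers finding the damaged ranges and sizing the reserve. The expensive part, walking every pagetable of every process to find the mappings, is not shown.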

However the main use of 2M mappings in Linux is to map the kernel text and data. That's clearly not going to be possible if we need to run kernel code to put things together after a restore. Hm, given that, I guess we could just kludge it into hugetlbfs, but it really does make it a very narrow set of users.

BTW, thinking ahead to ballooning with 2MB pages, are we prepared
to assume that a relinquished 2MB page can't be fragmented?
While this may be appealing for systems where nearly all
guests are using 2MB pages, systems where the 2MB guest is
an odd duck might suffer substantially by making that
assumption.
Agreed. All of this really only becomes an issue when memory is overcommitted. Unfortunately, that is precisely when 2MB machine contiguous pages are likely to be difficult to find.

If 2M pages are becoming more important, then we should change Xen to do all domain allocations in 2M units, while reserving separate superpages specifically for fragmenting into 4k allocations. It's certainly sensible to always round a domain's initial size up to 2M (most will already be a 2M multiple, I suspect). Balloon is the obvious exception, but I would argue that ballooning in less than 2M units is a lot of fiddly makework. The difference between giving a domain 128MB vs 126MB is already pretty trivial; a 4k change in domain size is laughably small.
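
The rounding itself is trivial; something like the following would do. Again this is just a sketch with made-up names (round_up_to_superpage, superpages_for), not code from Xen, and it assumes sizes are given in bytes.

```c
#include <stdint.h>

#define SUPERPAGE_SHIFT 21                               /* 2M */
#define SUPERPAGE_SIZE  ((uint64_t)1 << SUPERPAGE_SHIFT)

/* Round a requested domain size up to the next 2M multiple. */
static uint64_t round_up_to_superpage(uint64_t bytes)
{
    return (bytes + SUPERPAGE_SIZE - 1) & ~(SUPERPAGE_SIZE - 1);
}

/* Number of 2M allocation units needed to cover the request. */
static uint64_t superpages_for(uint64_t bytes)
{
    return round_up_to_superpage(bytes) >> SUPERPAGE_SHIFT;
}
```

A 126MB or 128MB request comes through unchanged, a 127MB request becomes 128MB, and the same rounding could be applied to balloon targets.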


(Now Keir brings up all the difficulties...)

   J
