> On Thu, Apr 21, 2005 at 02:51:34PM +0100, Ian Pratt wrote:
> > The downside of this scheme is that it will cripple the TLB flush
> > filter on Opteron. Linux used to do this until 2.6.11 anyhow, and
> > no-one really complained much. The far bigger problem is that it won't
> > work for SMP guests, at least without making the L2 per VCPU and
> > updating the L3 accordingly using mm ref counting, which would be messy
> > but do-able.
> >
> > The alternative is to hack PAE Linux to force the L2 containing kernel
> > mappings to be per-pagetable rather than shared. The downside of this
> > is that we use an extra 4KB per pagetable, and have the hassle of
> > faulting in kernel L2 mappings on demand (like non-PAE Linux has to).
> > This plays nicely with the TLB flush filter, and is fine for SMP guests.
>
> <without having looked at the Xen code much, but some
> familiarity with the i386 linux code>
>
> I thought about this a bit more and your second alternative
> sounds much better. Faulting on the kernel mappings is very
> infrequent, and usually after some time the PGD is fully set
> up and only the lower levels of the kernel mappings change
> with vmalloc etc. On x86-64 Linux I even initialize it from a
> static template page when the PGD is created. The remaining
> cases for very big vmallocs can be handled on demand without
> too much code. It should be pretty easy to do on i386 too.
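The per-pagetable kernel L2 scheme discussed above can be sketched as a small userspace model in C. This is not Xen or Linux code: the names `pt_create`, `kernel_fault_sync` and `ref_kernel_l2`, and the assumption of a 3G/1G split that puts all kernel mappings in PAE L3 slot 3, are hypothetical. Each new pagetable gets a private copy of the kernel L2 seeded from a reference template (the extra 4KB per pagetable), and a fault on a kernel address whose L2 entry is missing from the private copy is fixed up by copying that entry from the template, in the spirit of how non-PAE i386 Linux handles vmalloc faults.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* PAE layout: 4 L3 entries, each covering 1 GiB through a 512-entry L2 page. */
#define L3_ENTRIES 4
#define L2_ENTRIES 512
#define KERNEL_L3_SLOT 3   /* hypothetical: assumes a 3G/1G split, kernel in the top GiB */

typedef uint64_t pte_t;

struct pagetable {
    pte_t *l2[L3_ENTRIES];  /* one L2 page per L3 slot; NULL means not present */
};

/* Hypothetical reference ("template") kernel L2, updated by e.g. vmalloc. */
static pte_t ref_kernel_l2[L2_ENTRIES];

/* Every pagetable gets its own kernel L2, initialized from the template. */
static struct pagetable *pt_create(void)
{
    struct pagetable *pt = calloc(1, sizeof(*pt));
    if (!pt)
        return NULL;
    pt->l2[KERNEL_L3_SLOT] = malloc(sizeof(ref_kernel_l2)); /* the extra 4 KiB */
    if (!pt->l2[KERNEL_L3_SLOT]) {
        free(pt);
        return NULL;
    }
    memcpy(pt->l2[KERNEL_L3_SLOT], ref_kernel_l2, sizeof(ref_kernel_l2));
    return pt;
}

/* On a fault at a kernel address, copy a missing L2 entry from the template.
 * Returns 1 if the fault was fixed up, 0 if it is a genuine fault. */
static int kernel_fault_sync(struct pagetable *pt, uint32_t vaddr)
{
    unsigned l3 = vaddr >> 30;                        /* bits 31-30 */
    unsigned l2 = (vaddr >> 21) & (L2_ENTRIES - 1);   /* bits 29-21 */
    if (l3 != KERNEL_L3_SLOT)
        return 0;
    pte_t *mine = pt->l2[l3];
    assert(mine != NULL);  /* the kernel L2 is always allocated by pt_create */
    if (mine[l2] == 0 && ref_kernel_l2[l2] != 0) {
        mine[l2] = ref_kernel_l2[l2];
        return 1;
    }
    return 0;
}
```

Because the template is copied at creation time, faults only occur for kernel mappings added after the pagetable was created, which matches the observation that such faults are infrequent.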
>
> > The simplest thing of all in the first instance is to turn all of the
> > linear pagetable accesses into macros taking (exec_domain, offset) and
> > then just implement them using pagetable walks.
> >
> > What do you guys think? Implement option #3 in the first instance,
> > then aim for #2.
>
> I don't get your numbering; didn't you have only two options?
> Or does the one below count too?
There really were three options. The third was just to avoid the use of
linear page tables and replace them with page table walks. That way we
only have to worry about having a per-domain (as opposed to
per-pagetable) L2, which requires minimal changes to Linux. I think we're
all agreed that option #2 is where we want to end up, because linear
pagetables are a useful performance win for Xen.
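Option #3 amounts to replacing each linear pagetable lookup with an explicit walk starting from the exec_domain's top-level table. The C sketch below is a toy model under stated assumptions: `struct exec_domain` here is a hypothetical stand-in holding a pointer to the 4-entry PAE L3 and to a simulated machine-memory array indexed by frame number, and `LINEAR_L1E(ed, vaddr)` is a hypothetical macro name. It ignores superpage (PSE) L2 entries and the work of mapping guest frames into the hypervisor's address space, both of which a real implementation would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PT_ENTRIES    512
#define PTE_PRESENT   0x1ULL
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL

typedef uint64_t pte_t;

/* Hypothetical stand-in, not Xen's real struct exec_domain. */
struct exec_domain {
    pte_t *l3;                 /* PAE top level: 4 entries (what CR3 points at) */
    pte_t (*mem)[PT_ENTRIES];  /* simulated machine memory, indexed by frame number */
};

/* Walk L3 -> L2 -> L1 and return a pointer to the L1 entry mapping vaddr,
 * or NULL if an intermediate level is not present. */
static pte_t *walk_l1e(const struct exec_domain *ed, uint32_t vaddr)
{
    assert(ed != NULL && ed->l3 != NULL);

    pte_t l3e = ed->l3[vaddr >> 30];                            /* bits 31-30 */
    if (!(l3e & PTE_PRESENT))
        return NULL;
    pte_t *l2 = ed->mem[(l3e & PTE_ADDR_MASK) >> PAGE_SHIFT];

    pte_t l2e = l2[(vaddr >> 21) & (PT_ENTRIES - 1)];           /* bits 29-21 */
    if (!(l2e & PTE_PRESENT))
        return NULL;
    pte_t *l1 = ed->mem[(l2e & PTE_ADDR_MASK) >> PAGE_SHIFT];

    return &l1[(vaddr >> PAGE_SHIFT) & (PT_ENTRIES - 1)];       /* bits 20-12 */
}

/* Drop-in replacement for a linear_pg_table[vaddr >> PAGE_SHIFT] style access. */
#define LINEAR_L1E(ed, vaddr) walk_l1e((ed), (vaddr))
```

Each lookup now costs up to three memory references instead of one, which is why linear pagetables remain the preferred end state.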
> > One completely different approach would be to first implement a PAE
> > guest using the "translate, internal" shadow mode, where we don't have
> > to worry about any of this gory stuff. Once it's working, we could
> > then implement a paravirtualized mode to improve performance and save
> > memory. Getting shadow mode working on PAE shouldn't be too hard, as
> > it's been written with 2, 3 and 4 level pagetables in mind.
>
> That sounds attractive too, except that duplicated page
> tables can be a killer on some workloads (a database with many
> processes and lots of shared memory ends up with a lot of
> memory tied up in page tables, even with hugetlb). And
> databases are normally one of the most common workloads for PAE.
> It might be a good idea to avoid it, at least for the para case.
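A rough back-of-the-envelope model of the cost described above: the C sketch counts only the leaf page-table pages each process needs to map a shared region with 4KB pages (PAE entries are 8 bytes, so 512 fit in a table page), and assumes a shadow-mode VMM keeps one extra copy of each. The function names, process count and region size are hypothetical illustrations, not measurements, and upper-level tables and hugetlb mappings are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define PAE_PTE_BYTES    8u     /* PAE entries are 64-bit */
#define TABLE_PAGE_BYTES 4096u  /* one table page holds 512 entries */

/* Bytes of leaf page-table pages one process needs to map `region` bytes
 * using pages of size `page` (upper levels ignored). */
static uint64_t leaf_table_bytes(uint64_t region, uint64_t page)
{
    assert(page != 0);
    uint64_t entries = (region + page - 1) / page;
    uint64_t per_table = TABLE_PAGE_BYTES / PAE_PTE_BYTES;  /* 512 */
    uint64_t tables = (entries + per_table - 1) / per_table;
    return tables * TABLE_PAGE_BYTES;
}

/* Total across `nproc` processes mapping the same region with 4KB pages;
 * assumes shadow mode doubles the footprint (guest copy + shadow copy). */
static uint64_t total_table_bytes(uint64_t nproc, uint64_t region, int shadowed)
{
    uint64_t copies = shadowed ? 2 : 1;
    return nproc * leaf_table_bytes(region, 4096) * copies;
}
```

For example, 1000 processes each mapping a 2GB shared region need 4MB of leaf tables apiece, about 3.9GB in total, and roughly twice that once shadowed.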
Yep, the paravirtualized approach is definitely preferable.
Thanks,
Ian
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel