On Wed, 2010-01-27 at 18:50 +0000, Jeremy Fitzhardinge wrote:
> On 01/27/2010 09:26 AM, Ian Campbell wrote:
> > On Mon, 2010-01-25 at 20:02 +0000, Jeremy Fitzhardinge wrote:
> >
> >> IanC, Pasi, myself and others explored a number of other ways to try
> >> and fix it in the Xen pvops code, but they all turned out either to be
> >> very expensive, to just not work (they just pushed the race around), or
> >> to require new pvops just for this case.
> >>
> > Just to brainstorm a bit more:
> >
> > There's no way a kunmap_atomic pvop would be acceptable? It would at
> > least make the API symmetrical.
> >
>
> We could propose it, but I think we have bigger things to spend our
> capital on. And I'm not sure it would help:
>
> In theory xen_kmap_atomic could take the pte lock and unmap_atomic could
> release it. But kmap_atomic doesn't have enough info to be able to take
> the lock, and unmap wouldn't either unless we passed it some odd
> parameters. And even if we did take the lock, the calling kernel code
> will also attempt to take the lock if it actually wants to make a pte
> change, so we'd have to change the logic there.

OK, so that idea is out.

> > What about a hypercall which would set a PTE with the writable bit set
> > atomically depending on the pinned status of the referenced page? (I
> > haven't even vaguely thought this idea through).
> >
>
> It doesn't really help because the core issue is the race which changes
> the page state half way through. If we create a writable mapping, a pin
> on another CPU is going to fail.

I think it could be constructed such that the pin and the new hypercall
collude and do the right thing, somehow... Anyway it doesn't matter, I
think the idea below is much more likely to yield a useful solution.

> We could fix it by locking the pte
> while it is mapped, but then we wouldn't need a new hypercall.
>
> > Is there some way we can disable HIGHPTE at runtime even if
> > CONFIG_HIGHPTE=y? Looks like that might be relatively self-contained in
> > pte_alloc_one(). All the actual uses of high PTEs go through
> > kmap_atomic which explicitly tests for PageHighmem so by ensuring PTEs
> > are never high at allocation time we would skip all those paths.
> > Something like the untested patch below, but not so skanky, obviously.
> >
>
> That's a thought. It could be generally useful too; highpte should only
> be used in extreme circumstances (to prevent ptes from filling most of
> lowmem), not on every system with highmem. IOW use a generic flag
> rather than make it explicitly Xen-related, then we can set that flag.

I think this is the most plausible idea. Need to think about what
criteria would be used to set the flag on native: simply raw RAM size?
I.e. you wouldn't use HIGHPTE on a 4G system, even if CONFIG_HIGHPTE is
enabled, but where would the cut-off be?

Rather than a flag I guess I'd make a pte_gfp variable which could be
modified to suit.
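
A minimal sketch of the pte_gfp idea, written as a userspace model rather
than kernel code. The SK_GFP_* values and both helper names are
hypothetical stand-ins invented for illustration; they are not the
kernel's real GFP flags or any existing interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GFP bits; the values are illustrative only. */
#define SK_GFP_KERNEL  0x01u
#define SK_GFP_ZERO    0x02u
#define SK_GFP_HIGHMEM 0x04u

/* Default models what a CONFIG_HIGHPTE=y build would ask for. */
static unsigned int pte_gfp = SK_GFP_KERNEL | SK_GFP_ZERO | SK_GFP_HIGHMEM;

/* A platform (e.g. a Xen PV guest) would call this once, early in setup,
 * so that PTE pages are never allocated from highmem. */
static void pte_gfp_disable_highmem(void)
{
    pte_gfp &= ~SK_GFP_HIGHMEM;
    /* Only the highmem bit is dropped; the base flag must survive. */
    assert(pte_gfp & SK_GFP_KERNEL);
}

/* Models the allocation-time decision: may a new PTE page be high? */
static bool pte_page_may_be_high(void)
{
    return (pte_gfp & SK_GFP_HIGHMEM) != 0;
}
```

With the highmem bit cleared at allocation time, the PageHighmem test in
the kmap_atomic paths would never fire for PTE pages, which is the effect
described above.
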
> Or we could just put a big fat config dependency in.

I'd imagine that seemingly random "depends !XEN" would be unpopular
upstream.

> > This last would be nice since it would also remove the
> > crippling-for-virtualisation overhead, so it would potentially benefit
> > KVM and VMI as well...
> >
>
> VMI is a non-issue, and I don't think HIGHPTE is extraordinarily
> expensive on kvm.

It would be expensive for shadow mode (three traps to update a PTE) but
I guess for EPT/NPT it is around as cheap as on native.

> >> Given that HIGHPTE is generally a bad idea and should be deprecated
> >> (any machine big enough to need it should definitely be running a
> >> 64-bit kernel), I've left it on the backburner hoping for some
> >> inspiration to strike. So far it has not.
> >>
> > Unfortunately distros seem to be using it for their native kernels and
> > since pvops means they won't have a separate xen kernel I think we need
> > to figure something out.
> >
>
> We could lobby for them to turn it off.

As a separate action to the above, that seems like it might be worthwhile.

> I wonder if they have a real
> user demand for it these days. It could only be important for users
> with lots of physical memory and a 32-bit only CPU, which can't be
> common now.

I guess it is hard for them to judge and so they are relatively
conservative about switching a long standing option off?

> (There should be no problem with using a 64-bit kernel,
> even if userspace is all 32-bit.).

A lot of distros have been a bit slow on the uptake with that
configuration.

Ian.