Keir,
On Mon, Oct 10, 2011 at 6:06 AM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
> On 10/10/2011 10:21, "Tim Deegan" <tim@xxxxxxx> wrote:
>
>> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote:
>>> I have a proposal. I'd like to hear from the list what they think.
>>>
>>> - 1. change p2m lock to a read/write lock
>>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
>>> callers of p2m_lock will become write lockers.
>>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
>>> while holding the read lock.
>>> - 4. Have all lookup callers put_page on the obtained mfn, once done.
>>
>> This seems like a step in the right direction, but if we're going to
>> make this big an interface change there might be better interfaces to
>> end up with.
>>
>> A few issues I can see with it:
>> - p2m lookups are on some very performance-sensitive paths
>> (e.g. multiple times in any pagetable walk or instruction emulation
>> in a HVM guest) so adding the rwlock might have a noticeable impact.
>
> If the read sections are short, may as well use a plain spinlock.
>
> The best (but hard) way to make the locking cheaper is to work out a way to
> use finer-grained locks (e.g., per-page / per-mapping) or avoid locks
> altogether (e.g., RCU).
No clue about RCU. But the p2m tree structure lends itself naturally
to fine-grained locking; in fact, it invites hierarchical locking,
given 2M and 1G superpages.
Now, this moves all the locking into the specific p2m implementations,
ept and traditional pt. Do you think a test_and_set-style spinlock
could fit in the unused bits of a p2m entry? It would have scarce
debug information :) I also don't know whether ept would freak out if
someone spun on an entry it has loaded into the translation hardware.
Probably.
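To make the idea concrete, here is a minimal sketch of what such a
test_and_set-style lock packed into a p2m entry could look like. The bit
position, type names and everything else here are assumptions for
illustration, not the real EPT/PT entry layout, and spinning on the live
entry is exactly the hardware-interaction worry above:

```c
#include <stdint.h>
#include <stdatomic.h>

#define P2M_ENTRY_LOCK_BIT 62  /* assumed software-available bit, not the real layout */

typedef _Atomic uint64_t p2m_entry_t;

static void p2m_entry_lock(p2m_entry_t *e)
{
    uint64_t mask = 1ULL << P2M_ENTRY_LOCK_BIT;
    /* Spin until we transition the lock bit from 0 to 1. */
    while (atomic_fetch_or(e, mask) & mask)
        ; /* busy-wait; a real version would pause and back off */
}

static void p2m_entry_unlock(p2m_entry_t *e)
{
    atomic_fetch_and(e, ~(1ULL << P2M_ENTRY_LOCK_BIT));
}
```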
So perhaps the more sensible idea is to keep a tree/array of locks on
the side. That would not have to live inside the ept/pt
implementation-specific layer, although locking unaligned,
arbitrarily-sized ranges of pages (does anyone do that? PoD?) would
become a big headache.
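For the lock-array-on-the-side variant, a toy sketch of range locking
follows. All sizes and names are made up for illustration, and the
bool array stands in for real spinlocks; the one real point is that
acquiring the per-range locks in ascending order keeps two CPUs locking
overlapping, unaligned ranges from deadlocking:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_GFNS      (1ULL << 20)  /* assumed guest size: 4GB of 4k pages */
#define GFNS_PER_LOCK 512           /* one lock per 2M-superpage-sized range */
#define NR_LOCKS      (MAX_GFNS / GFNS_PER_LOCK)

static bool p2m_range_locks[NR_LOCKS]; /* stand-in for an array of spinlocks */

/* Lock every slot covering [gfn, gfn + nr), lowest slot first. */
static void p2m_range_lock(uint64_t gfn, uint64_t nr)
{
    for (uint64_t i = gfn / GFNS_PER_LOCK;
         i <= (gfn + nr - 1) / GFNS_PER_LOCK; i++)
        p2m_range_locks[i] = true;  /* would be spin_lock(&locks[i]) */
}

static void p2m_range_unlock(uint64_t gfn, uint64_t nr)
{
    for (uint64_t i = gfn / GFNS_PER_LOCK;
         i <= (gfn + nr - 1) / GFNS_PER_LOCK; i++)
        p2m_range_locks[i] = false; /* would be spin_unlock(&locks[i]) */
}
```

An unaligned range that straddles a 2M boundary simply takes both
neighbouring locks, which is where the headache with arbitrary ranges
comes from.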
>
> Multi-reader locks are rarely going to be a good choice in the hypervisor.
>
> A good first step anyhow would be to make the p2m_ synchronisation correct,
> and then optimise it. Sounds like that is hard enough. :-)
Piggybacking another question: ultimately, if we get p2m sync correct,
paging can introduce arbitrary waits. Currently the code bails out
rather ungracefully (e.g. hvm_copy). Is this what wait queues were
introduced for? Has that been left unimplemented purely for lack of
cycles, or does something more fundamental stand in the way?
Thanks!
Andres
>
> -- Keir
>
>> - This fixes one class of races (page gets freed-to-xen underfoot) but
>> leaves another one (gfn -> mfn map changes underfoot) untouched. In
>> particular it doesn't solve the race where a foreign mapper
>> gets a r/w map of what's meant to be a read-only frame.
>>
>> I think that to fix things properly we need to have some refcount
>> associated with the p2m mapping itself. That would be taken by all
>> lookups (or at least some - we could have a flag to the p2m lookup) and
>> released as you suggest, but more importantly it would block all p2m changes
>> while the count was raised. (I think that at least in the common case we
>> could encode such a refcount using the existing typecount).
>>
>> One problem then is how to make all the callers of the p2m update
>> functions handle failure, either by waiting (new deadlock risk?) or
>> returning EAGAIN at the hypercall interface. Paths where the update
>> isn't caused by an explicit request (like log-dirty and the mem-event
>> rx-to-rw conversion) would be particularly tricky.
>>
>> More seriously, it introduces another round of the sort of priority
>> inversion we already get with the existing refcounts - a foreign
>> mapping, caused by a user-space program in another VM, could arbitrarily
>> delay a p2m update (and so prevent a VCPU from making progress), without
>> any mechanism to even request that the mapping be removed.
>>
>> Any ideas how to avoid that? Potentially with some extra bookkeeping on
>> foreign mappings we could revoke or redirect them when the p2m changes.
>> That would fit nicely with the abstraction in the interfaces where HVM
>> domains' memory is always indexed by pfn. I can imagine it being quite
>> tricky though.
>>
>>> I'm more wary that turning p2m locking into read/write will result in
>>> code deadlocking itself: taking a read lock first and a write lock
>>> later. Possibly the current rwlock implementation could be improved to
>>> keep a cpumask of read-lockers, and provide an atomic "promote from
>>> read to write" atomic operation (something along the lines of wait
>>> until you're the only reader in the cpumask, and then cmpxchg(lock,
>>> -1, WRITE_BIAS))
>>
>> I think that would deadlock if two cpus tried it at once.
>>
>> Tim.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
>
>
>