Keir,
On Mon, Oct 10, 2011 at 6:06 AM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
> On 10/10/2011 10:21, "Tim Deegan" <tim@xxxxxxx> wrote:
>
>> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote:
>>> I have a proposal. I'd like to hear from the list what they think.
>>>
>>> - 1. change p2m lock to a read/write lock
>>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
>>> callers of p2m_lock will become write lockers.
>>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
>>> while holding the read lock.
>>> - 4. Have all lookup callers put_page on the obtained mfn, once done.
>>
>> This seems like a step in the right direction, but if we're going to
>> make this big an interface change there might be better interfaces to
>> end up with.
>>
>> A few issues I can see with it:
>> - p2m lookups are on some very performance-sensitive paths
>> (e.g. multiple times in any pagetable walk or instruction emulation
>> in a HVM guest) so adding the rwlock might have a noticeable impact.
>
> If the read sections are short, may as well use a plain spinlock.
>
> The best (but hard) way to make the locking cheaper is to work out a way to
> use finer-grained locks (e.g., per-page / per-mapping) or avoid locks
> altogether (e.g., RCU).
No clue about RCU. But the p2m tree structure lends itself naturally
to fine-grained locking; in fact, it invites hierarchical locking,
given 2M and 1G superpages.
Now, this moves all the locking into the specific p2m implementations,
ept and traditional pt. Do you think a test_and_set-style spinlock
could fit in the unused bits of a p2m entry? It would have scarce
debug information :) I also don't know whether ept would freak out if
someone spun on an entry it has loaded into the translation hardware.
Probably.
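To make the idea concrete, here is a minimal sketch of what such a
test_and_set-style lock packed into a p2m entry could look like. The bit
position, type names and everything else here are assumptions for
illustration, not the real EPT/PT entry layout, and spinning on the live
entry is exactly the hardware-interaction worry above:

```c
#include <stdint.h>
#include <stdatomic.h>

#define P2M_ENTRY_LOCK_BIT 62  /* assumed software-available bit, not the real layout */

typedef _Atomic uint64_t p2m_entry_t;

static void p2m_entry_lock(p2m_entry_t *e)
{
    uint64_t mask = 1ULL << P2M_ENTRY_LOCK_BIT;
    /* Spin until we transition the lock bit from 0 to 1. */
    while (atomic_fetch_or(e, mask) & mask)
        ; /* busy-wait; a real version would pause and back off */
}

static void p2m_entry_unlock(p2m_entry_t *e)
{
    atomic_fetch_and(e, ~(1ULL << P2M_ENTRY_LOCK_BIT));
}
```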
So perhaps the more sensible idea is to keep a tree/array of locks on
the side. That would not have to live inside the ept/pt
implementation-specific layer, although locking unaligned,
arbitrarily-sized ranges of pages (does anyone do that? PoD?) would
become a big headache.
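For the lock-array-on-the-side variant, a toy sketch of range locking
follows. All sizes and names are made up for illustration, and the
bool array stands in for real spinlocks; the one real point is that
acquiring the per-range locks in ascending order keeps two CPUs locking
overlapping, unaligned ranges from deadlocking:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_GFNS      (1ULL << 20)  /* assumed guest size: 4GB of 4k pages */
#define GFNS_PER_LOCK 512           /* one lock per 2M-superpage-sized range */
#define NR_LOCKS      (MAX_GFNS / GFNS_PER_LOCK)

static bool p2m_range_locks[NR_LOCKS]; /* stand-in for an array of spinlocks */

/* Lock every slot covering [gfn, gfn + nr), lowest slot first. */
static void p2m_range_lock(uint64_t gfn, uint64_t nr)
{
    for (uint64_t i = gfn / GFNS_PER_LOCK;
         i <= (gfn + nr - 1) / GFNS_PER_LOCK; i++)
        p2m_range_locks[i] = true;  /* would be spin_lock(&locks[i]) */
}

static void p2m_range_unlock(uint64_t gfn, uint64_t nr)
{
    for (uint64_t i = gfn / GFNS_PER_LOCK;
         i <= (gfn + nr - 1) / GFNS_PER_LOCK; i++)
        p2m_range_locks[i] = false; /* would be spin_unlock(&locks[i]) */
}
```

An unaligned range that straddles a 2M boundary simply takes both
neighbouring locks, which is where the headache with arbitrary ranges
comes from.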
>
> Multi-reader locks are rarely going to be a good choice in the hypervisor.
>
> A good first step anyhow would be to make the p2m_ synchronisation correct,
> and then optimise it. Sounds like that is hard enough. :-)
Piggybacking another question: ultimately, if we get p2m sync correct,
paging can introduce arbitrary waits. Currently the code bails out
rather ungracefully (e.g. hvm_copy). Is this what wait queues were
introduced for? Has that been left unimplemented purely for lack of
cycles, or does something more fundamental stand in the way?
Thanks!
Andres
>
> -- Keir
>
>> - This fixes one class of races (page gets freed-to-xen underfoot) but
>> leaves another one (gfn -> mfn map changes underfoot) untouched. In
>> particular it doesn't solve the race where a foreign mapper
>> gets a r/w map of what's meant to be a read-only frame.
>>
>> I think that to fix things properly we need to have some refcount
>> associated with the p2m mapping itself. That would be taken by all
>> lookups (or at least some - we could have a flag to the p2m lookup) and
>> released as you suggest, but more importantly it would block all p2m changes
>> while the count was raised. (I think that at least in the common case we
>> could encode such a refcount using the existing typecount).
>>
>> One problem then is how to make all the callers of the p2m update
>> functions handle failure, either by waiting (new deadlock risk?) or
>> returning EAGAIN at the hypercall interface. Paths where the update
>> isn't caused by an explicit request (like log-dirty and the mem-event
>> rx-to-rw conversion) would be particularly tricky.
>>
>> More seriously, it introduces another round of the sort of priority
>> inversion we already get with the existing refcounts - a foreign
>> mapping, caused by a user-space program in another VM, could arbitrarily
>> delay a p2m update (and so prevent a VCPU from making progress), without
>> any mechanism to even request that the mapping be removed.
>>
>> Any ideas how to avoid that? Potentially with some extra bookkeeping on
>> foreign mappings we could revoke or redirect them when the p2m changes.
>> That would fit nicely with the abstraction in the interfaces where HVM
>> domains' memory is always indexed by pfn. I can imagine it being quite
>> tricky though.
>>
>>> I'm more wary that turning p2m locking into read/write will result in
>>> code deadlocking itself: taking a read lock first and a write lock
>>> later. Possibly the current rwlock implementation could be improved to
>>> keep a cpumask of read-lockers, and provide an atomic "promote from
>>> read to write" atomic operation (something along the lines of wait
>>> until you're the only reader in the cpumask, and then cmpxchg(lock,
>>> -1, WRITE_BIAS))
>>
>> I think that would deadlock if two cpus tried it at once.
>>
>> Tim.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
>
>
>