Hey there,
(many inlines on this one)
> At 00:33 -0400 on 27 Oct (1319675630), Andres Lagar-Cavilla wrote:
>> Introduce a fine-grained concurrency control structure for the p2m. This
>> allows for locking 2M-aligned chunks of the p2m at a time, exclusively.
>> Recursive locking is allowed. Global locking of the whole p2m is also
>> allowed for certain operations. Simple deadlock detection heuristics are
>> put in place.
>>
>> Note the patch creates backwards-compatible shortcuts that will lock the
>> p2m globally. So it should remain functionally identical to what is
>> currently in place.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>
> Wow. What a lot of code. :) I took a look through, but I can't
> guarantee to have got all the details. Things I saw:
>
> - You use atomic_t for the count but only ever update it under a
> lock. :) If you just need to be sure of atomic writes, then
> atomic_set will do that without using a locked increment/decrement.
I'm a bit shaky on my atomics, and paranoid. Point taken -- I'll be less
liberal with the locked ops next time.
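For my own notes, this is what I take you to mean -- made-up names, not the
posted patch: since the count is only ever written with the range's lock
held, a plain atomic_set()/atomic_read() pair is enough and the bus-locked
increment/decrement buys nothing:

#include <xen/lib.h>        /* ASSERT */
#include <xen/spinlock.h>
#include <asm/atomic.h>

/* Sketch only: 'struct p2m_range' and its fields are placeholders. */
struct p2m_range {
    spinlock_t lock;     /* serializes all updates to this range */
    atomic_t   count;    /* recursion depth; written only under 'lock' */
};

static void range_count_inc(struct p2m_range *r)
{
    ASSERT(spin_is_locked(&r->lock));
    /* Writers are already serialized by the lock, so a plain store via
     * atomic_set() keeps lock-free readers of 'count' consistent; a
     * bus-locked atomic_inc() would be redundant here. */
    atomic_set(&r->count, atomic_read(&r->count) + 1);
}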
>
> - You allocate the bitmaps from xenheap - they should really be using
> p2m memory, so as to avoid changing the memory overhead of the domain
> as it runs. That will involve map_domain_page()ing the bitmaps as
> you go, but at least on x86_64 that's very cheap.
p2m_alloc_ptp? Sure.
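Something like this, I imagine -- whether p2m_alloc_ptp() is the right
allocator is an assumption on my part, and header names and the
map_domain_page() signature move around between trees, so treat it as a
sketch rather than the patch:

#include <xen/mm.h>             /* struct page_info */
#include <xen/domain_page.h>    /* map_domain_page/unmap_domain_page */
#include <xen/bitops.h>
#include <asm/page.h>           /* page_to_mfn */

/* Sketch: the bitmap page comes out of the domain's p2m pool and is only
 * mapped while a bit is being touched, so the runtime memory overhead
 * stays accounted to the domain's p2m allocation. */
static void range_bitmap_set(struct page_info *bitmap_pg, unsigned int bit)
{
    unsigned long *bm = map_domain_page(page_to_mfn(bitmap_pg));

    __set_bit(bit, bm);       /* caller is assumed to hold the range lock */
    unmap_domain_page(bm);    /* per-cpu mappings make this cheap on x86_64 */
}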
>
> - panic() on out-of-memory is pretty rude.
>
Yeah, but unwinding every possible lock caller to handle ENOMEM was beyond
my pain threshold. The reality is that a run-of-the-mill 4GB domain needs
only 4 or 5 single-page allocations here; you have bigger problems if those
fail.
> But stepping back, I'm not sure that we need all this just yet. I think
> it would be worth doing the interface changes with a single p2m lock and
> measuring how bad it is before getting stuck in to fine-grained locking
> (fun though it might be).
Completely agree. I think this will also ease adoption and bug isolation,
and it lets me be more gradual. I'll rework the ordering of the series.
Thanks, very good suggestion.
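To make the intermediate step concrete, the backwards-compatible shim I have
in mind is roughly this (a sketch with placeholder names, not code that
exists yet): every lookup/update path funnels through one coarse, recursive
lock, and the fine-grained ranges can later slot in behind the same two
calls:

#include <xen/spinlock.h>

/* Sketch: one lock for the whole p2m while call sites are converted. */
struct coarse_p2m_lock {
    spinlock_t lock;
};

static inline void p2m_lock_all(struct coarse_p2m_lock *l)
{
    /* Recursive, so existing nested callers (PoD sweeps etc.) keep working. */
    spin_lock_recursive(&l->lock);
}

static inline void p2m_unlock_all(struct coarse_p2m_lock *l)
{
    spin_unlock_recursive(&l->lock);
}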
>
> I suspect that if this is a contention point, allowing multiple readers
> will become important, especially if there are particular pages that
> often get emulated access.
>
> And also, I'd like to get some sort of plan for handling long-lived
> foreign mappings, if only to make sure that this phase-1 fix doesn't
> conflict with it.
>
If foreign mappings hold a lock/ref on a p2m subrange, then they'll disallow
global operations, and you'll get a clash between log-dirty and, say, qemu.
Ka-blam, live migration.

Read-only foreign mappings are only problematic insofar as paging happens.
With proper serialization of p2m updates and lookups (global or
fine-grained), that problem goes away.

Writable foreign mappings are trickier because of sharing and w^x. Is there
any reason left, today, not to type an HVM domain's page PGT_writable when a
foreign mapping of it is made? That would solve the sharing problems. w^x
really can't be solved short of putting the vcpu on a waitqueue (preferable
to me), or destroying the mapping and forcing the foreign OS to remap later.
All a few steps ahead, I hope.

Who/what is using w^x, by the way? If the refcount is zero, I think I know
what I'll do ;)

That is my current "long term plan".
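Concretely, the PGT_writable idea would look something like this -- an
assumption about direction, with hypothetical helper names, not something
the tree does today for HVM foreign maps: take a writable type reference
when the writable foreign mapping is created and drop it on teardown, so a
shared page refuses the mapping up front:

#include <xen/errno.h>
#include <xen/sched.h>      /* struct domain */
#include <asm/mm.h>         /* struct page_info, PGT_writable_page */

static int foreign_map_get_writable(struct page_info *page, struct domain *d)
{
    /* Fails if the page cannot take the writable type, e.g. because it is
     * currently shared; the conflict surfaces before the mapping exists
     * rather than after. */
    if ( !get_page_and_type(page, d, PGT_writable_page) )
        return -EINVAL;
    return 0;
}

static void foreign_map_put_writable(struct page_info *page)
{
    put_page_and_type(page);
}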
> Oh, one more thing:
>
>> +/* Some deadlock book-keeping. Say CPU A holds a lock on range A, CPU B holds a
>> + * lock on range B. Now, CPU A wants to lock range B and vice-versa. Deadlock.
>> + * We detect this by remembering the start of the current locked range.
>> + * We keep a fairly small stack of guards (8), because we don't anticipate
>> + * a great deal of recursive locking because (a) recursive locking is rare
>> + * (b) it is evil (c) only PoD seems to do it (is PoD therefore evil?) */
>
> If PoD could be adjusted not to do it, could we get rid of all the
> recursive locking entirely? That would simplify things a lot.
>
My comment is an exaggeration. In a fine-grained scenario, recursive locking
happens all over the code; we just have to live with it. The "evil"
adjective was me ranting for free.

What is a real problem is that PoD sweeps can cause deadlocks. There is a
simple step to mitigate this: start the sweep from the current gfn and never
wrap around -- too bad if the gfn is too high. But that alters the sweeping
algorithm, so I'll deal with it when its turn comes.
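For reference, the sort of ordering check I mean -- the per-CPU bookkeeping
and all names here are made up for illustration: each CPU remembers the
starts of the ranges it holds and only takes further ranges at equal or
higher gfns, and a sweep that begins at the faulting gfn and never wraps
respects that ordering by construction:

#include <xen/percpu.h>

#define P2M_GUARD_DEPTH 8

struct p2m_lock_guards {
    unsigned long start[P2M_GUARD_DEPTH];  /* starts of ranges held here */
    unsigned int  depth;
};

static DEFINE_PER_CPU(struct p2m_lock_guards, p2m_guards);

/* Would locking the range starting at gfn_start risk closing a cycle? */
static int p2m_range_lock_permitted(unsigned long gfn_start)
{
    struct p2m_lock_guards *g = &this_cpu(p2m_guards);

    /* First range on this CPU, or a range at/above the last one taken:
     * every CPU then acquires in non-decreasing gfn order, so two CPUs
     * cannot each wait on a range the other already holds. */
    return g->depth == 0 || gfn_start >= g->start[g->depth - 1];
}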
Andres
> Tim.
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel