Hey there,
(many inlines on this one)
> At 00:33 -0400 on 27 Oct (1319675630), Andres Lagar-Cavilla wrote:
>> Introduce a fine-grained concurrency control structure for the p2m. This
>> allows for locking 2M-aligned chunks of the p2m at a time, exclusively.
>> Recursive locking is allowed. Global locking of the whole p2m is also
>> allowed for certain operations. Simple deadlock detection heuristics are
>> put in place.
>>
>> Note the patch creates backwards-compatible shortcuts that will lock the
>> p2m globally. So it should remain functionally identical to what is
>> currently in place.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>
> Wow. What a lot of code. :) I took a look through, but I can't
> guarantee to have got all the details. Things I saw:
>
> - You use atomic_t for the count but only ever update it under a
> lock. :) If you just need to be sure of atomic writes, then
> atomic_set will do that without using a locked increment/decrement.
I'm a bit shaky on my atomics, and paranoid. Point taken -- I'll be less
liberal with the locked ops next time.
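For my own notes, this is what I take you to mean -- made-up names, not the
posted patch: since the count is only ever written with the range's lock
held, a plain atomic_set()/atomic_read() pair is enough and the bus-locked
increment/decrement buys nothing:

#include <xen/lib.h>        /* ASSERT */
#include <xen/spinlock.h>
#include <asm/atomic.h>

/* Sketch only: 'struct p2m_range' and its fields are placeholders. */
struct p2m_range {
    spinlock_t lock;     /* serializes all updates to this range */
    atomic_t   count;    /* recursion depth; written only under 'lock' */
};

static void range_count_inc(struct p2m_range *r)
{
    ASSERT(spin_is_locked(&r->lock));
    /* Writers are already serialized by the lock, so a plain store via
     * atomic_set() keeps lock-free readers of 'count' consistent; a
     * bus-locked atomic_inc() would be redundant here. */
    atomic_set(&r->count, atomic_read(&r->count) + 1);
}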
>
> - You allocate the bitmaps from xenheap - they should really be using
> p2m memory, so as to avoid changing the memory overhead of the domain
> as it runs. That will involve map_domain_page()ing the bitmaps as
> you go, but at least on x86_64 that's very cheap.
p2m_alloc_ptp? Sure.
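Something like this, I imagine -- whether p2m_alloc_ptp() is the right
allocator is an assumption on my part, and header names and the
map_domain_page() signature move around between trees, so treat it as a
sketch rather than the patch:

#include <xen/mm.h>             /* struct page_info */
#include <xen/domain_page.h>    /* map_domain_page/unmap_domain_page */
#include <xen/bitops.h>
#include <asm/page.h>           /* page_to_mfn */

/* Sketch: the bitmap page comes out of the domain's p2m pool and is only
 * mapped while a bit is being touched, so the runtime memory overhead
 * stays accounted to the domain's p2m allocation. */
static void range_bitmap_set(struct page_info *bitmap_pg, unsigned int bit)
{
    unsigned long *bm = map_domain_page(page_to_mfn(bitmap_pg));

    __set_bit(bit, bm);       /* caller is assumed to hold the range lock */
    unmap_domain_page(bm);    /* per-cpu mappings make this cheap on x86_64 */
}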
>
> - panic() on out-of-memory is pretty rude.
>
Yeah, but unwinding every possible lock caller to handle ENOMEM was beyond
my pain threshold. The reality is that a run-of-the-mill 4GB domain needs
only 4 or 5 single-page allocations here; you have bigger problems if those
fail.
> But stepping back, I'm not sure that we need all this just yet. I think
> it would be worth doing the interface changes with a single p2m lock and
> measuring how bad it is before getting stuck in to fine-grained locking
> (fun though it might be).
Completely agree. I think this will also ease adoption and bug isolation,
and it lets me be more gradual. I'll rework the ordering of the series.
Thanks, very good suggestion.
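To make the intermediate step concrete, the backwards-compatible shim I have
in mind is roughly this (a sketch with placeholder names, not code that
exists yet): every lookup/update path funnels through one coarse, recursive
lock, and the fine-grained ranges can later slot in behind the same two
calls:

#include <xen/spinlock.h>

/* Sketch: one lock for the whole p2m while call sites are converted. */
struct coarse_p2m_lock {
    spinlock_t lock;
};

static inline void p2m_lock_all(struct coarse_p2m_lock *l)
{
    /* Recursive, so existing nested callers (PoD sweeps etc.) keep working. */
    spin_lock_recursive(&l->lock);
}

static inline void p2m_unlock_all(struct coarse_p2m_lock *l)
{
    spin_unlock_recursive(&l->lock);
}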
>
> I suspect that if this is a contention point, allowing multiple readers
> will become important, especially if there are particular pages that
> often get emulated access.
>
> And also, I'd like to get some sort of plan for handling long-lived
> foreign mappings, if only to make sure that this phase-1 fix doesn't
> conflict with it.
>
If foreign mappings hold a lock/ref on a p2m subrange, then they'll disallow
global operations, and you'll get a clash between log-dirty and, say, qemu.
Ka-blam, live migration.

Read-only foreign mappings are only problematic insofar as paging happens.
With proper serialization of p2m updates and lookups (global or
fine-grained), that problem goes away.

Writable foreign mappings are trickier because of sharing and w^x. Is there
any reason left, today, not to type an HVM domain's page PGT_writable when a
foreign mapping of it is made? That would solve the sharing problems. w^x
really can't be solved short of putting the vcpu on a waitqueue (preferable
to me), or destroying the mapping and forcing the foreign OS to remap later.
All a few steps ahead, I hope.

Who/what is using w^x, by the way? If the refcount is zero, I think I know
what I'll do ;)

That is my current "long term plan".
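Concretely, the PGT_writable idea would look something like this -- an
assumption about direction, with hypothetical helper names, not something
the tree does today for HVM foreign maps: take a writable type reference
when the writable foreign mapping is created and drop it on teardown, so a
shared page refuses the mapping up front:

#include <xen/errno.h>
#include <xen/sched.h>      /* struct domain */
#include <asm/mm.h>         /* struct page_info, PGT_writable_page */

static int foreign_map_get_writable(struct page_info *page, struct domain *d)
{
    /* Fails if the page cannot take the writable type, e.g. because it is
     * currently shared; the conflict surfaces before the mapping exists
     * rather than after. */
    if ( !get_page_and_type(page, d, PGT_writable_page) )
        return -EINVAL;
    return 0;
}

static void foreign_map_put_writable(struct page_info *page)
{
    put_page_and_type(page);
}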
> Oh, one more thing:
>
>> +/* Some deadlock book-keeping. Say CPU A holds a lock on range A, CPU B holds a
>> + * lock on range B. Now, CPU A wants to lock range B and vice-versa. Deadlock.
>> + * We detect this by remembering the start of the current locked range.
>> + * We keep a fairly small stack of guards (8), because we don't anticipate
>> + * a great deal of recursive locking because (a) recursive locking is rare
>> + * (b) it is evil (c) only PoD seems to do it (is PoD therefore evil?) */
>
> If PoD could be adjusted not to do it, could we get rid of all the
> recursive locking entirely? That would simplify things a lot.
>
My comment is an exaggeration. In a fine-grained scenario, recursive locking
happens all over the code; we just have to live with it. The "evil"
adjective was me ranting for free.

What is a real problem is that PoD sweeps can cause deadlocks. There is a
simple step to mitigate this: start the sweep from the current gfn and never
wrap around -- too bad if the gfn is too high. But that alters the sweeping
algorithm, so I'll deal with it when its turn comes.
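For reference, the sort of ordering check I mean -- the per-CPU bookkeeping
and all names here are made up for illustration: each CPU remembers the
starts of the ranges it holds and only takes further ranges at equal or
higher gfns, and a sweep that begins at the faulting gfn and never wraps
respects that ordering by construction:

#include <xen/percpu.h>

#define P2M_GUARD_DEPTH 8

struct p2m_lock_guards {
    unsigned long start[P2M_GUARD_DEPTH];  /* starts of ranges held here */
    unsigned int  depth;
};

static DEFINE_PER_CPU(struct p2m_lock_guards, p2m_guards);

/* Would locking the range starting at gfn_start risk closing a cycle? */
static int p2m_range_lock_permitted(unsigned long gfn_start)
{
    struct p2m_lock_guards *g = &this_cpu(p2m_guards);

    /* First range on this CPU, or a range at/above the last one taken:
     * every CPU then acquires in non-decreasing gfn order, so two CPUs
     * cannot each wait on a range the other already holds. */
    return g->depth == 0 || gfn_start >= g->start[g->depth - 1];
}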
Andres
> Tim.
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel