This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


[Xen-devel] Re: Xen spinlock questions

To: "Jeremy Fitzhardinge" <jeremy@xxxxxxxx>
Subject: [Xen-devel] Re: Xen spinlock questions
From: "Jan Beulich" <jbeulich@xxxxxxxxxx>
Date: Wed, 06 Aug 2008 08:15:36 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date: Wed, 06 Aug 2008 00:16:00 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4898976A.3080406@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4896F39A.76E4.0078.0@xxxxxxxxxx> <48975A30.2080408@xxxxxxxx> <48981005.76E4.0078.0@xxxxxxxxxx> <4898976A.3080406@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>> Jeremy Fitzhardinge <jeremy@xxxxxxxx> 05.08.08 20:09 >>>
>Jan Beulich wrote:
>> Later yesterday I noticed another issue: The code setting lock_spinners
>> isn't interruption safe - you'll need to return the old value from
>> spinning_lock() and restore it in unspinning_lock().
>Good catch.

More on that: You'll really need two per-CPU variables afaics, one for the
current non-irq lock being spun upon, and one for the current irqs-disabled
one. The latter one might not need saving/restoring as long as you don't
re-enable interrupts, but the code might turn out cleaner when doing the
save/restore regardless, e.g. for me (doing ticket locking):

int xen_spin_wait(raw_spinlock_t *lock, unsigned int token)
        struct spinning *spinning;
        int rc;

        /* if kicker interrupt not initialized yet, just spin */
        if (spinlock_irq < 0)
                return 0;

        /* announce we're spinning */
        spinning = &__get_cpu_var(spinning);
        if (spinning->lock) {
                spinning = &__get_cpu_var(spinning_irq);
        spinning->ticket = token >> TICKET_SHIFT;
        spinning->lock = lock;

        /* clear pending */

        /* check again make sure it didn't become free while
           we weren't looking  */
        rc = __raw_spin_trylock(lock);
        if (!rc) {
                /* block until irq becomes pending */

        /* announce we're done */
        spinning->lock = NULL;

        return rc;

>> Also I'm considering doing it ticket-based nevertheless, as "mix(ing) up
>> next cpu selection" won't really help fairness in xen_spin_unlock_slow().
>Why's that?  An alternative might be to just wake all cpus waiting for 
>the lock up, and then let them fight it out.  It should be unusual that 
>there's a significant number of waiters anyway, since contention is 
>(should be) rare.

On an 8-core system I'm seeing between 20,000 (x86-64) and 35,000
(i686) wakeup interrupts per CPU. I'm not certain this still counts as rare.
Though that number may go down a little once the hypervisor doesn't
needlessly wake all polling vCPU-s anymore.

>The main reason for ticket locks is to break the egregious unfairness 
>that (some) bus protocols implement.  That level of fairness shouldn't 
>be necessary here because once the cpus fall to blocking in the 
>hypervisor, it's up to Xen to tie-break.

Why? The hypervisor doing the tie-break makes it possibly even more
unfair, whereas with tickets and a way to kick the next owner (and only
it) almost the same level of fairness as on native can be achieved. The
only heuristic to determine by measurement is the number of spin loops
before going into poll mode (which your original patch's description and
implementation for some reason disagree about - the description says
2^16 loops, the implementation uses 2^10). Obviously, the optimal
numbers may turn out different for byte and ticket locks.

And doing a wake-them-all approach isn't good here, as was
(supposedly, I wasn't there) explained in a talk on the last summit.


Xen-devel mailing list