[Xen-devel] Re: Xen spinlock questions

Jan Beulich wrote:

More on that: You'll really need two per-CPU variables afaics, one for the
current non-irq lock being spun upon, and one for the current irqs-disabled
one. The latter one might not need saving/restoring as long as you don't
re-enable interrupts, but the code might turn out cleaner when doing the
save/restore regardless, e.g. for me (doing ticket locking):

Not sure I follow. How do you use the second array at the kick end ofthe process?

I ended up just storing the previous value locally, and then restoringit. It assumes that locks will strictly nest, of course, but I thinkthat's reasonable.


--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -135,25 +135,39 @@
static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);

-static inline void spinning_lock(struct xen_spinlock *xl)
+/*
+ * Mark a cpu as interested in a lock.  Returns the CPU's previous
+ * lock of interest, in case we got preempted by an interrupt.
+ */
+static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
{
-       __get_cpu_var(lock_spinners) = xl;
-       wmb();                  /* set lock of interest before count */
+       struct xen_spinlock *prev;
+
+       prev = xchg(&__get_cpu_var(lock_spinners), xl);
+       /* xchg is a barrier */
+
        asm(LOCK_PREFIX " incw %0"
            : "+m" (xl->spinners) : : "memory");
+
+       return prev;
}

-static inline void unspinning_lock(struct xen_spinlock *xl)
+/*
+ * Mark a cpu as no longer interested in a lock.  Restores previous
+ * lock of interest (NULL for none).
+ */
+static inline void unspinning_lock(struct xen_spinlock *xl, struct 
xen_spinlock *prev)
{
        asm(LOCK_PREFIX " decw %0"
            : "+m" (xl->spinners) : : "memory");
-       wmb();                  /* decrement count before clearing lock */
-       __get_cpu_var(lock_spinners) = NULL;
+       wmb();                  /* decrement count before restoring lock */
+       __get_cpu_var(lock_spinners) = prev;
}

static noinline int xen_spin_lock_slow(struct raw_spinlock *lock, bool 
irq_enable)
{
        struct xen_spinlock *xl = (struct xen_spinlock *)lock;
+       struct xen_spinlock *prev;
        int irq = __get_cpu_var(lock_kicker_irq);
        int ret;
        unsigned long flags;
@@ -163,7 +177,8 @@
                return 0;

        /* announce we're spinning */
-       spinning_lock(xl);
+       prev = spinning_lock(xl);
+
        flags = __raw_local_save_flags();
        if (irq_enable) {
                ADD_STATS(taken_slow_irqenable, 1);
@@ -189,7 +204,7 @@

out:
        raw_local_irq_restore(flags);
-       unspinning_lock(xl);
+       unspinning_lock(xl, prev);
        return ret;
}

int xen_spin_wait(raw_spinlock_t *lock, unsigned int token)
{
        struct spinning *spinning;
        int rc;

        /* if kicker interrupt not initialized yet, just spin */
        if (spinlock_irq < 0)
                return 0;

        /* announce we're spinning */
        spinning = &__get_cpu_var(spinning);
        if (spinning->lock) {
                BUG_ON(!raw_irqs_disabled());
                spinning = &__get_cpu_var(spinning_irq);
                BUG_ON(spinning->lock);
        }
        spinning->ticket = token >> TICKET_SHIFT;
        smp_wmb();
        spinning->lock = lock;

        /* clear pending */
        xen_clear_irq_pending(spinlock_irq);

        /* check again make sure it didn't become free while
           we weren't looking  */
        rc = __raw_spin_trylock(lock);
        if (!rc) {
                /* block until irq becomes pending */
                xen_poll_irq(spinlock_irq);
                kstat_this_cpu.irqs[spinlock_irq]++;
        }

        /* announce we're done */
        spinning->lock = NULL;

        return rc;
}

On an 8-core system I'm seeing between 20,000 (x86-64) and 35,000

(i686) wakeup interrupts per CPU. I'm not certain this still counts as rare.
Though that number may go down a little once the hypervisor doesn't
needlessly wake all polling vCPU-s anymore.

What workload are you seeing that on? 20-35k interrupts over what timeperiod?

In my tests, I only see it fall into the slow path a couple of thousandtimes per cpu for a kernbench run.

The main reason for ticket locks is to break the egregious unfairnessthat (some) bus protocols implement. That level of fairness shouldn'tbe necessary here because once the cpus fall to blocking in thehypervisor, it's up to Xen to tie-break.
Why? The hypervisor doing the tie-break makes it possibly even more
unfair, whereas with tickets and a way to kick the next owner (and only
it) almost the same level of fairness as on native can be achieved.

I don't think strict fairness is a particularly desirable property; it'scertainly not an unambiguous win. The important thing is solves istotal starvation, and if the Xen scheduler ends up starving a CPU thenthat's a scheduler bug we can fix. We can't fix the cache coherenceprotocols, so we need to use something like a ticket lock.

 The
only heuristic to determine by measurement is the number of spin loops
before going into poll mode (which your original patch's description and
implementation for some reason disagree about - the description says
2^16 loops, the implementation uses 2^10). Obviously, the optimal
numbers may turn out different for byte and ticket locks.

Yes, there's a bit of confusion about loop iterations vs cycles. Theoriginal paper said that after 2^16 *cycles* 90% of locks have beentaken, which I map (approximately) to 2^10 loop iterations (butoriginally I had 2^16 iterations). I don't think it's all thatcritical; it probably depends too much on workload and precise systemand VM configuration to really finely tune, and doesn't end up makingall that much difference.

That said, I've implemented a pile of debugfs infrastructure forextracting lots of details about lock performance so there's some scopefor tuning it (including being able to change the timeout on the fly tosee how things change).

And doing a wake-them-all approach isn't good here, as was
(supposedly, I wasn't there) explained in a talk on the last summit.

Well, yes, wake-all is a total disaster for ticket locks, since it justcauses scheduler thrashing. But for byte locks it doesn't matter allthat much since whoever runs next will take the lock and make progress.The others will wake up and spin for a bit before sleeping; not great,but still better than plain spinning.

It might be worth using a smaller timeout for re-lock attempts afterwaking up, but that could also lead to starvation as a sleeper will beat a disadvantage against someone who's currently spinning on the lock,and so will tend to lose against new lock takers.

I think we can also mitigate poll's wake-all behaviour by seeing if ourparticular per-cpu interrupt is pending and drop back into pollimmediately if not (ie, detect a spurious wakeup).


   J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

WARNING - OLD ARCHIVES

xen-devel

[Xen-devel] Re: Xen spinlock questions