This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/



To: xen-changelog@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-changelog] [xen-3.2-testing] Fix race between scheduler and CPUs being offlined
From: "Xen patchbot-3.2-testing" <patchbot-3.2-testing@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 17 Dec 2008 06:20:26 -0800
Delivery-date: Wed, 17 Dec 2008 06:20:19 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-changelog-request@lists.xensource.com?subject=help>
List-id: BK change log <xen-changelog.lists.xensource.com>
List-post: <mailto:xen-changelog@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-changelog>, <mailto:xen-changelog-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-changelog>, <mailto:xen-changelog-request@lists.xensource.com?subject=unsubscribe>
Reply-to: xen-devel@xxxxxxxxxxxxxxxxxxx
Sender: xen-changelog-bounces@xxxxxxxxxxxxxxxxxxx
# HG changeset patch
# User Keir Fraser <keir.fraser@xxxxxxxxxx>
# Date 1229440848 0
# Node ID 98aba5761c5aad01edd5cfcde1c610923f4a27ae
# Parent  6f47c2aae100dcf727df699fd3fbe03cc2bbdd44
Fix race between scheduler and CPUs being offlined

Since the credit scheduler depends on cpu_core_map and cpu_sibling_map
being populated for all CPUs marked online in cpu_online_map
(otherwise csched_cpu_pick() can get into an endless loop, because
nxt_idlers is empty and hence no bit gets cleared from cpus),
sibling info must be cleared *after* removing a CPU from cpu_online_map.
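To make the ordering requirement concrete, here is a minimal single-threaded sketch, assuming 32-bit masks as a stand-in for cpumask_t and a hypothetical four-CPU topology (CPUs 0-1 and 2-3 paired). The names online_map, setup(), topology_invariant_holds(), remove_siblinginfo_model() and clear_online() are illustrative only and are not Xen's API. The invariant checked is the one stated above: every CPU marked online still has populated topology maps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 8                     /* hypothetical machine size */
typedef uint32_t cpumask;           /* simplified stand-in for cpumask_t */

static cpumask online_map;          /* models cpu_online_map */
static cpumask core_map[NCPUS];     /* models cpu_core_map */
static cpumask sibling_map[NCPUS];  /* models cpu_sibling_map */

/* Hypothetical topology: CPUs 0-1 share a core, as do CPUs 2-3.
 * For simplicity the sibling map mirrors the core map. */
static void setup(void)
{
    online_map = 0xF;
    for (int c = 0; c < NCPUS; c++) {
        core_map[c] = (c < 2) ? 0x3 : (c < 4) ? 0xC : 0;
        sibling_map[c] = core_map[c];
    }
}

/* The scheduler's assumption: every online CPU appears in its own maps. */
static bool topology_invariant_holds(void)
{
    for (int c = 0; c < NCPUS; c++) {
        cpumask bit = 1u << c;
        if (!(online_map & bit))
            continue;
        if (!(core_map[c] & bit) || !(sibling_map[c] & bit))
            return false;
    }
    return true;
}

/* Illustrative counterpart of remove_siblinginfo(): drop 'cpu' from
 * every other CPU's maps and empty its own maps. */
static void remove_siblinginfo_model(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    for (int c = 0; c < NCPUS; c++) {
        core_map[c] &= ~(1u << cpu);
        sibling_map[c] &= ~(1u << cpu);
    }
    core_map[cpu] = 0;
    sibling_map[cpu] = 0;
}

/* Illustrative counterpart of cpu_clear(cpu, cpu_online_map). */
static void clear_online(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    online_map &= ~(1u << cpu);
}
```

Calling clear_online(3) before remove_siblinginfo_model(3) keeps topology_invariant_holds() true after each step, whereas the old order leaves CPU 3 online with empty maps in between. The new order only guarantees the invariant at each instant on the dying CPU, though: csched_cpu_pick() may have read cpu_online_map before the CPU was cleared and consult the maps after they were emptied, which is why the patch also changes the scheduler.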

But this only narrows the original race window: since the maps are
cleared on the dying CPU while the scheduler runs on an active one
(generally CPU0), the scheduler must also be able to cope with finding
an empty nxt_idlers. While this change alone would suffice to fix the
race, clearing the maps in the proper order still seems like a
reasonable thing to do.
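The effect of the new else-if branch in csched_cpu_pick() can be illustrated with a simplified model of its selection loop. This is a sketch under stated assumptions, not the Xen code: masks are 32-bit integers, only a core map is consulted (the real loop chooses between cpu_core_map and cpu_sibling_map), sched_smt_power_savings is taken as off, and pick_cpu(), cycle_cpu() and the max_iters cap are hypothetical, the cap standing in for a loop that would otherwise never terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 8                     /* hypothetical machine size */
typedef uint32_t cpumask;           /* simplified stand-in for cpumask_t */

static cpumask core_map[NCPUS];     /* models cpu_core_map (may be emptied early) */

/* Number of set bits in a mask. */
static int weight(cpumask m)
{
    int n = 0;
    for (; m; m &= m - 1)           /* clear the lowest set bit */
        n++;
    return n;
}

/* Next CPU after 'cpu' (wrapping around) that is set in a non-empty mask. */
static int cycle_cpu(int cpu, cpumask mask)
{
    assert(mask != 0);
    for (int i = 1; i <= NCPUS; i++) {
        int c = (cpu + i) % NCPUS;
        if (mask & (1u << c))
            return c;
    }
    return -1;                      /* unreachable for a non-empty mask */
}

/*
 * Simplified model of the csched_cpu_pick() loop. 'cpus' is the set of
 * online CPUs the vCPU may run on (a snapshot); 'idlers' is the idle set
 * and must include 'cpu'. Returns the chosen CPU, or -1 if the loop is
 * still running after 'max_iters' passes.
 */
static int pick_cpu(int cpu, cpumask cpus, cpumask idlers,
                    bool with_guard, int max_iters)
{
    cpus &= idlers;
    cpus &= ~(1u << cpu);
    for (int iter = 0; cpus != 0; iter++) {
        if (iter == max_iters)
            return -1;
        int nxt = cycle_cpu(cpu, cpus);
        cpumask cpu_idlers = idlers & core_map[cpu];
        cpumask nxt_idlers = idlers & core_map[nxt];

        if (weight(cpu_idlers) < weight(nxt_idlers)) {
            cpu = nxt;              /* nxt's core has more idle CPUs */
            cpus &= ~(1u << cpu);
        } else if (with_guard && nxt_idlers == 0) {
            cpus &= ~(1u << nxt);   /* the fix: nxt's maps were already cleared */
        }
        cpus &= ~nxt_idlers;        /* no-op when nxt_idlers is empty */
    }
    return cpu;
}
```

With CPU 2 still present in the online snapshot but its core map already emptied, pick_cpu(0, 0x5, 0x5, false, 100) makes no progress and returns -1, while the guarded version removes CPU 2 from cpus and returns 0. On a healthy topology both versions behave identically.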

Note that this is *not* applicable to 3.3 or -unstable, since there
scheduling no longer happens while CPUs are being brought down.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxxxx>
 xen/arch/x86/smpboot.c    |    4 ++--
 xen/common/sched_credit.c |   10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff -r 6f47c2aae100 -r 98aba5761c5a xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c    Tue Dec 16 13:29:00 2008 +0000
+++ b/xen/arch/x86/smpboot.c    Tue Dec 16 15:20:48 2008 +0000
@@ -1216,12 +1216,12 @@ int __cpu_disable(void)
-       remove_siblinginfo(cpu);
        cpu_clear(cpu, map);
        /* It's now safe to remove this processor from the online map */
        cpu_clear(cpu, cpu_online_map);
+       remove_siblinginfo(cpu);
        return 0;
diff -r 6f47c2aae100 -r 98aba5761c5a xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Tue Dec 16 13:29:00 2008 +0000
+++ b/xen/common/sched_credit.c Tue Dec 16 15:20:48 2008 +0000
@@ -474,6 +474,16 @@ csched_cpu_pick(struct vcpu *vc)
             cpu = nxt;
             cpu_clear(cpu, cpus);
+        else if ( unlikely(cpus_empty(nxt_idlers)) )
+        {
+            /*
+             * This can happen when CPUs are being brought down for S3
+             * or S5: cpu_{core,sibling}_map may have got cleared by
+             * the time we get here, while we may have found the CPU
+             * still set in cpu_online_map earlier.
+             */
+            cpu_clear(nxt, cpus);
+        }
             cpus_andnot(cpus, cpus, nxt_idlers);
