xen-devel

[Xen-devel] Re: PROBLEM: 3.0-rc kernels unbootable since -rc3

> > http://darnok.org/xen/cpu1.log
> 
> OK, a fair amount of variety, then lots and lots of task_waking_fair(),
> so I still feel good about asking you for the following.
.. snip ..
> Hmmm...  Given that this is persisting for many many seconds, it might
> be better to check for at least 10,000,000 passes.  In contrast, 1000
> passes might elapse just waiting for a cache miss to complete.

Changed it to that large number. This is the diff I used:

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 433491c..e185c04 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1392,14 +1392,19 @@ static void task_waking_fair(struct task_struct *p)
        struct sched_entity *se = &p->se;
        struct cfs_rq *cfs_rq = cfs_rq_of(se);
        u64 min_vruntime;
+       u64 loop_cnt = 0UL;
 
 #ifndef CONFIG_64BIT
        u64 min_vruntime_copy;
-
+       loop_cnt = 0UL;
        do {
                min_vruntime_copy = cfs_rq->min_vruntime_copy;
                smp_rmb();
                min_vruntime = cfs_rq->min_vruntime;
+               if (loop_cnt++ > 10000000) {
+                       printk(KERN_INFO "POKE!\n");
+                       loop_cnt = 0UL;
+               }
        } while (min_vruntime != min_vruntime_copy);
 #else
        min_vruntime = cfs_rq->min_vruntime;

And the log is:
http://darnok.org/xen/loop_cnt.log

which seems to imply that we are indeed stuck in that loop
forever.
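
For illustration, here is a minimal user-space sketch of the retry loop
instrumented above. It is not the kernel code: struct fake_cfs_rq,
publish_min_vruntime() and read_min_vruntime() are hypothetical names, the
C11 fences only stand in for smp_wmb()/smp_rmb(), and the writer ordering
(value first, then the copy) is an assumption about how the two fields are
kept in sync. Instead of printing after a number of passes, the reader
gives up, which makes the "copy never matches" case visible:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two cfs_rq fields read by the loop. */
struct fake_cfs_rq {
	uint64_t min_vruntime;
	uint64_t min_vruntime_copy;
};

/* Assumed writer side: store the value, then publish its copy. */
static void publish_min_vruntime(struct fake_cfs_rq *rq, uint64_t v)
{
	rq->min_vruntime = v;
	atomic_thread_fence(memory_order_release);	/* stands in for smp_wmb() */
	rq->min_vruntime_copy = v;
}

/*
 * Reader side with a retry budget. Returns true and stores the value in
 * *out once both fields agree; returns false if they still differ after
 * max_passes attempts.
 */
static bool read_min_vruntime(const volatile struct fake_cfs_rq *rq,
			      uint64_t max_passes, uint64_t *out)
{
	uint64_t copy, val, passes;

	for (passes = 0; passes < max_passes; passes++) {
		copy = rq->min_vruntime_copy;
		atomic_thread_fence(memory_order_acquire);	/* stands in for smp_rmb() */
		val = rq->min_vruntime;
		if (val == copy) {
			*out = val;
			return true;
		}
	}
	return false;
}
```

If the two fields are published together, the reader succeeds on the first
pass. If min_vruntime changes and the copy is never updated to match, every
pass fails, which is the behaviour the repeated "POKE!" lines would
correspond to.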

> 
> Other possible causes include:

What is really strange is that I can only reproduce this on 32-bit builds.
> 
> o     A mismatch between Xen's and RCU's ideas of how CONFIG_NO_HZ
>       works.  If Xen thinks that the CPU is in CONFIG_NO_HZ's
>       dyntick-idle mode, but RCU thinks otherwise, the grace period
>       might stall.

One sure way to figure this out is to disable CONFIG_NO_HZ, right?
Or will that take away the task_waking_fair() case as well?
> 
> o     Problems due to portions of the code attempting to use
>       RCU read-side critical sections while in dyntick-idle mode.
>       Frederic Weisbecker has located some of these (though not yet
>       in Xen), and he has some diagnostics which may be found at:
> 
>       git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
> 
>       on branch eqscheck.2011.07.08a.
> 
>       You need to enable CONFIG_PROVE_RCU for these diagnostics to
>       be executed.

Ok, let me try those too.
> 
> o     As always, there might be bugs in RCU.  ;-)
> 
> But the loop in task_waking_fair() looks like the most prominent smoking
> gun at the moment.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
