WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

[Xen-devel] Re: PROBLEM: 3.0-rc kernels unbootable since -rc3

On Tue, Jul 12, 2011 at 12:03:24PM -0400, Konrad Rzeszutek Wilk wrote:
> > > http://darnok.org/xen/cpu1.log
> > 
> > OK, a fair amount of variety, then lots and lots of task_waking_fair(),
> > so I still feel good about asking you for the following.
> .. snip ..
> > Hmmm...  Given that this is persisting for many many seconds, it might
> > be better to check for at least 10,000,000 passes.  In contrast, 1000
> > passes might elapse just waiting for a cache miss to complete.
> 
> Changed it to that large number. This is the diff I used:
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 433491c..e185c04 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1392,14 +1392,19 @@ static void task_waking_fair(struct task_struct *p)
>       struct sched_entity *se = &p->se;
>       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>       u64 min_vruntime;
> +     u64 loop_cnt = 0UL;
> 
>  #ifndef CONFIG_64BIT
>       u64 min_vruntime_copy;
> -
> +     loop_cnt = 0UL;
>       do {
>               min_vruntime_copy = cfs_rq->min_vruntime_copy;
>               smp_rmb();
>               min_vruntime = cfs_rq->min_vruntime;
> +             if (loop_cnt++ > 10000000) {
> +                     printk(KERN_INFO "POKE!\n");
> +                     loop_cnt = 0UL;
> +             }
>       } while (min_vruntime != min_vruntime_copy);
>  #else
>       min_vruntime = cfs_rq->min_vruntime;
> 
> And the log is:
> http://darnok.org/xen/loop_cnt.log
> 
> which seems to imply that we are indeed stuck in that loop
> forever.

It does indeed, thank you!  Also it looks like interrupts are
disabled, and that timekeeping is similarly out of action.

> > Other possible causes include:
> 
> What is really strange is that I can only reproduce this on 32-bit builds.

Not strange at all.  If you have a 64-bit build, the function doesn't
have a loop.  ;-)
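
For anyone following along, here is a minimal userspace sketch of the copy-and-retry read that the 32-bit path relies on, with the retry count bounded so that a stuck reader is reported instead of spinning forever. The names rq_sketch, rq_update() and rq_read() are hypothetical, the C11 fences only stand in for smp_wmb()/smp_rmb(), and the writer side is my assumption about how the two fields are kept in step (it is not quoted in this thread). It is not the kernel code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two cfs_rq fields read in the loop. */
struct rq_sketch {
	volatile uint64_t min_vruntime;
	volatile uint64_t min_vruntime_copy;
};

/* Assumed writer side: publish the value, then publish the copy. */
static void rq_update(struct rq_sketch *rq, uint64_t v)
{
	rq->min_vruntime = v;
	atomic_thread_fence(memory_order_release);	/* stands in for smp_wmb() */
	rq->min_vruntime_copy = v;
}

/*
 * Reader side: same shape as the 32-bit loop in task_waking_fair(),
 * except that it gives up after max_tries passes.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * two fields never agreed.
 */
static int rq_read(const struct rq_sketch *rq, uint64_t max_tries,
		   uint64_t *out)
{
	uint64_t copy, val;
	uint64_t tries = 0;

	do {
		if (tries++ >= max_tries)
			return -1;
		copy = rq->min_vruntime_copy;
		atomic_thread_fence(memory_order_acquire);	/* stands in for smp_rmb() */
		val = rq->min_vruntime;
	} while (val != copy);

	*out = val;
	return 0;
}
```

If the two fields disagree and nothing ever brings them back into agreement, the unbounded version of this loop cannot exit, which would be consistent with the repeated "POKE!" output. A 64-bit build reads min_vruntime with a single load and so has nothing to retry.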

> > o   A mismatch between Xen's and RCU's ideas of how CONFIG_NO_HZ
> >     works.  If Xen thinks that the CPU is in CONFIG_NO_HZ's
> >     dyntick-idle mode, but RCU thinks otherwise, the grace period
> >     might stall.
> 
> One sure way to figure this out is to disable CONFIG_NO_HZ right?
> Or will that take away task_waking_fair case as well?

Disabling CONFIG_NO_HZ would be an interesting test case.

> > o   Problems due to portions of the code attempting to use
> >     RCU read-side critical sections while in dyntick-idle mode.
> >     Frederic Weisbecker has located some of these (though not yet
> >     in Xen), and he has some diagnostics which may be found at:
> > 
> >     git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
> > 
> >     on branch eqscheck.2011.07.08a.
> > 
> >     You need to enable CONFIG_PROVE_RCU for these diagnostics to
> >     be executed.
> 
> Ok, let me try those too.

Thank you!

> > o   As always, there might be bugs in RCU.  ;-)
> > 
> > But the loop in task_waking_fair() looks like the most prominent smoking
> > gun at the moment.

And could you also please try out the patch that I posted earlier?

                                                        Thanx, Paul

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
