Over the weekend, I tried cpuidle=off and it didn't
make any difference.
I didn't get a chance to fall back to a 2.6.18 test
run, but I did start another 2.6.29 run, which ran
for over 24 hours before my test script failed with
the following message and a stack dump:
"BUG: soft lockup - CPU#3 stuck after 4099s!"
The guest didn't freeze or crash though.
> -----Original Message-----
> From: Dan Magenheimer
> Sent: Friday, April 17, 2009 9:34 AM
> To: Tian, Kevin; Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] Second release candidate for Xen 3.4.0
>
>
> Last night's run ran for over 15 hours before the same
> "blocked for more than 480 seconds" occurred. This
> time the tmem patch was running so I/O was greatly
> reduced, which might account for the change in behavior
> (or it might be completely random).
>
> Interestingly, the domain isn't completely frozen.
> It is still doing some things but is mostly non-responsive.
> I was able to do a ctrl-Z on the console and get the
> normal shell response, but then no prompt. I am also
> able to see stuff by sending it sysrq's using xm.
>
> I'll give cpuidle=off a spin this weekend but...
>
> > Hmm could be the kernel I suppose.
>
> Yes, this article would lead me to believe so:
>
> http://lwn.net/Articles/326490/
>
> I'll also try to reproduce on 2.6.18. If I can't, I'd
> chalk it up as a kernel problem.
>
> Dan
>
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > Sent: Friday, April 17, 2009 2:13 AM
> > To: Keir Fraser; Dan Magenheimer; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: RE: [Xen-devel] Second release candidate for Xen 3.4.0
> >
> >
> > >From: Keir Fraser
> > >Sent: April 17, 2009 16:06
> > >
> > >On 17/04/2009 08:55, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
> > >
> > >> On 16/04/2009 18:09, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:
> > >>
> > >>> FYI, I can still reproduce the "blocked for more than 480 seconds"
> > >>> problem I reported yesterday. After running >2 hours of load,
> > >>> the 2.6.29 guest spews out a number of Call Trace's and freezes.
> > >>> Each is prefixed with:
> > >>
> > >> Hmm could be the kernel I suppose. Or perhaps there's a time issue lurking.
> > >
> > >And if the latter, the cpuidle stuff would still be most likely culprit in
> > >my opinion. Did you repro problems with cpuidle=off?
> > >
> >
> > I think Dan mentioned 'cpuidle=off' in his previous post, but of course
> > it's worthy of further confirmation about this option:
> >
> > > > -----Original Message-----
> > > > From: Dan Magenheimer
> > > > Sent: Wednesday, April 15, 2009 8:59 AM
> > > > To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail); Tian, Kevin
> > > > Subject: RE: [Xen-devel] Time goes backwards in dom0 in xen-unstable
> > > >
> > > >
> > > > Hmmm... after only a few minutes with cpuidle=off,
> > > > my test domPV froze up after printing a number of
> > > > call traces starting with:
> > > >
> > > > INFO: task xxx:nnn blocked for more than 480 seconds.
> > > >
> > > > At the top of all of the traces is either
> > > > getnstimeofday+51 or io_schedule+44.
> > > >
> > > > (Note that this PV domain is a 2.6.29 kernel... don't
> > > > know if the messages are the same on an older kernel.)
> >
> > Thanks,
> > Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel