xen-devel

RE: [Xen-devel] Second release candidate for Xen 3.4.0

To: dan.magenheimer@xxxxxxxxxx, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] Second release candidate for Xen 3.4.0
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Mon, 20 Apr 2009 15:58:18 +0000 (GMT)
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

Over the weekend, I tried cpuidle=off and it didn't
make any difference.

I didn't have a chance to fall back to a 2.6.18 test
run but did start up another 2.6.29 run which ran
for over 24 hours before my test script failed with
the following and a stack dump:

"BUG: soft lockup - CPU#3 stuck after 4099s!"

The guest didn't freeze or crash though.

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Friday, April 17, 2009 9:34 AM
> To: Tian, Kevin; Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] Second release candidate for Xen 3.4.0
>
>
> Last night's run ran for over 15 hours before the same
> "blocked for more than 480 seconds" occurred.  This
> time the tmem patch was running so I/O was greatly
> reduced, which might account for the change in behavior
> (or it might be completely random).
>
> Interestingly, the domain isn't completely frozen.
> It is still doing some things but is mostly non-responsive.
> I was able to do a ctrl-Z on the console and get the
> normal shell response, but then no prompt.  I am also
> able to see stuff by sending it sysrq's using xm.
>
> I'll give cpuidle=off a spin this weekend but...
>
> > Hmm could be the kernel I suppose.
>
> Yes, this article would lead me to believe so:
>
> http://lwn.net/Articles/326490/
>
> I'll also try to reproduce on 2.6.18.  If I can't, I'd
> chalk it up as a kernel problem.
>
> Dan
>
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > Sent: Friday, April 17, 2009 2:13 AM
> > To: Keir Fraser; Dan Magenheimer; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: RE: [Xen-devel] Second release candidate for Xen 3.4.0
> >
> >
> > >From: Keir Fraser
> > >Sent: April 17, 2009 16:06
> > >
> > >On 17/04/2009 08:55, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
> > >
> > >> On 16/04/2009 18:09, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:
> > >>
> > >>> FYI, I can still reproduce the "blocked for more than 480 seconds"
> > >>> problem I reported yesterday.  After running >2 hours of load,
> > >>> the 2.6.29 guest spews out a number of call traces and freezes.
> > >>> Each is prefixed with:
> > >>
> > >> Hmm could be the kernel I suppose. Or perhaps there's a time issue lurking.
> > >
> > >And if the latter, the cpuidle stuff would still be the most likely
> > >culprit in my opinion. Did you repro problems with cpuidle=off?
> > >
> >
> > I think Dan mentioned 'cpuidle=off' in his previous post, but of course
> > it's worth confirming the result with this option again:
> >
> > > > -----Original Message-----
> > > > From: Dan Magenheimer
> > > > Sent: Wednesday, April 15, 2009 8:59 AM
> > > > To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail); Tian, Kevin
> > > > Subject: RE: [Xen-devel] Time goes backwards in dom0 in xen-unstable
> > > >
> > > >
> > > > Hmmm... after only a few minutes with cpuidle=off,
> > > > my test domPV froze up after printing a number of
> > > > call traces starting with:
> > > >
> > > > INFO: task xxx:nnn blocked for more than 480 seconds.
> > > >
> > > > At the top of all of the traces is either
> > > > getnstimeofday+51 or io_schedule+44.
> > > >
> > > > (Note that this PV domain is a 2.6.29 kernel... don't
> > > > know if the messages are the same on an older kernel.)
> >
> > Thanks,
> > Kevin
