I agree. Since it happens so rarely and the failure is
very visible, we can worry about tracking it down later.
From the symptoms, I suspect it is another case where
a rid is not getting mangled or unmangled or something
like that.
> -----Original Message-----
> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> Sent: Monday, October 17, 2005 10:39 PM
> To: Magenheimer, Dan (HP Labs Fort Collins)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> xen0 more stable
>
> Yes, I need to wait a very long time to trigger this; the build process
> is very slow on my machine. Can we leave it alone and
> revisit it later?
>
> >-----Original Message-----
> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >Sent: October 17, 2005 10:49
> >To: Xu, Anthony
> >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable
> >
> >I ran tests all weekend long. 59 out of 60 builds were
> >successful. One failed, with the same message as below.
> >At least it is reproducible... if you wait long enough :-(
> >
> >> -----Original Message-----
> >> From: Magenheimer, Dan (HP Labs Fort Collins)
> >> Sent: Friday, October 14, 2005 1:57 PM
> >> To: 'Xu, Anthony'
> >> Cc: 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
> >> Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> xen0 more stable
> >>
> >> After 12 successful builds, I got two in a row that failed
> >> with a segmentation fault. :-( Since the heartbeat is now turned off,
> >> I can see that Xen is giving a clue as to what the problem is.
> >> When both faults happened, even though the failure shows up at
> >> a different place in the build, I got an identical non-fatal message:
> >>
> >> vcpu_translate: bad address: 0000000005a65a69, viip=2000000000163750,
> >> vipsr=00001213081a6018, iip=20000000001d6180, ipsr=0000101308126018
> >>
> >> I wonder what that address is... I have seen it before.
> >> Perhaps it is predicates?
> >>
> >> I won't have much of an opportunity to look into this further
> >> for a while, so I wanted to post what I've seen to date.
> >>
> >> Dan
> >>
> >> > -----Original Message-----
> >> > From: Magenheimer, Dan (HP Labs Fort Collins)
> >> > Sent: Friday, October 14, 2005 12:05 PM
> >> > To: Xu, Anthony
> >> > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > xen0 more stable
> >> >
> >> > There were definitely some bugs involving the itir in
> >> > vcpu_translate. In the process of fixing them,
> >> > I was over-aggressive in cleaning up some code.
> >> > When I backed out some of that cleanup, everything
> >> > seems to be fine. (I still get a couple of NaT fault
> >> > messages every compile, but they seem to be harmless.)
> >> >
> >> > The segfault problem occurs rarely enough that I don't
> >> > know whether I fixed it, but I have now run 9 builds without
> >> > a problem and I definitely fixed some itir
> >> > problems, so I have committed the changeset to
> >> > xen-ia64-unstable.
> >> >
> >> > > -----Original Message-----
> >> > > From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> > > [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf
> >> > > Of Magenheimer, Dan (HP Labs Fort Collins)
> >> > > Sent: Thursday, October 13, 2005 10:37 PM
> >> > > To: Xu, Anthony
> >> > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > > xen0 more stable
> >> > >
> >> > > In my testing, I now saw what appeared to be an infinite loop
> >> > > of NaT faults. The "ps" command showed a "sh" with several
> >> > > minutes of CPU time while the console window scrolled continually
> >> > > with "NaT fault... attempting to handle as privop". This may
> >> > > or may not be a side effect of the patch I am testing. I'll
> >> > > see if it shows up again (but am logging off now until the
> >> > > morning).
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >> > > > Sent: Thursday, October 13, 2005 8:41 PM
> >> > > > To: Magenheimer, Dan (HP Labs Fort Collins)
> >> > > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > > > xen0 more stable
> >> > > >
> >> > > > We shouldn't see any NaT faults, and I didn't see any NaT
> >> > > > faults in my testing.
> >> > > >
> >> > > >
> >> > > > >-----Original Message-----
> >> > > > >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >> > > > >Sent: October 14, 2005 3:59
> >> > > > >To: Xu, Anthony
> >> > > > >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > > >Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable
> >> > > > >
> >> > > > >> However, my testing is not going well so far. I had just
> >> > > > >> completed compiling Linux 15 times on tip (with Tristan's
> >> > > > >> SMP patch) without any problems, but 2 of 5 runs so far with
> >> > > > >> this new patch failed with segmentation faults.
> >> > > > >
> >> > > > >Followed by six successful builds :-%
> >> > > > >
> >> > > > >I'm going to assume this is a random occurrence of a bug
> >> > > > >unrelated to your patch that happens to occur only every
> >> > > > >few hours or so, and I will commit your patch.
> >> > > > >
> >> > > > >By the way, I am now seeing two NaT faults per Linux build
> >> > > > >that are printing "attempting to handle as privop."
> >> > > > >I assume your fix exposed these but the messages are
> >> > > > >harmless?
> >> > > > >
> >> > > > >Dan
> >> > > >
> >> > >
> >> >
> >>
>
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel