xen-devel
RE: [Xen-devel] memory error?
In one of my follow-up posts I mentioned that I have now seen this behaviour on another, similar server, and have seen reports of it on the web. Basically, under some circumstances the tlan driver gets or causes PCI parity errors and barfs; most likely it doesn't play well with other cards. I've since moved the realtek (natsemi) adapter to another slot, which is on a different PCI bus from the tlan (that's what I love about servers - separate PCI buses to play with!), and have so far not had any more errors. I'm just building the latest version of Xen and will then boot into it and give it a thorough thrashing.
But so far it looks like the bulk of the problems I have been experiencing lately were of my own making.
James
From: Ian Pratt
Sent: Tue 3/08/2004 4:42 PM
To: James Harper
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxxx; Ian.Pratt@xxxxxxxxxxxx
Subject: Re: [Xen-devel] memory error?
> I have just noticed this message in my kernel logs, reporting the possibility of an error with my memory. This would go a long way towards explaining the problems I've been having. This particular error is occurring when I'm not running Xen, so it is obviously not something brought on by Xen itself.
>
> The strange thing is that the NMI error is always followed by the TLAN: eth0: Adaptor Error = 0x180002, which says to me that either there is something wrong with my network card which is triggering an NMI, or that the NMI triggers an error in that network adapter. The memory itself is ECC memory in a Compaq Proliant 1600; maybe I can access the memory logs...
>
> Either way, what would xen do upon receiving an NMI? Would it spontaneously reboot?
Hmm, given that it's not something we've ever been able to test,
'spontaneous reboot' sounds quite possible...
In normal operation, it's relatively hard for Xen to reboot
without printing anything. It requires a 'triple fault', which
basically means the hypervisor area of the pagetable has to be
corrupt. We haven't seen a bug like that for a very long time.
The link between the NMI and the adaptor error is interesting. I
wonder if it's a parity error on the PCI bus rather than a memory
ECC failure? Try re-seating the PCI card?
Ian
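One way to follow up on the PCI parity suggestion above is to look at the error bits in each device's PCI Status register (the 16-bit register at config-space offset 0x06, which lspci -vv prints in decoded form). The sketch below is only an illustration and not something from this thread: the helper names and the sample values are hypothetical, and it assumes the raw register value has already been read by some other means. The bit positions are the standard ones from the PCI specification.

```python
# Error-related bits of the 16-bit PCI Status register (config offset 0x06).
# Bit positions follow the PCI Local Bus specification; the names are
# descriptive labels chosen for this sketch.
PCI_STATUS_FLAGS = {
    0x0100: "master data parity error",
    0x0800: "signaled target abort",
    0x1000: "received target abort",
    0x2000: "received master abort",
    0x4000: "signaled system error (SERR#)",
    0x8000: "detected parity error",
}

# The two bits that specifically indicate a parity problem.
PARITY_MASK = 0x0100 | 0x8000


def decode_pci_status(status):
    """Return the labels of all error flags set in a PCI Status value."""
    return [name for bit, name in sorted(PCI_STATUS_FLAGS.items())
            if status & bit]


def has_parity_error(status):
    """True if the device detected or reported a parity error."""
    return bool(status & PARITY_MASK)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the machine in this thread.
    for value in (0x0280, 0x8280):
        flags = decode_pci_status(value) or ["no error flags set"]
        print(f"status 0x{value:04x}: {', '.join(flags)}")
```

If the tlan card or its bridge showed a parity flag after one of these NMIs, that would point towards the bus rather than the RAM, though a clean register would not rule it out, since these bits can be cleared by the driver or firmware.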
> I'm running memtest now, and will run memtest86 once I am back in the office.
>
> James
>
> eth2: Promiscuous mode enabled.
> eth2: Promiscuous mode enabled.
> br2: port 1(eth2) entering learning state
> br2: port 1(eth2) entering forwarding state
> br2: topology change detected, propagating
> Uhhuh. NMI received. Dazed and confused, but trying to continue
> You probably have a hardware problem with your RAM chips
> TLAN: eth0: Adaptor Error = 0x180002
> TLAN: eth0: Starting autonegotiation.
> TLAN: eth0: Autonegotiation complete.
> TLAN: eth0: Link active with AutoNegotiation enabled, at 100Mbps Full-Duplex
> TLAN: Partner capability: 10BaseT-HD 10BaseT-FD 100baseTx-HD 100baseTx-FD
> TLAN: eth0: Adaptor Error = 0x180002
> TLAN: eth0: Starting autonegotiation.
> TLAN: eth0: Autonegotiation complete.
> TLAN: eth0: Link active with AutoNegotiation enabled, at 100Mbps Full-Duplex
> TLAN: Partner capability: 10BaseT-HD 10BaseT-FD 100baseTx-HD 100baseTx-FD