WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

Re: [Xen-devel] segfault in VM

To: James Harper <JamesH@xxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] segfault in VM
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Mon, 19 Jul 2004 08:27:11 +0100
Cc: Derek Glidden <dglidden@xxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 19 Jul 2004 08:29:13 +0100
Envelope-to: steven.hand@xxxxxxxxxxxx
In-reply-to: Your message of "Mon, 19 Jul 2004 15:50:24 +1000." <4C410FCD-C845-4961-AFF2-14037BFFC197@mimectl>
List-archive: <http://sourceforge.net/mailarchive/forum.php?forum=xen-devel>
List-help: <mailto:xen-devel-request@lists.sourceforge.net?subject=help>
List-id: List for Xen developers <xen-devel.lists.sourceforge.net>
List-post: <mailto:xen-devel@lists.sourceforge.net>
List-subscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=subscribe>
List-unsubscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=unsubscribe>
Sender: xen-devel-admin@xxxxxxxxxxxxxxxxxxxxx
Clearly there's some fairly random memory corruption going on, which
then causes segfaults (if the corruption hits code pages) and
filesystem corruption (if the corruption hits buffer-cache pages).

The "Bailing: not a -ve offset" and "GPF (0004):" messages are almost
certainly just symptoms of executing a corrupted block of code. That
is, the bug was triggered some time earlier and has probably already
corrupted a page of glibc or the kernel.

It would be interesting to see whether or not this is SMP-related.
It's also interesting that someone said they couldn't reproduce
corruption when using 2.6.7 for the non-privileged guest OSes.

 -- Keir

> That sounds like the same sort of errors I'm getting, which appeared to be
> filesystem corruption. First the corruption starts, then everything you do
> causes a segfault, although I've only seen funny things happen in dom0.
> 
> In the limited testing I've done it looks like dom0 by itself is stable, but
> crashes start occurring once I start up other domains and work dom0 hard
> (other domains running under light load). I'm running this script in dom0:
> 
> #!/bin/sh
> while [ 1 = 1 ]
> do
>  diff file3 file4 && echo okay
> done
> 
> where file3 and file4 are each around 300 MB, and the VM has 128 MB of memory
> with no swap. The files are too large to stay cached, so every pass generates
> lots of I/O.
> 
> It has crashed most readily when I'm running a few other domains and then
> start running dom0 out of memory, but nothing conclusive yet.
> 
> I'll let this test keep running for another hour (otherwise idle, no other 
> domains running) or so then start my running-out-of-memory program.
> 
> I wonder if it is a coincidence that we both have SMP boxes... each of the
> domains only sees 1 CPU, so I wouldn't have thought that would be a problem
> unless there's a race in Xen itself.
> 
> James
> 
> 
> From: Derek Glidden
> Sent: Mon 19/07/2004 3:22 PM
> To: xen-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: [Xen-devel] segfault in VM
> 
> 
> Maybe related or maybe not, but it was the same VM getting all the 
> scheduling time in my previous post.  (SMP Celeron box with 512M of 
> RAM, no himem enabled.)
> 
> At the time, four VMs were all compiling, with dom0 copying a Linux
> source tree from one place to another with rsync.  Everything was copacetic
> until I started the big rsync in dom0; within a minute or so, vm2
> bombed.  No messages on the dom0 console or in the VM other than the
> "Segmentation Fault" in the VM during compilation.
> 
> However XEN (compiled with debug=y) console spits out:
> 
> (XEN) (file=x86_32/emulate.c, line=228) Bailing: not a -ve offset into 4GB segment.
> 
> at the time of the segmentation fault.
> 
> (and there are lots of these, pretty much any time there is heavy i/o 
> on the machine, all with the same values:)
> 
> (XEN) (file=traps.c, line=466) GPF (0004): fc5277a8 -> fc52a294
> 
> Any further activity inside vm2 results in more segmentation faults and 
> more "Bailing" messages.  The other VMs and dom0 seem to be ok.
> 
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "We all enter this world in the    | Support Electronic Freedom
> same way: naked; screaming; soaked |        http://www.eff.org/
> in blood. But if you live your     |  http://www.anti-dmca.org/
> life right, that kind of thing     |---------------------------
> doesn't have to stop there." -- Dana Gould
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/xen-devel
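
The comparison loop quoted above runs until interrupted and prints nothing useful when a comparison fails. A minimal sketch of a variant that stops at the first mismatch and reports which pass failed might look like the following. It assumes a POSIX shell; the function name compare_loop and the optional pass limit are hypothetical and not part of the original script, and it swaps diff for cmp -s, which only reports whether the files differ.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): compare two files
# repeatedly and stop at the first pass where they do not compare equal.
# Usage: compare_loop FILE_A FILE_B [MAX_PASSES]
compare_loop() {
    file_a=$1
    file_b=$2
    max_passes=${3:-0}   # 0 means keep going until a mismatch is seen
    pass=0
    while [ "$max_passes" -eq 0 ] || [ "$pass" -lt "$max_passes" ]; do
        pass=$((pass + 1))
        # cmp -s is silent; any non-zero status (files differ, or a file
        # could not be read) is treated as a failure here.
        if cmp -s "$file_a" "$file_b"; then
            echo "pass $pass: okay"
        else
            echo "pass $pass: MISMATCH at $(date)" >&2
            return 1
        fi
    done
    return 0
}
```

Calling `compare_loop file3 file4` would run until the first mismatch, as in the original test; a third argument such as `compare_loop file3 file4 100` bounds the number of passes. Knowing the pass number and time of the first failure may help correlate it with the Xen console messages.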

