xen-devel

Re: [Xen-devel] segfault in VM

To: James Harper <JamesH@xxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] segfault in VM
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Thu, 22 Jul 2004 03:03:33 +0100
Cc: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 22 Jul 2004 03:05:02 +0100
Envelope-to: steven.hand@xxxxxxxxxxxx
In-reply-to: Your message of "Thu, 22 Jul 2004 11:57:13 +1000." <E140B4A4-805C-4F78-A9CE-4681DB2277B3@mimectl>
List-archive: <http://sourceforge.net/mailarchive/forum.php?forum=xen-devel>
List-help: <mailto:xen-devel-request@lists.sourceforge.net?subject=help>
List-id: List for Xen developers <xen-devel.lists.sourceforge.net>
List-post: <mailto:xen-devel@lists.sourceforge.net>
List-subscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=subscribe>
List-unsubscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=unsubscribe>
Sender: xen-devel-admin@xxxxxxxxxxxxxxxxxxxxx
> I'm building this now, and am just thinking about how to test this... I was 
> using a ping as my test mechanism. I guess I'll do lots of block device 
> copies. I guess this lends weight to your thoughts that it probably is a net 
> problem and not a block problem.
> 
> Instead of changing the source code to disable the net stuff, would it work 
> if I just specified 'nics=0' or is some part of the net subsystem still 
> activated? I'll test this too anyway.

I think the source will need to be changed. In any case, it's a
trivial change and then we can be certain that no device channel is
being set up.

> In order to test disabling send or receive, this might be a bit trickier than 
> you first make out. Send-only should be easy enough: just start another 
> domain and then ping it (a manual ARP table entry should remove the need 
> to broadcast). Receive-only will be trickier. How do you get a domain to send 
> to it? This problem of course assumes that corruption is not limited to the 
> domain... if it is limited to the domain then you should be able to have a 
> send/receive domain and ignore crashes in there, and just focus on the crashes in 
> the receive-only domain.

That's the reason for the broadcast ping. Unfortunately I'm not sure
how useful that will turn out to be -- e.g., we may just end up hosing
DOM0. 

> I'm almost confused, but am about to start testing - firstly with no network.

Stage 1 (isolating blkdev and network) shouldn't be too
hard. Basically we're ensuring the data paths in the backend drivers
do not get executed -- they will only ever execute if there is a
device channel set up to a frontend in another guest, so disabling the
frontend drivers ensures this.
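
As a rough illustration of the "always 'return 0;'" change described in the quoted mail below, here is a minimal user-space sketch in C. It is not the Xen driver source: struct frontend_state, frontend_init_normal(), frontend_init_disabled() and channels_after_init() are hypothetical names invented for this example, standing in for an init routine such as netif_init() or xlblk_init(). The only point it makes is that an init routine which returns success before doing any setup never requests a device channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for frontend driver state; not a real Xen structure. */
struct frontend_state {
    int channels_requested;   /* device channels this frontend has asked for */
};

/* Caricature of a normal frontend init: it asks for one device channel. */
int frontend_init_normal(struct frontend_state *st)
{
    st->channels_requested++;
    return 0;
}

/* Disabled variant: report success immediately, before any setup runs. */
int frontend_init_disabled(struct frontend_state *st)
{
    (void)st;                 /* state is deliberately left untouched */
    return 0;
}

/* Run an init routine on fresh state and report how many channels it requested.
 * Returns -1 if the init routine itself reported failure. */
int channels_after_init(int (*init)(struct frontend_state *))
{
    struct frontend_state st = { 0 };
    int rc;

    assert(init != NULL);
    rc = init(&st);
    return rc == 0 ? st.channels_requested : -1;
}
```

Calling channels_after_init() with the disabled variant reports zero requested channels while the guest still sees a successful init, which is the property the test relies on: with no channel to a frontend, the backend data path has nothing to service.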

 -- Keir


> James
> 
> 
> From: Keir Fraser
> Sent: Wed 21/07/2004 11:30 PM
> To: James Harper
> Cc: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] segfault in VM
> 
> 
> Could someone try to isolate this to either the network backend driver
> or the blkdev backend driver?
> 
> The best way to do this is to disable the frontend drivers so that
> they never try to connect to the backend driver...
> 
> To disable networking:
> Edit arch/xen/drivers/netif/frontend/main.c. Change netif_init() to
> always 'return 0;'.
> 
> To disable block devices:
> Edit arch/xen/drivers/blkif/frontend/main.c. Change xlblk_init() to
> always 'return 0;'.
> 
> Oh yes -- the 2.4 sparse tree no longer contains the net frontend
> driver - you'll find the build tree symlinks to
> linux-2.6.7-xen-sparse/drivers/xen/net/network.c. So you might want to
> edit that instead...
> 
> Obviously, if you disable blkdevs you'll need to boot off a ramdisk
> or via a networked mount. :-)
> 
>  Cheers,
>  Keir
> 
> 
> > I downloaded these (from a tgz that Keir had given me a link to as bk was 
> > down - I assume it's identical to his latest fixes) and started my tests 
> > running and went to bed, but it looks like I got errors within a very short 
> > time.
> > The tests I was running were my 'compare' script and pinging the two 
> > domains I had running with
> > ping -q -i 0.01 -s 1400 <ip address>
> > 
> > Lots of oopses in the logs, most are probably as a result of the corruption 
> > and not indicative of the cause. They look similar to Jody's dump so I 
> > won't bother sending them unless someone thinks they might be useful.
> > 
> > btw, can the install be modified to give us a System.map-2.4.26-xen[0U] in 
> > /boot? ksymoops would be much happier.
> > 
> > James