   
 
 
xen-devel

RE: [Xen-devel] segfault in VM

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>, Derek Glidden <dglidden@xxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] segfault in VM
From: James Harper <JamesH@xxxxxxxxxxxxxxxx>
Date: Fri, 23 Jul 2004 11:03:57 +1000
Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 23 Jul 2004 02:13:55 +0100
Envelope-to: steven.hand@xxxxxxxxxxxx
In-reply-to: <E1Bnhft-00061v-00@xxxxxxxxxxxxxxxxx>
List-archive: <http://sourceforge.net/mailarchive/forum.php?forum=xen-devel>
List-help: <mailto:xen-devel-request@lists.sourceforge.net?subject=help>
List-id: List for Xen developers <xen-devel.lists.sourceforge.net>
List-post: <mailto:xen-devel@lists.sourceforge.net>
List-subscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=subscribe>
List-unsubscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=unsubscribe>
References: <E1Bnhft-00061v-00@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-admin@xxxxxxxxxxxxxxxxxxxxx
Thread-index: AcRwUO5n9ikKFf1YQS+BDBbcCKNn1g==
Thread-topic: [Xen-devel] segfault in VM
I just made a change so that the skbuf is always copied in netif_be_start_xmit, but it still crashes. So most likely that bit is fine, or at least it isn't the only code containing bugs.
 
As another test I also put the 'goto done;' after the 'if ( skb_shared(skb) || skb_cloned(skb) || ...' block (still blocking the receive, but doing it later) and there were no crashes, so I'm comfortable that we've exhausted netif_be_start_xmit as a source of bugs.
 
So I guess that leaves net_rx_action. I'm unsure about one thing though: how and where do the pages that get passed from dom0 to domU get recycled back to dom0, if at all? Is it possible that domU could still write to a page that dom0 thought it was free to use for something else? If so, where would that be?
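
One way to test the stale-write theory, independent of where the recycling actually happens, would be to poison a page when dom0 believes it has it back and check the pattern again before reuse. The following is only a userspace sketch of that idea, not driver code: PAGE_SIZE_BYTES, POISON_BYTE, page_poison and page_first_corruption are all made-up names, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size for this sketch */
#define POISON_BYTE     0x5a    /* arbitrary pattern, hypothetical choice */

/* Fill a page with the poison pattern at the moment it is believed free. */
static void page_poison(unsigned char *page)
{
    assert(page != NULL);
    memset(page, POISON_BYTE, PAGE_SIZE_BYTES);
}

/* Return the offset of the first byte that no longer holds the poison
 * pattern, or -1 if the whole page is still intact. A non-negative result
 * means something wrote to the page after it was poisoned. */
static long page_first_corruption(const unsigned char *page)
{
    size_t i;

    assert(page != NULL);
    for (i = 0; i < PAGE_SIZE_BYTES; i++) {
        if (page[i] != POISON_BYTE)
            return (long)i;
    }
    return -1;
}
```

If a check like this fired on a page that had come back from domU, the reported offset would at least show where in the page the late write landed.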
 
Keir: have you been able to reproduce these errors at all?
 
James
 


From: Keir Fraser
Sent: Fri 23/07/2004 3:48 AM
To: Derek Glidden
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] segfault in VM

It's useful to have the extra data points -- it adds to our confidence
that it's the network driver that is somehow at fault here.

Quite how to proceed in narrowing down the problem is
unclear. One approach is to perturb the backend driver's data path
(e.g., always copying packets into a known-safe page-sized buffer, as
a check that our current copy-avoidance checks are not at fault; and
replacing the current high-performance but convoluted code for
batching hypercalls with something slower but easier to grok). The
latter is useful because if the bug goes away then we have a smaller
chunk of code to look at; if the bug remains then we end up with a
less complex data path that is easier to instrument and bughunt.
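
As a rough illustration of the first perturbation (always copying into a page-sized buffer), here is a userspace model. It is a sketch under assumptions, not the code in main.c: struct pkt stands in for the kernel's sk_buff, force_copy, needs_copy, pkt_copy_to_page and prepare_for_guest are hypothetical names, and only the two copy-avoidance conditions named in this thread (shared, cloned) are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size for this sketch */

/* Hypothetical stand-in for a packet buffer; not the kernel's sk_buff. */
struct pkt {
    unsigned char *data;   /* start of payload */
    size_t len;            /* payload length in bytes */
    bool shared;           /* more than one user holds this packet */
    bool cloned;           /* payload is shared with a clone */
};

/* Debug switch: when true, bypass the copy-avoidance checks entirely. */
static bool force_copy = true;

/* Decide whether the packet must be copied before being handed over.
 * The real check has further conditions that are not modelled here. */
static bool needs_copy(const struct pkt *p)
{
    if (force_copy)
        return true;
    return p->shared || p->cloned;
}

/* Copy the payload into a freshly allocated page-sized buffer.
 * Returns NULL if the payload exceeds one page or allocation fails. */
static struct pkt *pkt_copy_to_page(const struct pkt *src)
{
    struct pkt *dst;

    assert(src != NULL);
    if (src->len > PAGE_SIZE_BYTES)
        return NULL;
    dst = malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;
    dst->data = calloc(1, PAGE_SIZE_BYTES);
    if (dst->data == NULL) {
        free(dst);
        return NULL;
    }
    memcpy(dst->data, src->data, src->len);
    dst->len = src->len;
    dst->shared = false;
    dst->cloned = false;
    return dst;
}

/* Return a packet that is safe to hand over: either p itself, or a
 * private page-sized copy (NULL on failure). */
static struct pkt *prepare_for_guest(struct pkt *p)
{
    if (!needs_copy(p))
        return p;
    return pkt_copy_to_page(p);
}
```

With force_copy set, every packet takes the copy path, so a crash that persists cannot be blamed on the avoidance checks; clearing it restores the selective behaviour for comparison.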

If anyone is interested in pursuing this bug independently, the
functions most under suspicion are netif_be_start_xmit and
net_rx_action, both in linux/arch/xen/drivers/netif/backend/main.c.
These two form the data path for packets getting sent to guest OSes.

 -- Keir


> 
> On Jul 22, 2004, at 7:22 AM, Keir Fraser wrote:
> >
> > Anyway - currently sounds like the bug resides in the most complex
> > half of the most complex driver. Who'd've thought it? ;-)
> 
> At this point this data is surely redundant but...
> 
> When I went to sleep last night I let my box run dom0 and four VMs 
> doing md5sum checks on a couple of large files, hammering the heck out 
> of the block i/o drivers and CPU but with all the ifaces/vifs on the 
> machine down.  When I woke up, all compares had been correct for the 
> six hours or so it ran.  I re-upped the ifaces and started to ping dom0 
> and the VMs and within a minute of the pings starting dom0 started to 
> report incorrect md5sums.
> 
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "We all enter this world in the    | Support Electronic Freedom
> same way: naked; screaming; soaked |        http://www.eff.org/
> in blood. But if you live your     |  http://www.anti-dmca.org/
> life right, that kind of thing     |---------------------------
> doesn't have to stop there." -- Dana Gould
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/xen-devel


