WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out i

To: Anthony Wright <anthony@xxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Fri, 26 Aug 2011 10:44:38 -0400
Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>, Todd Deshane <todd.deshane@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 26 Aug 2011 07:45:43 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110826142606.GA25511@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <24093349.14.1311837878822.JavaMail.root@xxxxxxxxxxxxxxxxxxxxxx> <CAMrPLWKGtozo6YK5FXdJzR4duzsxvR6F7Fuj4_0b4x6GkLateA@xxxxxxxxxxxxxx> <4E31820C.5030200@xxxxxxxxxxxxxxx> <1311870512.24408.153.camel@xxxxxxxxxxxxxxxxxxxxxx> <4E3266DE.9000606@xxxxxxxxxxxxxxx> <20110803152841.GA2860@xxxxxxxxxxxx> <4E4E3957.1040007@xxxxxxxxxxxxxxx> <20110819125615.GA26558@xxxxxxxxxxxx> <4E56B132.9050708@xxxxxxxxxxxxxxx> <20110826142606.GA25511@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Aug 26, 2011 at 10:26:06AM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 25, 2011 at 09:31:46PM +0100, Anthony Wright wrote:
> > On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote:
> > >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote:
> > >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote:
> > >>>> I've just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with
> > >>>> the vga-support patch backported). I can't get my DomU's to work due to
> > >>>> the phy disks and vifs timing out in DomU and looking through my logs
> > >>>> this morning I'm getting a consistent kernel bug report with xen
> > >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on
> > >>> Yikes! Ian any ideas what to try?
> > >>>
> > >>> Anthony, can you compile the kernel with debug=y and when this happens
> > >>> see what 'xl dmesg' gives? There is also the 'xl debug-keys g' which
> > >>> should dump the grants in use.. that might help a bit.
> > >> I've compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other
> > >> config values appeared at this point, and I took defaults for them).
> > >>
> > >> The output from /var/log/messages & 'xl dmesg' is attached. There was no
> > >> output from 'xl debug-keys g'.
> > > Ok, so I am hitting this too - I was hoping that the patch from Stefano
> > > would have fixed the issue, but sadly it did not.
> > >
> > > Let me (I am traveling right now) see if I can come up with an interim
> > > solution until Ian comes up with the right fix.
> > >
> > Hi Konrad - any progress on this - it's a bit of a show stopper for me.
> 
> What is interesting is that it happens only with 32-bit guests and with
> not-so-fast hardware: Atom D510 for me and in your case MSI MS-7309 motherboard
> (with what kind of processor?). I've a 64-bit hypervisor - not sure if you
> are using a 32-bit or 64-bit.
> 
> I hadn't tried to reproduce this on the Atom D510 with a 64-bit Dom0.
> But I was wondering if you had this setup before - with a 64-bit dom0?
> Or is that really not an option with your CPU?

So while I am still looking at the hypervisor code to figure out why
it would give me:

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

I've cobbled together this patch^H^H^Hhack to retry the transaction to see if
this is a temporary issue (race) or really - somehow that L1 PTE is gone.

If you could, can you try it out and see if the errors that are spat out
are repeated - mainly the "Could not find L1 PTE". You will need to
run the hypervisor with "loglvl=all" to get that information. You may also
need to compile the hypervisor with debug=y to get that.

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index fd00f25..7bee981 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1607,7 +1607,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
        struct gnttab_map_grant_ref op;
        struct xen_netif_tx_sring *txs;
        struct xen_netif_rx_sring *rxs;
-
+       int retry = 3;
        int err = -ENOMEM;
 
        vif->tx_comms_area = alloc_vm_area(PAGE_SIZE);
@@ -1620,7 +1620,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 
        gnttab_set_map_op(&op, (unsigned long)vif->tx_comms_area->addr,
                          GNTMAP_host_map, tx_ring_ref, vif->domid);
-
+       op.status = 0;
+retry_tx:
        if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
                BUG();
 
@@ -1628,6 +1629,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
                netdev_warn(vif->dev,
                            "failed to map tx ring. err=%d status=%d\n",
                            err, op.status);
+               if (retry-- > 0)
+                       goto retry_tx;
                err = op.status;
                goto err;
        }
@@ -1641,6 +1644,9 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
        gnttab_set_map_op(&op, (unsigned long)vif->rx_comms_area->addr,
                          GNTMAP_host_map, rx_ring_ref, vif->domid);
 
+       retry = 3;
+       op.status = 0;
+retry_rx:
        if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
                BUG();
 
@@ -1648,6 +1654,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
                netdev_warn(vif->dev,
                            "failed to map rx ring. err=%d status=%d\n",
                            err, op.status);
+               if (retry-- > 0)
+                       goto retry_rx;
                err = op.status;
                goto err;
        }
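
For anyone reading the hack above without the kernel tree at hand, the retry
logic can be sketched on its own in plain C. This is only an illustration, not
the netback code: fake_map_op, fake_map_grant() and map_with_retry() are
hypothetical names standing in for the gnttab_map_grant_ref op and the
hypercall, and the "fail N times then succeed" behaviour is an assumption made
so that the sketch can be exercised outside a Xen guest.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct gnttab_map_grant_ref (not a Xen API). */
struct fake_map_op {
    int failures_left;  /* how many more attempts should fail */
    int status;         /* status of the most recent attempt, 0 on success */
    int calls;          /* number of attempts made so far */
};

/* Hypothetical stand-in for the map hypercall: fails until
 * failures_left reaches zero, then succeeds. */
static void fake_map_grant(struct fake_map_op *op)
{
    op->calls++;
    if (op->failures_left > 0) {
        op->failures_left--;
        op->status = -1;
    } else {
        op->status = 0;
    }
}

/* Try the map once, then retry up to max_retries more times while the
 * status is non-zero. Returns the final status, mirroring the
 * "if (retry-- > 0) goto retry_tx;" pattern in the hack. */
static int map_with_retry(struct fake_map_op *op, int max_retries)
{
    int retry = max_retries;

    for (;;) {
        fake_map_grant(op);
        if (op->status == 0)
            return 0;
        fprintf(stderr, "map failed: status=%d retries left=%d\n",
                op->status, retry);
        if (retry-- <= 0)
            return op->status;
    }
}
```

With max_retries set to 3, as in the hack, a transient failure gets up to four
attempts in total. If every attempt reports the same status, that points at
the L1 PTE really being gone rather than at a race.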
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

