Re: [Xen-devel] new netfront and occasional receive path lockup

To:	Christophe Saout <christophe@xxxxxxxx>
Subject:	Re: [Xen-devel] new netfront and occasional receive path lockup
From:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date:	Mon, 23 Aug 2010 17:46:37 -0700
Cc:	"Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date:	Mon, 23 Aug 2010 17:47:26 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<1282495384.12843.11.camel@xxxxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<1282495384.12843.11.camel@xxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100720 Fedora/3.1.1-1.fc13 Lightning/1.0b2pre Thunderbird/3.1.1

On 08/22/2010 09:43 AM, Christophe Saout wrote:
> Hi,
>
> I've been playing with some of the new pvops code, namely the DomU
> guest code.  What I've been observing on one of the virtual machines is
> that the network (vif) dies after about ten to sixty minutes of uptime.
> The unfortunate thing here is that I can only reproduce it on a
> production VM and have so far been unable to trigger the bug on a test
> machine.  This has not been tragic (rebooting fixes the issue), but
> unfortunately I can't spend very much time on debugging once the issue
> pops up.

Ah, OK.  I've seen this a couple of times as well.  And it just happened
to me then...


> Now, what is happening is that the receive path goes dead.  The DomU can
> send packets to Dom0 and those are visible using tcpdump on the Dom0 on
> the virtual interface, but not the other way around.

I hadn't got to that level of diagnosis, but I can confirm that that's
what seems to be happening here too.

> Now, I have changed more than one thing at a time (I'd like to avoid
> narrowing it down change by change, since, as I said, I can only
> reproduce it on a production machine, so suggestions are welcome), but
> my suspicion is that it might have to do with the new "smart polling"
> feature in xen/netfront.  Note that I have also updated Dom0 to pull in
> the latest dom0/backend and netback changes, just to make sure it's not
> due to an issue that has already been fixed there, but I'm still seeing
> the same behaviour.

I agree.  I think I started seeing this once I merged smartpoll into
netfront.

    J

> The production machine doesn't have much network load, but handles a
> lot of small network requests (mostly DNS and SMTP), a workload which
> is hard to reproduce on the test machine.  Days of heavy network load
> (NFS, FTP and so on) haven't triggered the problem.  Also, changing
> segmentation offloading and similar settings has no effect.
>
> The host has 2 physical CPUs and the VM has 2 virtual CPUs; the DomU
> kernel has PREEMPT enabled.
>
> I've been looking at the code to see whether there might be a race
> condition somewhere, e.g. a situation where the hrtimer doesn't run
> while Dom0 believes the DomU is polling and therefore doesn't send an
> interrupt, but I'm afraid I don't know enough to judge this (the
> spinlocks involved look safe to me).
>
> Do you have any suggestions on what to try?  I can trigger the issue on
> the production VM again, but debugging should not take more than a few
> minutes when it happens.  Access is only possible via the console.
> Neither Dom0 nor the guest shows anything unusual in the kernel
> messages, and both continue to behave normally after the network goes
> dead (the guest can also be shut down normally).
>
> Thanks,
>       Christophe
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>

