Hi Christophe,
Thanks for finding and investigating the problem.
I will try to reproduce the issue here and track down what is causing it.
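For reference, the kind of race Christophe describes below could look roughly
like the sketch here. This is only an illustration of a suspected lost-wakeup
pattern, not the actual netfront/netback smartpoll code, and all of the names
(frontend_is_polling, frontend_poll_timer, backend_deliver_packet,
send_rx_interrupt) are made up for the example:

#include <stdbool.h>
#include <stdatomic.h>

/* Illustrative flag meaning "the frontend is polling, no irq needed". */
static atomic_bool frontend_is_polling = true;

/* Stubs standing in for the real ring check, hrtimer and event channel. */
static bool rx_ring_has_work(void)  { return false; }
static void stop_poll_hrtimer(void) { }
static void send_rx_interrupt(void) { }

/* Frontend (DomU) side: hrtimer callback that polls the RX ring. */
static void frontend_poll_timer(void)
{
        if (!rx_ring_has_work()) {
                /*
                 * Nothing to do: stop polling and fall back to interrupts.
                 * If the backend pushes a packet and samples the flag
                 * after our ring check but before this store, it skips
                 * the interrupt while we stop the timer, and the packet
                 * sits on the ring unnoticed: the RX path looks dead.
                 */
                atomic_store(&frontend_is_polling, false);
                stop_poll_hrtimer();
        }
}

/* Backend (Dom0) side: runs after a packet is placed on the shared ring. */
static void backend_deliver_packet(void)
{
        /* ...the packet has already been pushed onto the RX ring here... */
        if (!atomic_load(&frontend_is_polling))
                send_rx_interrupt();
        /* else: assume the frontend's poll timer will pick it up. */
}

If a window like this exists, the receive path would stall with no errors on
either side, which matches the symptoms. I will check whether the real
smartpoll code can hit it.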
Thanks,
Dongxiao
Jeremy Fitzhardinge wrote:
> On 08/22/2010 09:43 AM, Christophe Saout wrote:
>> Hi,
>>
>> I've been playing with some of the new pvops code, namely DomU guest
>> code. What I've been observing on one of the virtual machines is that
>> the network (vif) is dying after about ten to sixty minutes of uptime.
>> The unfortunate thing here is that I can only reproduce it on a
>> production VM and have been unlucky so far in triggering the bug on a
>> test machine. While this has not been tragic (rebooting fixed the
>> issue), it unfortunately means I can't spend much time on debugging
>> after the issue pops up.
>
> Ah, OK. I've seen this a couple of times as well. And it just
> happened to me again...
>
>
>> Now, what is happening is that the receive path goes dead. The DomU
>> can send packets to Dom0, and those are visible with tcpdump on the
>> virtual interface in Dom0, but not the other way around.
>
> I hadn't got to that level of diagnosis, but I can confirm that
> that's what seems to be happening here too.
>
>> Now, I have made more than one change at a time (I'd like to avoid
>> having to pin it down step by step, since I can only reproduce it on
>> a production machine, as I said, so suggestions are welcome), but my
>> suspicion is that it might have to do with the new "smart polling"
>> feature in xen/netfront. Note that I have also updated Dom0 to pull
>> in the latest dom0/backend and netback changes, just to make sure
>> it's not due to an issue that has already been fixed there, but I'm
>> still seeing the same behaviour.
>
> I agree. I think I started seeing this once I merged smartpoll into
> netfront.
>
> J
>
>> The production machine doesn't have much network load, but deals
>> with a lot of small network requests (mostly DNS and SMTP), a
>> workload which is hard to reproduce on the test machine. Days of
>> heavy network load (NFS, FTP and so on) haven't triggered the
>> problem. Also, segmentation offloading and similar settings don't
>> have any effect.
>>
>> The machine has 2 physical CPUs and the VM has 2 virtual CPUs; the
>> DomU kernel has PREEMPT enabled.
>>
>> I've been looking at the code to see whether there might be a race
>> condition somewhere, something like a situation where the hrtimer
>> doesn't run while Dom0 believes the DomU is still polling and
>> therefore doesn't emit an interrupt, but I'm afraid I don't know
>> enough to judge this (the spinlocks look safe to me).
>>
>> Do you have any suggestions on what to try? I can trigger the issue
>> on the production VM again, but debugging should not take more than
>> a few minutes once it happens. Access is only possible via the
>> console. Neither Dom0 nor the guest shows anything unusual in the
>> kernel messages, and both continue to behave normally after the
>> network goes dead (the guest can also be shut down normally).
>>
>> Thanks,
>> Christophe
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel