Re: [Xen-devel] new netfront and occasional receive path lockup

On Wed, Aug 25, 2010 at 08:51:09AM +0800, Xu, Dongxiao wrote:
> Hi Christophe,
> 
> Thanks for finding and checking the problem.
> I will try to reproduce the issue and check what caused the problem.
> 

Hello,

Was this issue resolved? Some users have been complaining about
"network freezing up" issues recently on ##xen on IRC.

-- Pasi

> Thanks,
> Dongxiao
> 
> Jeremy Fitzhardinge wrote:
> >  On 08/22/2010 09:43 AM, Christophe Saout wrote:
> >> Hi,
> >> 
> >> I've been playing with some of the new pvops code, namely DomU guest
> >> code.  What I've been observing on one of the virtual machines is
> >> that the network (vif) is dying after about ten to sixty minutes of
> >> uptime.  The unfortunate thing here is that I can only reproduce it
> >> on a production VM and have so far been unable to trigger the bug
> >> on a test machine.  While this has not been tragic - rebooting fixed
> >> the issue - unfortunately I can't spend very much time on debugging
> >> after the issue pops up.
> > 
> > Ah, OK.  I've seen this a couple of times as well.  And it just
> > happened to me then... 
> > 
> > 
> >> Now, what is happening is that the receive path goes dead.  The DomU
> >> can send packets to Dom0 and those are visible using tcpdump on the
> >> Dom0 on the virtual interface, but not the other way around.
> > 
> > I hadn't got to that level of diagnosis, but I can confirm that
> > that's what seems to be happening here too. 
> > 
> >> Now, I have done more than one change at a time (I'd like to avoid
> >> going into pinning it down since I can only reproduce it on a
> >> production machine, as I said, so suggestions are welcome), but my
> >> suspicion is that it might have to do with the new "smart polling"
> >> feature in xen/netfront.  Note that I have also updated Dom0 to pull
> >> in the latest dom0/backend and netback changes, just to make sure
> >> it's not due to an issue that has been fixed there, but I'm still
> >> seeing the same.
> > 
> > I agree.  I think I started seeing this once I merged smartpoll into
> > netfront. 
> > 
> >     J
> > 
> >> The production machine is a machine that doesn't have much network
> >> load, but deals with a lot of small network requests (DNS and smtp
> >> mostly).  A workload which is hard to reproduce on the test machine.
> >> Heavy network load (NFS, FTP and so on) for days hasn't triggered the
> >> problem.  Also, segmentation offloading and similar settings don't
> >> have any effect. 
> >> 
> >> The machine has 2 physical CPUs and the VM has 2 virtual CPUs.  DomU
> >> has PREEMPT enabled.
> >> 
> >> I've been looking at the code to see whether there might be a race
> >> condition somewhere, something like a situation where the hrtimer
> >> doesn't run and Dom0 believes the DomU should be polling and so
> >> doesn't emit an interrupt, but I'm afraid I don't know enough to
> >> judge this (I mean, there are spinlocks which look safe to me).
> >> 
> >> Do you have any suggestions on what to try?  I can trigger the issue
> >> on the production VM again, but debugging should not take more than
> >> a few minutes if it happens.  Access is only possible via the
> >> console.  Neither Dom0 nor the guest show anything unusual in the
> >> kernel messages, and both continue to behave normally after the
> >> network goes dead (I'm also able to shut down the guest normally).
> >> 
> >> Thanks,
> >>    Christophe
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
