


To: "Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>
Subject: Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability
From: Scott Garron <xen-devel@xxxxxxxxxxxxxxxxxx>
Date: Tue, 31 Aug 2010 04:16:09 -0400
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Delivery-date: Tue, 31 Aug 2010 01:17:09 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <D5AB6E638E5A3E4B8F4406B113A5A19A2A4D1D5B@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4C7864BB.1010808@xxxxxxxxxxxxxxxxxx> <4C7BE1C6.5030602@xxxxxxxx> <D5AB6E638E5A3E4B8F4406B113A5A19A2A4D1D5B@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20100802 Thunderbird/3.1.2
Scott Garron wrote:
Another issue that comes up is that if I run the pvops
kernel for my Linux domUs, after a time (usually only about an
hour or so), the network interfaces stop responding.

Jeremy Fitzhardinge wrote:
That's a separate problem in netfront that appears to be a bug in
the "smartpoll" code.  I think Dongxiao is looking into it.

On 8/31/2010 2:59 AM, Xu, Dongxiao wrote:
Yes, I have been trying to reproduce it these past few days, but I couldn't
catch it locally. I tried both netperf and ping for a long time, but the bug
was not triggered. What workload were you running when you hit the bug?

     I'd say that the whole machine is under moderate to high
utilization because it has 10 virtual machines running - three of which
are Windows 2008 Servers as HVM guests.  However, as far as the "load"
goes, most of the virtual machines are fairly idle and probably not
under much stress, overall.  Just to give you an idea, we have a
10Mbit/s connection to the Internet, and this server's physical network
interface (all 10 of the domUs' traffic, combined) usually accounts for
less than 2Mbit/s of the outbound traffic at any given point in the day.
Aside from Windows being Windows (the HVM guests are running graphical
desktops), I wouldn't say that any of them cause a high CPU load,
either.  Database load is low to moderate on the guests running MySQL
and/or PostgreSQL.  The only guest that seems to use noticeably more CPU and
RAM is the one serving e-mail, and that's because it runs ClamAV and
SpamAssassin.  That e-mail server was one of the guests that kept its network
connectivity the longest, though (after a few hours it did stop
responding, but only after some guests with lighter loads had stopped
responding).

     One observation, which may just be coincidental but is at least
noteworthy, is that the virtual machines that are assigned less RAM
seem to lose connectivity more quickly than those with more
RAM.  The most recent time that I was able to trigger the bug, the
virtual machine that lost connectivity was only assigned 384MB RAM,
running  At the time, the rest of my paravirtualized guests
were running, and they didn't experience the problem.

     I've previously triggered the bug in multiple domUs that were
running a more recent kernel (I think it was - before I
reverted to a netback-patched kernel), and the first ones to
disappear from the network were ones that were only assigned 256MB.
Eventually, they all disappeared, though.  The only "load" on one of the
first to disappear is an installation of bind9, servicing about 50
domain names - none of which receive an abnormally high hit count.

     The first time I noticed the problem, I had started 7
paravirtualized guests with varying memory assignments.  The moment I
started the 8th guest, an HVM Windows 2008 Server, networking on all
of the running guests (the paravirtualized ones) stopped responding at
the same time.  That may also be something to try or look into.

     After a reboot, I avoided starting any of the HVM guests, and
connectivity lasted a couple of hours on the 7 running paravirtualized
guests, but then it started disappearing one guest at a time over the
course of the next few hours.

     I didn't mention in my previous e-mail that in order to get
networking to work in a stable fashion in the kernel (the one
I reverted to), I had to apply the patch mentioned here:
Otherwise, networking became unstable immediately at the time of guest
creation.  That patch was already applied to the kernel that
is giving me the eventual network loss problems, though.

     More specifics about my configuration can be found here:

Scott Garron
