

[Xen-devel] tg3 network stall in xen-3.4.x but not in xen-3.3.x

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] tg3 network stall in xen-3.4.x but not in xen-3.3.x
From: Teck Choon Giam <giamteckchoon@xxxxxxxxx>
Date: Sat, 4 Jul 2009 14:32:26 +0800
Delivery-date: Fri, 03 Jul 2009 23:32:47 -0700

I have experienced network stalls when running Xen 3.4.x on all Dell
PE850/860/R200 servers, which use the onboard Broadcom NIC with the tg3
driver.

I have tested both xen-3.3.2-rc3 and xen-3.4.1-rc5 with
linux-2.6.18-xen.hg changeset 913.  The test runs a few concurrent
instances of scp in a domU, each transferring a couple of
1MB/10MB/100MB files to another server.  Within an hour the network
stalls in xen-3.4.1-rc5 but not in xen-3.3.2-rc3.  ifconfig, route -n
and ip link all look normal, but the gateway cannot be pinged.  I have
a custom script in crontab that pings the gateway and, on 100% packet
loss, runs the steps below.  This sometimes brings the network back,
but not always, and then a reboot is needed:

1. xm shutdown all domUs
2. service xendomains stop
3. stop network-bridge
4. service xend stop
5. service xend start
6. xm create all domUs

However, the above might leave some domU ext3 file systems dirty, in
which case e2fsck is required.
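
The cron check and the restart steps above could look roughly like the
sketch below.  It is not the actual script: the GATEWAY variable, the
function names, the --run flag, the network-bridge script path and the
/etc/xen/auto config directory are all assumptions made for
illustration.

```shell
#!/bin/sh
# Watchdog sketch. GATEWAY, the function names and the --run flag are
# assumptions for illustration; they are not from the original script.
GATEWAY="${GATEWAY:-192.0.2.1}"   # placeholder address, set the real gateway

# Reads ping output on stdin; succeeds only if the summary reports 100% loss.
all_lost() {
    grep -q ' 100% packet loss'
}

# Succeeds when every probe sent to the gateway ($1) was lost.
gateway_down() {
    ping -c 5 "$1" 2>&1 | all_lost
}

# The six restart steps, in the same order as the list above.
recover() {
    xm shutdown -a -w                      # 1. shut down all domUs and wait
    service xendomains stop                # 2.
    /etc/xen/scripts/network-bridge stop   # 3. script path is an assumption
    service xend stop                      # 4.
    service xend start                     # 5.
    for cfg in /etc/xen/auto/*; do         # 6. config directory is an assumption
        xm create "$cfg"
    done
}

# Only act when called with --run, for example from a crontab entry.
if [ "${1:-}" = "--run" ] && gateway_down "$GATEWAY"; then
    recover
fi
```

The check only fires on a ping summary that reports exactly 100% loss,
so a missing summary (for example a resolver error) does not trigger
the restart.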

I have repeated the test more than 5 times on 3 Dell PE850/860
servers and the results are the same.  With xen-3.3.2-rc3 there is no
issue: the network does not go down or stall during the scp transfer
test to another server.  With xen-3.4.1-rc5, the stall happens within
an hour when at least 5 instances of the test run concurrently.  In
fact, everything from xen-3.4.0 through xen-3.4.1-rc1 to rc5 behaves
the same.
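
For anyone who wants to reproduce this, the scp load test could be
sketched as below.  This is not the exact script used: the file names,
WORKDIR, the function names and the destination argument are
hypothetical, and it assumes key-based scp login to the other server.

```shell
#!/bin/sh
# Load-test sketch. All names here are assumptions for illustration.
SCP="${SCP:-scp}"
WORKDIR="${WORKDIR:-/tmp/tg3-stall-test}"
SIZES_MB="${SIZES_MB:-1 10 100}"

# Create one zero-filled test file per size, if not already present.
make_files() {
    mkdir -p "$WORKDIR"
    for mb in $SIZES_MB; do
        [ -f "$WORKDIR/test-${mb}MB" ] ||
            dd if=/dev/zero of="$WORKDIR/test-${mb}MB" bs=1M count="$mb" 2>/dev/null
    done
}

# Copy all test files to destination $1, repeating $2 times.
transfer_loop() {
    i=0
    while [ "$i" -lt "$2" ]; do
        "$SCP" -q "$WORKDIR"/test-*MB "$1" || return 1
        i=$((i + 1))
    done
}

# Start $1 concurrent transfer loops to destination $2, $3 rounds each,
# then wait for all of them to finish.
run_instances() {
    n=0
    while [ "$n" -lt "$1" ]; do
        transfer_loop "$2" "$3" &
        n=$((n + 1))
    done
    wait
}
```

Something like "make_files; run_instances 5 user@otherserver:/tmp/ 1000"
would then keep 5 instances busy; the destination and the round count
are placeholders.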

/var/log/messages shows the following when the network stalls:
tg3: peth0: transmit timed out, resetting
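
To see how often this happens, the log can be searched for that
message.  A minimal sketch, where the function name is hypothetical and
the default path is the log file mentioned above:

```shell
#!/bin/sh
# Count tg3 transmit timeouts in a syslog file (default /var/log/messages).
# The function name is an assumption for illustration.
count_tg3_timeouts() {
    grep -c 'tg3: .*transmit timed out' "${1:-/var/log/messages}"
}
```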

I have tried:
/sbin/ethtool -K eth0 tx off
/sbin/ethtool -K eth0 rx off
/sbin/ethtool -K eth0 gso off
/sbin/ethtool -K eth0 tso off
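
The same four settings can be applied in a loop, which makes it easier
to repeat them on another interface.  This is only a sketch: the
function name and the ETHTOOL override variable are assumptions, while
the features and the "off" values are the ones listed above.

```shell
#!/bin/sh
# Offload-disable sketch. ETHTOOL and the function name are assumptions.
ETHTOOL="${ETHTOOL:-/sbin/ethtool}"

# Turn off tx/rx checksumming, GSO and TSO on interface $1.
disable_offloads() {
    for feature in tx rx gso tso; do
        "$ETHTOOL" -K "$1" "$feature" off
    done
}
```

After "disable_offloads eth0", running "/sbin/ethtool -k eth0" shows
the resulting offload settings.  The interface name is a parameter, so
the same loop can be pointed at peth0 (the name in the log line) if
that is the device to be tested.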

Are there any netfront/netback changes between xen-3.3.x and xen-3.4.x
that could cause this issue?  Has anybody else experienced such a
network stall with tg3 in a bridged network environment?

The same test carried out on non-tg3 servers, such as those with
e100/e1000 drivers, does not cause this network stall.

All servers are running CentOS 5.3 with linux- for all
dom0s and domUs.

Any idea?


Kindest regards,
Giam Teck Choon
