

Re: [Xen-devel] network misbehaviour with gplpv and 2.6.30

To: Paul Durrant <paul.durrant@xxxxxxxxxx>
Subject: Re: [Xen-devel] network misbehaviour with gplpv and 2.6.30
From: Andrew Lyon <andrew.lyon@xxxxxxxxx>
Date: Wed, 29 Jul 2009 10:48:17 +0100
Cc: James Harper <james.harper@xxxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4A6594DD.9010808@xxxxxxxxxx>
References: <AEC6C66638C05B468B556EA548C1A77D016DDD27@trantor> <4A658BE6.4010803@xxxxxxxxxx> <AEC6C66638C05B468B556EA548C1A77D016DDD99@trantor> <4A6594DD.9010808@xxxxxxxxxx>
On Tue, Jul 21, 2009 at 11:13 AM, Paul Durrant<paul.durrant@xxxxxxxxxx> wrote:
> James Harper wrote:
>>> Are you saying that ring slot n
>>> has only NETRXF_extra_info and *not* NETRXF_more_data?
>> Yes. From the debug I have received from Andrew Lyon, NETRXF_more_data
>> is _never_ set.
>> From what Andrew tells me (and it's not unlikely that I misunderstood),
>> the packets in question come from a physical machine external to the
>> machine running xen. I can't quite understand how that could be as they
>> are 'large' packets (>1514 byte total packet length) which should only
>> be locally originated. Unless he's running with jumbo frames (are you
>> Andrew?).
> It's not unusual for h/w drivers to support 'LRO', i.e. they re-assemble
> consecutive in-order TCP segments into a large packet before passing up the
> stack. I believe that these would manifest themselves as TSOs coming into
> the transmit side of netback, just as locally originated large packets
> would.
>> I've asked for some more debug info but he's in a different timezone to
>> me and probably isn't awake yet. I'm less and less inclined to think
>> that this is actually a problem with GPLPV and more a problem with
>> netback (or a physical network driver) in 2.6.30, but a tcpdump in Dom0,
>> HVM without GPLPV and maybe in a Linux DomU should tell us more.
> Yes, a tcpdump of what's being passed into netback in dom0 should tell us
> what's happening.
>  Paul

I did more testing, including various Wireshark captures which James
looked at. The problem is not in the GPLPV drivers, as it also affects
the Linux PV netfront driver; it seems to be a dom0 problem. Packets
arrive with frame.len < 72 but ip.len > 72, which of course causes
terrible throughput in domU networking. It also crashed the GPLPV
drivers until James added a check for the condition (see
now it triggers a warning message, for example:

XenNet     XN_HDR_SIZE + ip4_length (2974) > total_length (54)
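
For readers following along, the condition behind this warning can be
sketched as below. This is an illustration only, not the GPLPV source:
XN_HDR_SIZE = 14 (the Ethernet header length) and the helper names
check_ipv4_length and make_frame are assumptions, and the example call
assumes the 2974 in the message is the sum XN_HDR_SIZE + ip4_length.

```python
import struct

XN_HDR_SIZE = 14  # assumption: Ethernet header (dst MAC, src MAC, EtherType)


def check_ipv4_length(frame: bytes) -> str:
    """Compare the IPv4 total-length field with the bytes actually received."""
    if len(frame) < XN_HDR_SIZE + 20:
        return "too short for an IPv4 header"
    # IPv4 total length: big-endian 16-bit field at offset 2 of the IP header.
    (ip4_length,) = struct.unpack_from("!H", frame, XN_HDR_SIZE + 2)
    total_length = len(frame)
    if XN_HDR_SIZE + ip4_length > total_length:
        return "truncated"  # header claims more data than arrived
    if XN_HDR_SIZE + ip4_length < total_length:
        return "padded"     # trailing bytes beyond the IP datagram
    return "ok"


def make_frame(ip4_length: int, frame_len: int) -> bytes:
    """Build a dummy frame: zeroed MACs, minimal IPv4 header, zero fill."""
    eth = bytes(12) + b"\x08\x00"  # EtherType 0x0800 = IPv4
    ip = struct.pack("!BBH", 0x45, 0, ip4_length) + bytes(16)
    return (eth + ip).ljust(frame_len, b"\x00")


# A 54-byte frame whose IP header claims 2960 bytes (14 + 2960 = 2974).
print(check_ipv4_length(make_frame(2960, 54)))  # prints "truncated"
```

A frame built with make_frame(40, 60) is reported as "padded" instead,
which is the shape of a short packet padded up to the Ethernet minimum.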

Yesterday I noticed something quite interesting: if I switch off
receive checksum offloading on the dom0 NIC (ethtool -K peth0 rx off),
network performance in domU is much improved. Something is still
wrong, though, because some network performance tests are still very
slow, and a different warning message is triggered in the XenNet
driver:

XenNet     Size Mismatch 54 (ip4_length + XN_HDR_SIZE) != 60 (total_length)
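
One possible reading of the numbers in the two warnings, worked through
as arithmetic. Everything here is an assumption rather than something
confirmed in the thread: a 14-byte Ethernet header, 20-byte IPv4 and
TCP headers with no options, a 1460-byte MSS, and the 2974 in the first
message being the sum XN_HDR_SIZE + ip4_length. Under those
assumptions, the first case looks like two coalesced full-size segments
of which only the 54 bytes of headers arrived, and the 54-versus-60
case looks like a bare 40-byte TCP segment padded to the 60-byte
Ethernet minimum (excluding FCS), which would normally be harmless.

```python
ETH_HDR = 14   # assumed Ethernet header size (XN_HDR_SIZE)
IP_HDR = 20    # assumed IPv4 header without options
TCP_HDR = 20   # assumed TCP header without options
MSS = 1460     # assumed TCP payload per segment for a 1500-byte MTU
ETH_MIN = 60   # minimum Ethernet frame length, excluding the 4-byte FCS

headers = ETH_HDR + IP_HDR + TCP_HDR

# First warning: XN_HDR_SIZE + ip4_length (2974) > total_length (54)
payload = 2974 - headers
print(headers, payload, payload / MSS)  # prints "54 2920 2.0"

# Second warning: 54 (ip4_length + XN_HDR_SIZE) != 60 (total_length)
ip4_length = 54 - ETH_HDR
tcp_payload = ip4_length - IP_HDR - TCP_HDR
print(ip4_length, tcp_payload, max(54, ETH_MIN))  # prints "40 0 60"
```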

Now the really strange thing is that if I then re-enable rx checksum
offload (ethtool -K peth0 rx on), everything works perfectly: network
throughput is the same as with 2.6.29 and no warning messages are
triggered in the XenNet driver.
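
The off/on toggle described above could be scripted roughly as follows.
This is a sketch: the interface name peth0 comes from the commands
above, the helper names are hypothetical, and the parser assumes that
"ethtool -k" prints one "name: value" line per offload setting (for
example "rx-checksumming: on"), which may vary between ethtool
versions. Running the toggle needs root and ethtool installed.

```python
import subprocess


def parse_offloads(text: str) -> dict:
    """Parse `ethtool -k` style output into {setting: value}."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip the heading line and anything without a value after the colon.
        if sep and value.strip():
            settings[name.strip()] = value.strip()
    return settings


def toggle_rx_checksum(iface: str = "peth0") -> dict:
    """Switch rx checksum offload off and back on, then report the settings."""
    for state in ("off", "on"):
        subprocess.run(["ethtool", "-K", iface, "rx", state], check=True)
    out = subprocess.run(["ethtool", "-k", iface], check=True,
                         capture_output=True, text=True).stdout
    return parse_offloads(out)
```

Calling toggle_rx_checksum() and looking up "rx-checksumming" in the
result shows whether the NIC reports the offload as enabled again.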

The dom0 NIC is an 82575EB. I have tried both the 1.3.16-k2 driver
which is included in 2.6.30, and the version which I downloaded from
Intel's support site. I will try another NIC if I can find one.

I don't understand how toggling rx offload off and on can fix the
problem but it does.

