On Tue, Jul 21, 2009 at 11:13 AM, Paul Durrant<paul.durrant@xxxxxxxxxx> wrote:
> James Harper wrote:
>>
>>> Are you saying that ring slot n
>>> has only NETRXF_extra_info and *not* NETRXF_more_data?
>>>
>>
>> Yes. From the debug I have received from Andrew Lyon, NETRXF_more_data
>> is _never_ set.
>>
>> From what Andrew tells me (and it's not unlikely that I misunderstood),
>> the packets in question come from a physical machine external to the
>> machine running xen. I can't quite understand how that could be as they
>> are 'large' packets (>1514 byte total packet length) which should only
>> be locally originated. Unless he's running with jumbo frames (are you
>> Andrew?).
>>
>
> It's not unusual for h/w drivers to support 'LRO', i.e. they re-assemble
> consecutive in-order TCP segments into a large packet before passing up the
> stack. I believe that these would manifest themselves as TSOs coming into
> the transmit side of netback, just as locally originated large packets
> would.
>
>> I've asked for some more debug info but he's in a different timezone to
>> me and probably isn't awake yet. I'm less and less inclined to think
>> that this is actually a problem with GPLPV and more a problem with
>> netback (or a physical network driver) in 2.6.30, but a tcpdump in Dom0,
>> HVM without GPLPV and maybe in a Linux DomU should tell us more.
>>
>
> Yes, a tcpdump of what's being passed into netback in dom0 should tell us
> what's happening.
>
> Paul
>
I did more testing, including several Wireshark captures which James
looked at. The problem is not in the GPLPV drivers, as it also affects
the Linux PV netfront driver; it appears to be a dom0 problem. Packets
arrive with frame.len < 72 but ip.len > 72, which of course causes
terrible throughput in domU networking. It also crashed the GPLPV
drivers until James added a check for the condition (see
http://xenbits.xensource.com/ext/win-pvdrivers.hg?rev/0436238bcda5).
Now it triggers a warning message instead, for example:
XenNet XN_HDR_SIZE + ip4_length (2974) > total_length (54)
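For anyone who wants to look for the same condition in a capture, here is
a minimal Python sketch of the comparison behind that warning. It is not
the code from the changeset linked above. ETH_HDR_SIZE = 14 is an
assumption standing in for XN_HDR_SIZE, the helper names are hypothetical,
and the test frame is synthetic with illustrative lengths.

```python
import struct

# Assumption: XN_HDR_SIZE is the 14-byte Ethernet header (no VLAN tag).
ETH_HDR_SIZE = 14
IPV4_ETHERTYPE = 0x0800


def ip4_total_length(frame: bytes) -> int:
    """Read the IPv4 total-length field from a raw Ethernet frame."""
    if len(frame) < ETH_HDR_SIZE + 20:
        raise ValueError("frame too short to hold an IPv4 header")
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != IPV4_ETHERTYPE:
        raise ValueError("not an IPv4 frame")
    # Total length is the 16-bit field at offset 2 of the IPv4 header.
    return struct.unpack_from("!H", frame, ETH_HDR_SIZE + 2)[0]


def is_truncated(frame: bytes) -> bool:
    """True when the IP header claims more bytes than the frame carries."""
    return ETH_HDR_SIZE + ip4_total_length(frame) > len(frame)


def make_frame(ip_total_length: int, frame_len: int) -> bytes:
    """Build a synthetic frame (hypothetical test helper, zeroed payload)."""
    eth = bytes(12) + struct.pack("!H", IPV4_ETHERTYPE)
    ip = struct.pack("!BBH", 0x45, 0, ip_total_length) + bytes(16)
    head = eth + ip
    return head + bytes(frame_len - len(head))


if __name__ == "__main__":
    bad = make_frame(ip_total_length=2960, frame_len=54)
    print(is_truncated(bad))
```

A frame for which is_truncated() returns True is the frame.len < ip.len
case described above: the IP header advertises data that never arrived in
the buffer.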
Yesterday I noticed something quite interesting. If I switch off
receive checksum offloading on the dom0 NIC (ethtool -K peth0 rx off),
network performance in domU is much improved. Something is still wrong,
though: some network performance tests are still very slow, and a
different warning message is triggered in the XenNet driver:
XenNet Size Mismatch 54 (ip4_length + XN_HDR_SIZE) != 60 (total_length)
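The two warnings describe different relationships between the IP length
and the buffer length, and a small Python sketch makes the distinction
explicit. The function and category names are hypothetical, and
XN_HDR_SIZE = 14 is an assumption. The "padded" case rests on the Ethernet
minimum frame size of 64 bytes including the 4-byte FCS, i.e. 60 bytes
without it. Whether that is what actually produces the 54 vs 60 message
has not been established in this thread.

```python
# Assumption: XN_HDR_SIZE is the 14-byte Ethernet header.
XN_HDR_SIZE = 14
# Ethernet minimum frame is 64 bytes with FCS, so 60 bytes without it.
MIN_FRAME_NO_FCS = 60


def classify(ip4_length: int, total_length: int) -> str:
    """Compare the length implied by the IP header with the buffer length."""
    expected = XN_HDR_SIZE + ip4_length
    if expected > total_length:
        # IP header claims more data than was received.
        return "truncated"
    if expected == total_length:
        return "exact"
    if total_length <= MIN_FRAME_NO_FCS:
        # Extra bytes are consistent with padding of a short frame.
        return "padded"
    # Extra bytes that minimum-size padding cannot account for.
    return "trailing data"


if __name__ == "__main__":
    print(classify(ip4_length=40, total_length=60))
    print(classify(ip4_length=2960, total_length=54))
```

Under these assumptions a 40-byte IP packet in a 60-byte buffer is
classified as "padded", while an IP length larger than the buffer is
classified as "truncated".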
Now the really strange thing: if I then re-enable rx checksum offload
(ethtool -K peth0 rx on), everything works perfectly. Network
throughput is the same as with 2.6.29 and no warning messages are
triggered in the XenNet driver.
The dom0 NIC is an 82575EB. I have tried both the 1.3.16-k2 driver
included in 2.6.30 and the 1.3.19.3 driver which I downloaded from
Intel's support site. I will try another NIC if I can find one.
I don't understand how toggling rx offload off and on can fix the
problem, but it does.
Andy
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel