(Added Stefano.)
On 12/14/2010 01:59 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 02:12 AM, John Weekes wrote:
>> I tested further and found that:
>>
>> * dom0 doesn't have the issue, normal PV domains do not have the issue,
>> and Windows GPLPV-based domains do not have the issue. It seems to be
>> specific to stubdom-based domains.
> That's interesting. There were a number of fixes to netfront/back to
> make sure all this checksum offload stuff worked properly, and I was
> never convinced they were also ported to stubdom's netfront. I don't
> remember the specifics now, unfortunately.
>
> J
>
>> * Other machines running the exact same Xen release and kernel
>> version, but that use the e1000 driver instead of the igb driver,
>> don't seem to have the problem. I don't know if it's related (I have
>> not yet been able to test with MSI disabled), but those machines
>> without the problem also aren't using MSI-X, whereas the igb-based
>> machine that shows the problem is. From dmesg:
>>
>> [ 21.209923] Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
>> [ 21.210026] Copyright (c) 2007-2009 Intel Corporation.
>> [ 21.210140] xen: registering gsi 28 triggering 0 polarity 1
>> [ 21.210145] xen: --> irq=28
>> [ 21.210151] igb 0000:01:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
>> [ 21.210279] igb 0000:01:00.0: setting latency timer to 64
>> [ 21.382336] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
>> [ 21.382435] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:09:e4:00
>> [ 21.382605] igb 0000:01:00.0: eth0: PBA No: ffffff-0ff
>> [ 21.382698] igb 0000:01:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
>>
>> (Both the e1000 and igb machines have the hvm_directio flag in the "xl
>> info" output.)
>>
>> * Different GSO/TSO settings do not appear to make a difference. Only
>> the tx offload setting does.
>>
>> * Inside the problematic domU, the bad segment counter increments when
>> the issue is occurring:
>>
>> testvds5 ~ # netstat -s eth0
>> Ip:
>> 22162 total packets received
>> 44 with invalid addresses
>> 0 forwarded
>> 0 incoming packets discarded
>> 22113 incoming packets delivered
>> 19582 requests sent out
>> Icmp:
>> 2694 ICMP messages received
>> 0 input ICMP message failed.
>> ICMP input histogram:
>> timeout in transit: 2447
>> echo replies: 247
>> 2698 ICMP messages sent
>> 0 ICMP messages failed
>> ICMP output histogram:
>> destination unreachable: 2
>> IcmpMsg:
>> InType0: 247
>> InType11: 2447
>> OutType3: 2
>> OutType69: 2696
>> Tcp:
>> 4 active connections openings
>> 3 passive connection openings
>> 0 failed connection attempts
>> 0 connection resets received
>> 3 connections established
>> 18819 segments received
>> 16795 segments send out
>> 0 segments retransmited
>> 2366 bad segments received.
>> 8 resets sent
>> Udp:
>> 65 packets received
>> 2 packets to unknown port received.
>> 0 packet receive errors
>> 89 packets sent
>> UdpLite:
>> TcpExt:
>> 1 TCP sockets finished time wait in fast timer
>> 172 delayed acks sent
>> Quick ack mode was activated 89 times
>> 3 packets directly queued to recvmsg prequeue.
>> 33304 bytes directly in process context from backlog
>> 3 bytes directly received in process context from prequeue
>> 7236 packet headers predicted
>> 23 packets header predicted and directly queued to user
>> 3117 acknowledgments not containing data payload received
>> 89 DSACKs sent for old packets
>> 2 DSACKs sent for out of order packets
>> 2 connections reset due to unexpected data
>> IpExt:
>> InBcastPkts: 533
>> InOctets: 23420805
>> OutOctets: 1601733
>> InBcastOctets: 162268
>> testvds5 ~ #
>>
>> * Some sites transfer quickly to the domU regardless of the tx
>> offload setting, exhibiting the symptoms less. For instance, uiuc.edu
>> with tx on:
>>
>> root@testvds5:~# wget http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> --2010-12-14 03:53:50-- http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
>> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: 2798649344 (2.6G) [text/plain]
>> Saving to: `livedvd-amd64-multilib-10.1.iso'
>>
>> 0% [ ] 25,754,272 3.06M/s eta 17m 7s ^C
>> root@testvds5:~#
>>
>> (netstat shows 23 bad segments received over the length of that test)
>>
>> and with tx off:
>>
>> root@testvds5:~# wget http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> --2010-12-14 03:54:45-- http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
>> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: 2798649344 (2.6G) [text/plain]
>> Saving to: `livedvd-amd64-multilib-10.1.iso.1'
>>
>> 1% [ ] 47,677,960 3.95M/s eta 12m 0s ^C
>>
>> * The issue also occurs in xen-4.0-testing, as of c/s 21392.
>>
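For anyone reproducing this, the bad segment counter quoted above can be pulled out of `netstat -s` output programmatically, so it can be compared before and after a transfer. The following is only a minimal Python sketch; it assumes the counter line is worded as in the output above ("bad segments received"), and the function name `bad_tcp_segments` is made up for illustration.

```python
import re
from typing import Optional


def bad_tcp_segments(netstat_output: str) -> Optional[int]:
    """Return the TCP 'bad segments received' count from `netstat -s` text.

    Returns None when the counter line is absent. Leading whitespace or
    mail quote markers before the number are skipped.
    """
    match = re.search(
        r"^\W*(\d+) bad segments received", netstat_output, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Sample lines copied from the netstat output quoted in this message.
sample = (
    "Tcp:\n"
    "    18819 segments received\n"
    "    16795 segments send out\n"
    "    2366 bad segments received.\n"
)
print(bad_tcp_segments(sample))  # 2366
```

Taking one reading before a wget run and one after, then subtracting, gives the number of bad segments attributable to that run.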
>> For reference, Xen and kernel version output:
>>
>> nyc-dodec266 src # xl info
>> host : nyc-dodec266
>> release : 2.6.32.26-g862ef97
>> version : #4 SMP Wed Dec 8 16:38:18 EST 2010
>> machine : x86_64
>> nr_cpus : 24
>> nr_nodes : 2
>> cores_per_socket : 12
>> threads_per_core : 1
>> cpu_mhz : 2674
>> hw_caps : bfebfbff:2c100800:00000000:00003f40:029ee3ff:00000000:00000001:00000000
>> virt_caps : hvm hvm_directio
>> total_memory : 49143
>> free_memory : 9178
>> free_cpus : 0
>> xen_major : 4
>> xen_minor : 1
>> xen_extra : -unstable
>> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
>> xen_scheduler : credit
>> xen_pagesize : 4096
>> platform_params : virt_start=0xffff800000000000
>> xen_changeset : Wed Dec 08 10:46:31 2010 +0000 22467:89116f28083f
>> xen_commandline : dom0_mem=2550M dom0_max_vcpus=4
>> cc_compiler : gcc version 4.4.4 (Gentoo 4.4.4-r2 p1.2, pie-0.4.5)
>> cc_compile_by : root
>> cc_compile_domain : nuclearfallout.net
>> cc_compile_date : Fri Dec 10 00:51:50 EST 2010
>> xend_config_format : 4
>> nyc-dodec266 src # uname -a
>> Linux nyc-dodec266 2.6.32.26-g862ef97 #4 SMP Wed Dec 8 16:38:18 EST 2010 x86_64 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux
>>
>> For now, I can use the "tx off" workaround by having a script set it
>> for all newly created domains. Is anyone up for nailing this down and
>> finding a real fix? Failing that, applying the workaround in the Xen
>> tools might be a good idea.
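A workaround script along the lines described above might look like the following Python sketch. It rests on assumptions not stated in this message: that "tx off" refers to `ethtool -K <interface> tx off`, and that `eth0` is the interface to change (the message does not say whether the setting is applied inside the domU or on a dom0-side interface). The helper names are hypothetical, and the script defaults to a dry run so nothing changes unless requested.

```python
import subprocess
from typing import List


def tx_offload_command(interface: str, enabled: bool) -> List[str]:
    """Build the ethtool command that turns tx checksum offload on or off."""
    return ["ethtool", "-K", interface, "tx", "on" if enabled else "off"]


def disable_tx_offload(interface: str, dry_run: bool = True) -> List[str]:
    """Disable tx offload on `interface`; returns the command used.

    With dry_run=True (the default) the command is only built, not run.
    Running it for real requires root and an installed ethtool.
    """
    cmd = tx_offload_command(interface, enabled=False)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Assumption: eth0 is the affected interface.
print(" ".join(disable_tx_offload("eth0")))  # ethtool -K eth0 tx off
```

Hooking this into domain creation (for example from a boot script in the guest) is left out, since the right place depends on where the setting has to be applied.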
>>
>> -John
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
>>