On 12/14/2010 02:12 AM, John Weekes wrote:
> I tested further and found that:
>
> * dom0 doesn't have the issue, normal PV domains do not have the issue,
> and Windows GPLPV-based domains do not have the issue. It seems to be
> specific to stubdom-based domains.
That's interesting. There were a number of fixes to netfront/back to
make sure all this checksum offload stuff worked properly, and I was
never convinced they were also ported to stubdom's netfront. I don't
remember the specifics now, unfortunately.
J
>
> * Other machines running the exact same Xen release and kernel
> version, but that use the e1000 driver instead of the igb driver,
> don't seem to have the problem. I don't know if it's related (I have
> not yet been able to test with MSI disabled), but those machines
> without the problem also aren't using MSI-X, whereas the igb-based
> machine that shows the problem is. From dmesg:
>
> [ 21.209923] Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
> [ 21.210026] Copyright (c) 2007-2009 Intel Corporation.
> [ 21.210140] xen: registering gsi 28 triggering 0 polarity 1
> [ 21.210145] xen: --> irq=28
> [ 21.210151] igb 0000:01:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
> [ 21.210279] igb 0000:01:00.0: setting latency timer to 64
> [ 21.382336] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
> [ 21.382435] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:09:e4:00
> [ 21.382605] igb 0000:01:00.0: eth0: PBA No: ffffff-0ff
> [ 21.382698] igb 0000:01:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
>
> (Both the e1000 and igb machines have the hvm_directio flag in the "xl
> info" output.)
>
> * Different GSO/TSO settings do not appear to make a difference. Only
> the tx offload setting does.
>
> * Inside the problematic domU, the bad segment counter increments when
> the issue is occurring:
>
> testvds5 ~ # netstat -s eth0
> Ip:
> 22162 total packets received
> 44 with invalid addresses
> 0 forwarded
> 0 incoming packets discarded
> 22113 incoming packets delivered
> 19582 requests sent out
> Icmp:
> 2694 ICMP messages received
> 0 input ICMP message failed.
> ICMP input histogram:
> timeout in transit: 2447
> echo replies: 247
> 2698 ICMP messages sent
> 0 ICMP messages failed
> ICMP output histogram:
> destination unreachable: 2
> IcmpMsg:
> InType0: 247
> InType11: 2447
> OutType3: 2
> OutType69: 2696
> Tcp:
> 4 active connections openings
> 3 passive connection openings
> 0 failed connection attempts
> 0 connection resets received
> 3 connections established
> 18819 segments received
> 16795 segments send out
> 0 segments retransmited
> 2366 bad segments received.
> 8 resets sent
> Udp:
> 65 packets received
> 2 packets to unknown port received.
> 0 packet receive errors
> 89 packets sent
> UdpLite:
> TcpExt:
> 1 TCP sockets finished time wait in fast timer
> 172 delayed acks sent
> Quick ack mode was activated 89 times
> 3 packets directly queued to recvmsg prequeue.
> 33304 bytes directly in process context from backlog
> 3 bytes directly received in process context from prequeue
> 7236 packet headers predicted
> 23 packets header predicted and directly queued to user
> 3117 acknowledgments not containing data payload received
> 89 DSACKs sent for old packets
> 2 DSACKs sent for out of order packets
> 2 connections reset due to unexpected data
> IpExt:
> InBcastPkts: 533
> InOctets: 23420805
> OutOctets: 1601733
> InBcastOctets: 162268
> testvds5 ~ #
>
> * Some sites transfer to the domU quickly regardless of the tx
> offload setting, exhibiting the symptoms less. For instance, uiuc.edu
> with tx on:
>
> root@testvds5:~# wget http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> --2010-12-14 03:53:50-- http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 2798649344 (2.6G) [text/plain]
> Saving to: `livedvd-amd64-multilib-10.1.iso'
>
> 0% [ ] 25,754,272 3.06M/s eta 17m 7s ^C
> root@testvds5:~#
>
> (netstat shows 23 bad segments received over the length of that test)
>
> and with tx off:
>
> root@testvds5:~# wget http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> --2010-12-14 03:54:45-- http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 2798649344 (2.6G) [text/plain]
> Saving to: `livedvd-amd64-multilib-10.1.iso.1'
>
> 1% [ ] 47,677,960 3.95M/s eta 12m 0s ^C
>
> * The issue also occurs in xen-4.0-testing, as of c/s 21392.
>
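The bad-segment figures quoted in the list above (2366 in the netstat dump, 23 over the uiuc.edu transfer) can be collected mechanically by snapshotting `netstat -s` before and after a test. What follows is only a minimal Python sketch and not part of the original report: the function names `bad_segments` and `bad_segment_delta` are hypothetical, and the "after" value in the usage comment is invented purely to illustrate a difference of 23.

```python
import re

# Matches the counter line as printed in the netstat output quoted above,
# e.g. "    2366 bad segments received."
_BAD_SEGMENTS = re.compile(r"^\s*(\d+) bad segments received", re.MULTILINE)


def bad_segments(netstat_output: str) -> int:
    """Return the TCP 'bad segments received' counter from `netstat -s` text."""
    match = _BAD_SEGMENTS.search(netstat_output)
    if match is None:
        # The line was not found; refuse to guess a value.
        raise ValueError("no 'bad segments received' line in netstat output")
    return int(match.group(1))


def bad_segment_delta(before: str, after: str) -> int:
    """Return how many bad segments arrived between two snapshots."""
    return bad_segments(after) - bad_segments(before)


# Usage (snapshot strings would come from running `netstat -s` twice):
#   before = "Tcp:\n    2366 bad segments received.\n"
#   after  = "Tcp:\n    2389 bad segments received.\n"   # illustrative value
#   bad_segment_delta(before, after)
```

Comparing the delta for a run with tx on against a run with tx off gives a number to attach to each test instead of reading the counters by eye.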
> For reference, Xen and kernel version output:
>
> nyc-dodec266 src # xl info
> host : nyc-dodec266
> release : 2.6.32.26-g862ef97
> version : #4 SMP Wed Dec 8 16:38:18 EST 2010
> machine : x86_64
> nr_cpus : 24
> nr_nodes : 2
> cores_per_socket : 12
> threads_per_core : 1
> cpu_mhz : 2674
> hw_caps : bfebfbff:2c100800:00000000:00003f40:029ee3ff:00000000:00000001:00000000
> virt_caps : hvm hvm_directio
> total_memory : 49143
> free_memory : 9178
> free_cpus : 0
> xen_major : 4
> xen_minor : 1
> xen_extra : -unstable
> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler : credit
> xen_pagesize : 4096
> platform_params : virt_start=0xffff800000000000
> xen_changeset : Wed Dec 08 10:46:31 2010 +0000 22467:89116f28083f
> xen_commandline : dom0_mem=2550M dom0_max_vcpus=4
> cc_compiler : gcc version 4.4.4 (Gentoo 4.4.4-r2 p1.2, pie-0.4.5)
> cc_compile_by : root
> cc_compile_domain : nuclearfallout.net
> cc_compile_date : Fri Dec 10 00:51:50 EST 2010
> xend_config_format : 4
> nyc-dodec266 src # uname -a
> Linux nyc-dodec266 2.6.32.26-g862ef97 #4 SMP Wed Dec 8 16:38:18 EST 2010 x86_64 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux
>
> For now, I can use the "tx off" workaround by having a script set it
> for all newly created domains. Is anyone up for nailing this down and
> finding a real fix? Failing that, applying the workaround in the Xen
> tools might be a good idea.
>
> -John
>
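The "tx off" workaround script mentioned above is not shown in the report. A rough Python sketch of what such a script might look like follows, assuming the setting is applied with `ethtool -K <interface> tx off`. The function names are hypothetical, and which interface needs the setting (and where the script runs) is an assumption left to the caller, since the report does not spell it out.

```python
import subprocess
from typing import Callable, Iterable, List


def tx_off_command(interface: str) -> List[str]:
    """Build the ethtool invocation that turns off tx checksum offload."""
    return ["ethtool", "-K", interface, "tx", "off"]


def disable_tx_offload(
    interfaces: Iterable[str],
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> List[List[str]]:
    """Run `ethtool -K <iface> tx off` for each interface.

    `run` is injectable so the commands can be inspected without
    touching real devices; by default it executes them and raises
    if ethtool exits with a non-zero status.
    """
    issued = []
    for iface in interfaces:
        cmd = tx_off_command(iface)
        run(cmd)
        issued.append(cmd)
    return issued


# Usage (interface name is an assumption; needs root and ethtool installed):
#   disable_tx_offload(["eth0"])
```

Passing `print` as `run` gives a dry run that only lists the commands, which may be useful before hooking this into domain creation.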
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>