Addendum:
The Dells are actually R715s, not the R710s listed below.
The dom0 kernel is actually vmlinuz-2.6.18-194.32.1.el5xen, not the 194.11.4 kernel shown in the grub config below.
Cheers,
Pim
On Feb 8, 2011, at 13:22, Pim van Riezen wrote:
> Good day,
>
> After an incident in which several dom0 nodes went down under a sustained
> SYN flood against a network range, we have been investigating how Xen
> behaves under high network load. The results so far are not pretty. We
> built a lab setup that can reproduce the scenario with some reliability,
> although it takes a bit of trial and error to get crashes out of it.
>
> SETUP:
> 2x Dell R710
> - 4x 6-core AMD Opteron 6174
> - 128GB memory
> - Broadcom BCM5709
> - LSI SAS2008 rev.02
> - Emulex Saturn-X FC adapter
> - CentOS 5.5 w/ gitco Xen 4.0.1
>
> 1x NexSan SATABeast FC raid
> 1x Brocade FC switch
> 5x Flood sources (Dell R210)
>
> Each dom0 machine is loaded with 50 PV guests, each backed by an LVM
> volume on the FC storage; half of them start compiling a kernel from
> rc.local. Each machine also runs 2 HVM guests doing the same.
>
> Networking for all guests uses the bridged setup, attached to a specific
> VLAN that arrives tagged at the dom0, so the vifs end up in xenbr86 (née
> xenbr0.86).
>
> Grub conf for the dom0s:
>
> kernel /xen.gz-4.0.1 dom0_mem=2048M max_cstate=0 cpuidle=off
> module /vmlinuz-2.6.18-194.11.4.el5xen ro root=LABEL=/ elevator=deadline xencons=tty
>
> The flooding is always done to either the entire IP range the guests live in
> (in case of SYN floods) or a sub-range of about 50 IPs (in case of UDP
> floods), with random source addresses.
>
> ISSUE:
> When the packet rate gets into insane territory (gigabit link saturated or
> nearly saturated), the machine seems to start losing track of interrupts.
> Depending on the severity, this leads to CPU soft lockups on random cores.
> In worse cases, other hardware attached to the PCI bus starts timing out,
> and the kernel loses track of its storage. Usually the SAS controller is
> the first to go, but I've also seen timeouts on the FC controller.
>
> THINGS TRIED:
> 1. Raising the Broadcom RX ring from 255 to 3000. No noticeable effect.
> 2. Downgrading to Xen 3.4.3. No effect.
> 3. Different Dell BIOS versions. No effect.
> 4. Lowering the number of guests -> effects get less serious, but this is
> not a viable option.
> 5. Using ipt_LIMIT in the FORWARD chain, set to 10000/s -> effects get less
> serious when dealing with TCP SYN attacks. No effect when dealing with
> 28-byte UDP attacks.
> 6. Disabling HPET as per
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html,
> setting cpuidle=0 and disabling irqbalance -> effects get less serious.
>
> The changes in 6 stop the machine from completely crapping itself, but I
> still get soft lockups, although they seem to be limited to one of these two
> paths:
>
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8027458e>] smp_call_function_many+0x38/0x4c
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff80274688>] smp_call_function+0x4e/0x5e
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8028fdd7>] on_each_cpu+0x10/0x2a
> [<ffffffff802d7428>] kill_bdev+0x1b/0x30
> [<ffffffff802d7a47>] __blkdev_put+0x4f/0x169
> [<ffffffff80213492>] __fput+0xd3/0x1bd
> [<ffffffff802243cb>] filp_close+0x5c/0x64
> [<ffffffff8021e5d0>] sys_close+0x88/0xbd
> [<ffffffff802602f9>] tracesys+0xab/0xb6
>
> and
>
> [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
> [<ffffffff8026ca88>] xen_idle+0x38/0x4a
> [<ffffffff8024af6c>] cpu_idle+0x97/0xba
> [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
> [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
>
> In some scenarios, an application running on the dom0 that relies on
> pthread_cond_timedwait hangs in all of its threads on that specific call.
> This may be related to timekeeping going wonky during the attack; I'm not
> sure.
>
> Is there anything more we can try?
>
> Cheers,
> Pim van Riezen
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel