WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
From: Pim van Riezen <pi+lists@xxxxxxxxxxxx>
Date: Tue, 8 Feb 2011 13:39:06 +0100
Delivery-date: Tue, 08 Feb 2011 04:40:01 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <A380BBA2-B226-4747-A35D-901318490782@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <A380BBA2-B226-4747-A35D-901318490782@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Addendum:

        The Dells are actually R715.
        The dom0 kernel is actually vmlinuz-2.6.18-194.32.1.el5xen

Cheers,
Pim

On Feb 8, 2011, at 13:22 , Pim van Riezen wrote:

> Good day,
> 
> In a scenario where we saw several dom0 nodes fall down due to a sustained 
> SYN flood to a network range, we have been investigating issues with Xen 
> under high network load. The results so far are not pretty. We have 
> recreated a lab setup that reproduces the scenario with some reliability, 
> although it takes a bit of trial-and-error to get crashes out of it.
> 
> SETUP:
> 2x Dell R710
>       - 4x 6core AMD Opteron 6174
>       - 128GB memory
>       - Broadcom BCM5709
>       - LSI SAS2008 rev.02
>       - Emulex Saturn-X FC adapter
>       - CentOS 5.5 w/ gitco Xen 4.0.1
> 
> 1x NexSan SATABeast FC raid
> 1x Brocade FC switch
> 5x Flood sources (Dell R210)
> 
> The dom0 machines are loaded with 50 PV images, each coupled to an LVM partition on 
> FC, half of which are set to start compiling a kernel in rc.local. There are 
> also 2 HVM images on both machines doing the same.
> 
> Networking for all guests is configured in the bridging setup, attached to a 
> specific vlan that arrives tagged at the Dom0. So vifs end up in xenbr86, née 
> xenbr0.86.
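A setup like this is commonly built with vconfig and brctl; the sketch below is illustrative only, with the physical interface name (eth0) an assumption, while the VLAN id and bridge name follow the description above:

```shell
# Illustrative sketch: tagged VLAN 86 arriving on eth0, bridged for guest vifs.
# Interface name eth0 is an assumption; VLAN id and bridge name follow the text.
vconfig add eth0 86          # creates the tagged sub-interface eth0.86
ip link set eth0.86 up
brctl addbr xenbr86          # guests' vifs get attached to this bridge
brctl addif xenbr86 eth0.86
ip link set xenbr86 up
```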
> 
> Grub conf for the dom0s:
> 
>       kernel /xen.gz-4.0.1 dom0_mem=2048M max_cstate=0 cpuidle=off
>       module /vmlinuz-2.6.18-194.11.4.el5xen ro root=LABEL=/ elevator=deadline
> xencons=tty
> 
> The flooding is always done to either the entire IP range the guests live in 
> (in case of SYN floods) or a sub-range of about 50 IPs (in case of UDP 
> floods), with random source addresses.
> 
> ISSUE:
> When the pps rate gets into the insane territory (gigabit link saturated or 
> near-saturated), the machine seems to start losing track of interrupts. 
> Depending on the severity, this leads to CPU soft lockups on random cores. 
> Under more dire circumstances, other hardware attached to the PCI bus starts 
> timing out, making the kernel lose track of storage. Usually the 
> SAS-controller is the first to go, but I've also seen timeouts on the FC 
> controller.
> 
> THINGS TRIED:
> 1. Raising the Broadcom RX ring from 255 to 3000. No noticeable effects.
> 2. Downgrading to Xen 3.4.3. No effect.
> 3. Different Dell BIOS versions. No effect.
> 4. Lowering the number of guests -> effects get less serious. Not a serious 
> option.
> 5. Using the iptables limit match (ipt_limit) in the FORWARD chain, set to 
> 10000/s -> effects get less serious when dealing with TCP SYN attacks. No 
> effect when dealing with 28-byte UDP attacks.
> 6. Disabling HPET as per 
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html with 
> cpuidle=0 and disabling irqbalance -> effects get less serious.
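For reference, items 1, 5 and 6 above correspond to commands of roughly this shape (a sketch only; the interface name and the exact rule ordering are assumptions, not our scripts):

```shell
# 1. Raise the Broadcom RX ring (interface name eth0 is an assumption)
ethtool -G eth0 rx 3000

# 5. Rate-limit forwarded SYNs to 10000/s, dropping the excess
iptables -A FORWARD -p tcp --syn -m limit --limit 10000/second -j ACCEPT
iptables -A FORWARD -p tcp --syn -j DROP

# 6. Disable irqbalance (the HPET/cpuidle part is a hypervisor boot option,
#    per the linked thread)
service irqbalance stop
chkconfig irqbalance off
```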
> 
> The changes in 6 stop the machine from completely crapping itself, but I 
> still get soft lockups, although they seem to be limited to one of these two 
> paths:
> 
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8027458e>] smp_call_function_many+0x38/0x4c
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff80274688>] smp_call_function+0x4e/0x5e
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8028fdd7>] on_each_cpu+0x10/0x2a
> [<ffffffff802d7428>] kill_bdev+0x1b/0x30
> [<ffffffff802d7a47>] __blkdev_put+0x4f/0x169
> [<ffffffff80213492>] __fput+0xd3/0x1bd
> [<ffffffff802243cb>] filp_close+0x5c/0x64
> [<ffffffff8021e5d0>] sys_close+0x88/0xbd
> [<ffffffff802602f9>] tracesys+0xab/0xb6
> 
> and
> 
> [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
> [<ffffffff8026ca88>] xen_idle+0x38/0x4a
> [<ffffffff8024af6c>] cpu_idle+0x97/0xba
> [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
> [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
> 
> In some scenarios, an application running on the dom0 that relies on 
> pthread_cond_timedwait seems to be hanging in all its threads on that specific 
> call. This may be related to some timing going wonky during the attack; we are 
> not sure.
> 
> Is there anything more we can try?
> 
> Cheers,
> Pim van Riezen
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
