xen-users

Re: [Xen-users] Xen3.3 / Xen3.4 CPU soft lockups under pvops 2.6.31/2.6.

To: Pim van Riezen <pi+lists@xxxxxxxxxxxx>
Subject: Re: [Xen-users] Xen3.3 / Xen3.4 CPU soft lockups under pvops 2.6.31/2.6.32
From: Yann Cezard <yann.cezard@xxxxxxxxxxx>
Date: Thu, 15 Apr 2010 09:33:21 +0200
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1ADBF117-435F-4191-AB36-B43A373B4CE9@xxxxxxxxxxxx>
Organization: CRI UPPA
References: <1ADBF117-435F-4191-AB36-B43A373B4CE9@xxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla-Thunderbird 2.0.0.22 (X11/20091109)
Pim van Riezen wrote:
> Good day,
>
> We're trying to get 2.6.31 and 2.6.32 rolled out on our clusters to offer 
> newer features like FUSE for our customers, but we've run into a couple of 
> showstopper issues when deploying these kernels on busier guests, which show 
> a lot of errors like this:
>
>   BUG: soft lockup - CPU#0 stuck for 561s! [swapper:0]
>   Modules linked in:
>   CPU 0:
>   Modules linked in:
>   Pid: 0, comm: swapper Not tainted 2.6.32.9xls-domU #2 
>   RIP: e030:[<ffffffff810093aa>]  [<ffffffff810093aa>] hypercall_page+0x3aa/0x1001
>   RSP: e02b:ffffffff81691f70  EFLAGS: 00000246
>   RAX: 0000000000000000 RBX: ffffffff81690000 RCX: ffffffff810093aa
>   RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
>   RBP: ffffffff81896d30 R08: 0000000000000000 R09: ffffffff8100e3b2
>   R10: 0000000000000001 R11: 0000000000000246 R12: ffffffffffffffff
>   R13: ffffffff818ebf20 R14: ffffffff818eec70 R15: 0000000000000000
>   FS:  00007f8ac7a9c6e0(0000) GS:ffff8800022ac000(0000) knlGS:0000000000000000
>   CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>   CR2: 00007f291b4c9000 CR3: 000000007d8c1000 CR4: 0000000000002660
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>   Call Trace:
>    [<ffffffff8100ddb7>] ? xen_safe_halt+0xc/0x15
>    [<ffffffff8100bdcf>] ? xen_idle+0x37/0x40
>    [<ffffffff8100fe2e>] ? cpu_idle+0x4f/0x82
>    [<ffffffff818b6c42>] ? start_kernel+0x353/0x35f
>
> In the hope of getting rid of this issue we upgraded from Xen 3.3 to Xen 
> 3.4.1.7 out of the gitco repos. The issue persisted. Is there a magic version 
> of Xen, preferably one that can be found in an rpm repository for CentOS 5, 
> that *does* properly support pvops kernels without these issues?
>
> I'm also seeing this one: 
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=543 where the bug is 
> still open but has had no activity since 2008. I don't know whether that 
> bugzilla is still being actively maintained.
>
> Cheers,
> Pim
>
>
>   
Hi Pim,

I am having similar issues with different versions of the Xen hypervisor
(3.2, 3.4) and with domU kernels >= 2.6.26 (up to 2.6.32-4 from Debian),
always on the same 2 or 3 VMs that are frequently under heavy load.
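
To check whether the lockups really cluster on a few guests, each domU's
kernel log can be scanned for the message format shown in the trace above.
This is only a rough sketch: the function names are made up for this
example, and the pattern is based solely on the single message quoted in
this thread, so other kernel versions may word it differently.

```python
import re
from collections import Counter

# Pattern modelled on the one line quoted in this thread:
#   BUG: soft lockup - CPU#0 stuck for 561s! [swapper:0]
# Other kernels may format the message differently (assumption).
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]:]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one (cpu, seconds, comm, pid) tuple per soft lockup line."""
    hits = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            hits.append((
                int(match.group("cpu")),
                int(match.group("secs")),
                match.group("comm"),
                int(match.group("pid")),
            ))
    return hits


def lockups_per_cpu(log_text):
    """Count soft lockup messages per CPU number."""
    return Counter(cpu for cpu, _, _, _ in find_soft_lockups(log_text))
```

Feeding it the saved output of dmesg from each guest gives a quick count to
compare between VMs before and after a configuration change.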

After a lot of searching, I thought that my CPU soft lockup problems (which
sometimes make my VMs freeze) were perhaps related to the xen clocksource,
so I decided to give this a try:

http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27

Using jiffies + independent wallclock + ntp in the domU seems to have
stopped the CPU soft lockup messages in the kernel log (at least I have not
had any since I started using it, but that is only 2 days ago...). Now I am
crossing my fingers... :-)
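
A small sketch for checking which time settings a domU is actually running
with. It assumes the usual sysfs clocksource files, plus
/proc/sys/xen/independent_wallclock, which as far as I know only exists on
some Xen-patched kernels and may be missing on pvops ones. The function name
and the "root" parameter are made up for this example; "root" is only there
so the function can be pointed at a copy of the files.

```python
from pathlib import Path

# Paths relative to the filesystem root. The clocksource files are standard
# sysfs entries; the wallclock file is an assumption and may not exist on
# every kernel, in which case None is reported for it.
CLOCKSOURCE_DIR = "sys/devices/system/clocksource/clocksource0"
WALLCLOCK_FILE = "proc/sys/xen/independent_wallclock"


def read_time_settings(root="/"):
    """Return the current clock settings, with None for unreadable files."""
    base = Path(root)

    def read(relative_path):
        try:
            return (base / relative_path).read_text().strip()
        except OSError:
            return None

    available = read(f"{CLOCKSOURCE_DIR}/available_clocksource")
    return {
        "current_clocksource": read(f"{CLOCKSOURCE_DIR}/current_clocksource"),
        "available_clocksources": available.split() if available else [],
        "independent_wallclock": read(WALLCLOCK_FILE),
    }
```

Calling read_time_settings() inside the guest and printing the result shows
whether jiffies is really the active clocksource after a reboot.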

I also read on xen-devel that you are using FC LUNs for storage. I use that
too, so you may want to have a look at the "Interrupt handling in Xen"
message that was posted on this list yesterday. By default my domain-0 was
handling all its interrupts (network and HBA) on the same CPU, which is
probably some kind of bottleneck under heavy load.
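
To see how interrupts are spread over the CPUs in domain-0, the contents of
/proc/interrupts can be summed per CPU. This is a minimal sketch, not a
polished tool: the function name and the optional "match" filter are made up
for this example, and it assumes the usual layout of a CPU header line
followed by "IRQ: counts... description" rows.

```python
def interrupts_per_cpu(proc_interrupts_text, match=None):
    """Sum interrupt counts per CPU from /proc/interrupts-style text.

    If `match` is given, only rows whose text contains it (for example a
    driver or device name) are counted.
    """
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        _label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "IRQ: ..." row
        if match is not None and match not in rest:
            continue
        for cpu, field in zip(cpus, rest.split()):
            if not field.isdigit():
                break  # reached the description columns
            totals[cpu] += int(field)
    return totals


if __name__ == "__main__":
    with open("/proc/interrupts") as handle:
        for cpu, count in interrupts_per_cpu(handle.read()).items():
            print(cpu, count)
```

If nearly everything lands on one CPU, that matches the situation described
above. Passing the NIC or HBA driver name as match narrows the totals to
those devices.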

Cheers,

-- 
Yann Cézard - Administrateur Systèmes Serveurs
Centre de Ressources Informatiques    -    http://cri.univ-pau.fr
Université de Pau et des Pays de l'Adour - http://www.univ-pau.fr
Bat IFR, rue Jules Ferry, 64000 PAU - Tél.:  +33 (0)5 59 40 77 94

