Pim van Riezen wrote:
> Good day,
>
> We're trying to get 2.6.31 and 2.6.32 rolled out on our clusters to offer
> newer features like FUSE for our customers, but we've run into a couple of
> showstopper issues when deploying these kernels on busier guests, showing a
> lot of errors like this:
>
> BUG: soft lockup - CPU#0 stuck for 561s! [swapper:0]
> Modules linked in:
> CPU 0:
> Modules linked in:
> Pid: 0, comm: swapper Not tainted 2.6.32.9xls-domU #2
> RIP: e030:[<ffffffff810093aa>] [<ffffffff810093aa>] hypercall_page+0x3aa/0x1001
> RSP: e02b:ffffffff81691f70 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: ffffffff81690000 RCX: ffffffff810093aa
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: ffffffff81896d30 R08: 0000000000000000 R09: ffffffff8100e3b2
> R10: 0000000000000001 R11: 0000000000000246 R12: ffffffffffffffff
> R13: ffffffff818ebf20 R14: ffffffff818eec70 R15: 0000000000000000
> FS: 00007f8ac7a9c6e0(0000) GS:ffff8800022ac000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f291b4c9000 CR3: 000000007d8c1000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
> [<ffffffff8100ddb7>] ? xen_safe_halt+0xc/0x15
> [<ffffffff8100bdcf>] ? xen_idle+0x37/0x40
> [<ffffffff8100fe2e>] ? cpu_idle+0x4f/0x82
> [<ffffffff818b6c42>] ? start_kernel+0x353/0x35f
>
> Hoping to get rid of this issue, we upgraded from Xen 3.3 to Xen 3.4.1.7
> out of the gitco repos. The issue persisted. Is there a magic version of Xen,
> preferably one that can be found in an rpm repository for CentOS 5, that
> *does* properly support pvops kernels without these issues?
>
> I'm also seeing this one:
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=543 where the bug is
> still open but no activity since 2008. I don't know if that bugzilla is still
> being actively maintained?
>
> Cheers,
> Pim
>
>
>
Hi Pim,
I am having similar issues with different versions of the Xen hypervisor
(3.2, 3.4) and with domU kernels >= 2.6.26 (up to 2.6.32-4 from Debian),
always on the same 2 or 3 VMs that are frequently under heavy load.
After a lot of searching, I thought that my CPU soft lockup problems (which
sometimes make my VMs freeze) were perhaps related to the Xen clocksource, so
I decided to give this a try:
http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27
Using jiffies + independent wallclock + ntp in the domU seems to have stopped
the CPU soft lockup messages in the kernel log (at least I haven't had any
since I started using it, but that is only 2 days so far...). Now I am
crossing my fingers... :-)
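
To check which clocksource a domU is actually running on before and after
such a change, something like the following Python sketch may help. It only
reads the standard sysfs clocksource files; the function names are my own
invention, and it does not touch the wallclock or ntp settings at all.

```python
from pathlib import Path

# Standard sysfs location for clocksource selection on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


def needs_switch(current, available, wanted="jiffies"):
    """True when `wanted` is offered by the kernel but is not the active source."""
    return wanted in available and current != wanted


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current, "available:", available)
    if needs_switch(current, available):
        print("jiffies is available but not selected")
```

Run inside the guest, this prints the active and available clocksources, so
it is easy to confirm that the setting survived a reboot.
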
I also read on xen-devel that you are using FC LUNs for storage. I use that
too, so you may want to have a look at the "Interrupt handling in Xen"
message that was posted on this list yesterday. By default my domain-0 was
handling all its interrupts (network and HBA) on the same CPU, which is
probably some kind of bottleneck under heavy load.
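
A quick way to see whether a domain-0 has the same problem is to look at the
per-CPU counters in /proc/interrupts. Below is a rough Python sketch that
flags busy IRQs whose interrupts all landed on a single CPU. The function
names and the 1000-interrupt threshold are arbitrary choices of mine, and the
parsing assumes the usual /proc/interrupts layout (a header row of CPU names,
then one row per IRQ).

```python
def parse_interrupts(text):
    """Map each IRQ label to (per-CPU counts, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # some rows (e.g. ERR) have fewer counters than CPUs
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def single_cpu_irqs(table, min_total=1000):
    """Return busy IRQs whose interrupts were all handled by one CPU."""
    hot = {}
    for label, (counts, desc) in table.items():
        total = sum(counts)
        if len(counts) > 1 and total >= min_total and max(counts) == total:
            hot[label] = (counts.index(total), desc)
    return hot


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            table = parse_interrupts(f.read())
    except OSError as exc:
        print("cannot read /proc/interrupts:", exc)
    else:
        for irq, (cpu, desc) in single_cpu_irqs(table).items():
            print(f"IRQ {irq} ({desc}): everything on CPU{cpu}")
```

If the network and HBA interrupts all show up on the same CPU, they can be
spread out by writing a CPU mask to /proc/irq/<N>/smp_affinity. Whether that
helps in a given setup is something I can only guess at.
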
Cheers,
--
Yann Cézard - Administrateur Systèmes Serveurs
Centre de Ressources Informatiques - http://cri.univ-pau.fr
Université de Pau et des Pays de l'Adour - http://www.univ-pau.fr
Bat IFR, rue Jules Ferry, 64000 PAU - Tél.: +33 (0)5 59 40 77 94
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users