[Xen-users] PV Linux domUs freeze after a few hours

  • To: xen-users@xxxxxxxxxxxxxxxxxxx
  • From: Leszek Urbanski <tygrys@xxxxxx>
  • Date: Mon, 31 Aug 2009 14:43:58 +0200
  • Delivery-date: Mon, 31 Aug 2009 05:44:51 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>


I'm experiencing random domU freezes.

This is similar to Debian bug #534880. One of the numerous references to it
on the web:
It is stated there that both 2.6.26 and 2.6.30 with pv_ops freeze on domU.

I've been using Xen 3.2 for a few months without any problems - until now.

This seems like a critical bug that bites more and more installations and
will surely become a show stopper for migration to Xen at many Linux shops.

My environment:

- Xen 3.2.1

- multiple x86_64 and i386 dom0s and domUs

- machines from different vendors, the hardware has been checked and

- dom0 kernel: Debian Lenny's xenified 2.6.26 (with OpenSUSE patches)

- domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have
  the same problems)

- all domUs are SMP (vcpus > 1). The problem doesn't occur with UP domUs,
  but the performance hit from running the domUs with vcpus=1 is
  unacceptable for my installations.

- no vcpu pinning (by choice) for dom0s or domUs

- the bug seems unrelated to load profiles; some domUs that freeze are almost
  always idle, some are I/O intensive, pushing 30 MB/s to disks and a few
  hundred megabits to the network.

The symptoms:

After a few (3-24) hours of runtime, some of the domUs become completely

- the network stack is completely dead

- xm console is unresponsive

- xm vcpu-list always shows one vcpu in no state ("---") and all other vcpus
  in r state

- xm destroy works and immediately destroys the domU

- nothing useful in xm dmesg, xm log

- mpstat shows less than 10% steal

- I'm waiting for another freeze to check if there's anything useful on
  domU consoles
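
The vcpu-list symptom above (one vcpu in "---", all the others in r state)
could be checked for automatically. The following is only a rough sketch: the
function name and the sample text are made up for illustration, and it assumes
that xm vcpu-list prints whitespace-separated columns in the order
Name, ID, VCPU, CPU, State, which may differ between Xen versions.

```python
def find_suspect_domains(vcpu_list_output):
    """Return names of domains that have exactly one vcpu in state '---'
    while every other vcpu of that domain is in a running ('r..') state."""
    states = {}
    for line in vcpu_list_output.splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 5 or fields[0] == "Name":
            continue
        # Assumed column order: Name ID VCPU CPU State ...
        name, state = fields[0], fields[4]
        states.setdefault(name, []).append(state)

    suspects = []
    for name, vcpu_states in states.items():
        stateless = sum(1 for s in vcpu_states if s == "---")
        running = sum(1 for s in vcpu_states if s.startswith("r"))
        # Only SMP domains are of interest; UP domUs did not show the problem.
        if (len(vcpu_states) > 1 and stateless == 1
                and running == len(vcpu_states) - 1):
            suspects.append(name)
    return suspects


# Hypothetical sample text, not captured from a real machine.
SAMPLE = """\
Name                              ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                           0     0     0   r--    1234.5 any cpu
Domain-0                           0     1     1   -b-     987.6 any cpu
web1                               3     0     2   ---     456.7 any cpu
web1                               3     1     3   r--   40123.4 any cpu
db1                                4     0     1   -b-     321.0 any cpu
db1                                4     1     0   -b-     300.2 any cpu
"""

if __name__ == "__main__":
    print(find_suspect_domains(SAMPLE))
```

On a dom0 the input text could instead come from running xm vcpu-list (for
example through subprocess) from cron. A healthy domU can briefly match the
same pattern, so a domain is probably only worth flagging if it matches on
several consecutive samples.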

I'll try the following options (and post my results to this list):

- vcpu-pinning for dom0 only

- vcpu-pinning for dom0 and domUs

- vcpu-pinning and dedicating a core for the dom0

(however, vcpu-pinning is not a solution for me, as it wastes cores - some
domUs sit idle and some wait for their turn)

- downgrading dom0 kernel to xenified 2.6.18

- upgrading the hypervisor to 3.4

- downgrading domU kernel to xenified 2.6.18
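
For the pinning experiments in the list above, the xm vcpu-pin calls could be
generated rather than typed by hand. This is a sketch only: the function name,
the domain names and the core counts are hypothetical, it assumes dom0 has
exactly as many vcpus as cores dedicated to it, and it simply spreads domU
vcpus round-robin over the remaining cores.

```python
def pinning_commands(domu_vcpus, total_cores, dom0_cores=1):
    """Build 'xm vcpu-pin <domain> <vcpu> <cpu>' command lines.

    dom0 vcpus are pinned one-to-one onto the first dom0_cores cores;
    domU vcpus are spread round-robin over the remaining cores.
    domu_vcpus maps a domU name to its number of vcpus.
    """
    free = list(range(dom0_cores, total_cores))
    if not free:
        raise ValueError("no cores left for domUs")

    cmds = []
    # Assumption: dom0 has exactly dom0_cores vcpus.
    for vcpu in range(dom0_cores):
        cmds.append(["xm", "vcpu-pin", "Domain-0", str(vcpu), str(vcpu)])

    slot = 0
    for name, count in domu_vcpus.items():
        for vcpu in range(count):
            cpu = free[slot % len(free)]
            cmds.append(["xm", "vcpu-pin", name, str(vcpu), str(cpu)])
            slot += 1
    return cmds


if __name__ == "__main__":
    # Hypothetical layout: 4 physical cores, one reserved for dom0.
    for cmd in pinning_commands({"web1": 2, "db1": 2}, total_cores=4):
        print(" ".join(cmd))
```

The commands are only printed here, so they can be reviewed before being run
on a dom0.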

Leszek "Tygrys" Urbanski, SCSA, SCNA
 "Unix-to-Unix Copy Program;" said PDP-1. "You will never find a more
  wretched hive of bugs and flamers. We must be cautious." -- DECWARS
     http://cygnus.moo.pl/ -- Cygnus High Altitude Balloon
