WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

[Xen-users] blktap and file-backed qcow: crashes and bad performance?

To: Xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] blktap and file-backed qcow: crashes and bad performance?
From: "Christoph Dwertmann" <lists.cd@xxxxxxxxx>
Date: Fri, 11 Aug 2006 16:59:00 +0200
Hi!

I'm running the latest Xen unstable x86_64 on a Dell PowerEdge 1950
(dual-CPU, dual-core Xeon) with 16GB RAM. I'm using file-backed sparse
qcow images as root filesystems for the Xen guests. All qcow images
are backed by the same base image file (a 32-bit Debian sid installation).
The Xen disk config looks like this:

disk   = [ 'tap:qcow:/home/images/%s.%d.qcow,xvda1,w' % (vmname, vmid)]

Before starting each guest, I use the qcow-create tool to create its qcow file.
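For reference, the disk specification above is an ordinary Python expression evaluated in the domU config file. A minimal sketch of what it expands to (the vmname and vmid values below are hypothetical placeholders, not from my setup):

```python
# The disk line in a Xen domU config file is plain Python; the "%"-format
# expression builds one qcow path per guest.
vmname = "guest"  # hypothetical example value
vmid = 7          # hypothetical example value

disk = ['tap:qcow:/home/images/%s.%d.qcow,xvda1,w' % (vmname, vmid)]

print(disk[0])  # -> tap:qcow:/home/images/guest.7.qcow,xvda1,w
```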

I boot Xen via GRUB like this:
root    (hd0,0)
kernel /boot/xen-3.0-unstable.gz com2=57600,8n1 console=com2
dom0_mem=4097152 noreboot xenheap_megabytes=32
module /boot/xen0-linux root=/dev/sda1 ro noapic console=tty0
xencons=ttyS1 console=ttyS1
module /boot/xen0-linux-initrd

My goal is to run 100+ Xen guests, but this seems impossible. I
observe several things:

- after creating a few Xen guests (and even after shutting them down),
my process list is cluttered with "tapdisk" processes that put full
load on all 8 virtual CPUs in dom0. The system becomes unusable.
Killing the tapdisk processes also apparently corrupts the qcow
images.

- I (randomly?) get the messages "Error: (28, 'No space left on
device')" or "Error: Device 0 (vif) could not be connected. Hotplug
scripts not working." or even "Error: (12, 'Cannot allocate memory')"
on domU creation. There is plenty of disk space and RAM available at
that time. This mostly happens when creating more than 80 guests.

- the dom0 will sooner or later crash with a message like this:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/aio.c:511
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat ip_conntrack
nfnetlink ip_tables x_tables bridge dm_snapshot dm_mirror dm_mod
usbhid ide_cd sers
Pid: 46, comm: kblockd/0 Not tainted 2.6.16.13-xen-kasuari-dom0 #1
RIP: e030:[<ffffffff8018f8ee>] <ffffffff8018f8ee>{__aio_put_req+39}
RSP: e02b:ffffffff803a89c8  EFLAGS: 00010086
RAX: 00000000ffffffff RBX: ffff8800f43d7a80 RCX: 00000000f3bdc000
RDX: 0000000000001458 RSI: ffff8800f43d7a80 RDI: ffff8800f62d1c80
RBP: ffff8800f62d1c80 R08: 6db6db6db6db6db7 R09: ffff88000193d000
R10: 0000000000000000 R11: ffffffff80153e48 R12: ffff8800f62d1ce8
R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000000
FS:  00002b9bf01bccb0(0000) GS:ffffffff80472000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process kblockd/0 (pid: 46, threadinfo ffff8800005e4000, task ffff8800005c57e0)
Stack: ffff8800f43d7a80 ffff8800f62d1c80 ffff8800f62d1ce8 ffffffff80190082
      ffff880004e83d10 ffff8800f4db7400 0000000000000200 ffff8800f4db7714
      ffff8800f4db7400 0000000000000001
Call Trace: <IRQ> <ffffffff80190082>{aio_complete+297}
      <ffffffff80195b0b>{finished_one_bio+159}
<ffffffff80195be8>{dio_bio_complete+150}
      <ffffffff80195d24>{dio_bio_end_aio+32}
<ffffffff801cf1b7>{__end_that_request_first+328}
      <ffffffff801d00ca>{blk_run_queue+50}
<ffffffff8800524d>{:scsi_mod:scsi_end_request+40}
      <ffffffff880054fe>{:scsi_mod:scsi_io_completion+525}
      <ffffffff880741ce>{:sd_mod:sd_rw_intr+598}
<ffffffff88005792>{:scsi_mod:scsi_device_unbusy+85}
      <ffffffff801d1534>{blk_done_softirq+175}
<ffffffff80132544>{__do_softirq+122}
      <ffffffff8010bada>{call_softirq+30} <ffffffff8010d231>{do_softirq+73}
      <ffffffff8010d626>{do_IRQ+65} <ffffffff8023bf5a>{evtchn_do_upcall+134}
      <ffffffff801d8a66>{cfq_kick_queue+0}
<ffffffff8010b60a>{do_hypervisor_callback+30} <EOI>
      <ffffffff801d8a66>{cfq_kick_queue+0}
<ffffffff8010722a>{hypercall_page+554}
      <ffffffff8010722a>{hypercall_page+554} <ffffffff801dac97>{kobject_get+18}
      <ffffffff8023b7aa>{force_evtchn_callback+10}
<ffffffff8800641d>{:scsi_mod:scsi_request_fn+935}
      <ffffffff801d8adc>{cfq_kick_queue+118}
<ffffffff8013d3e6>{run_workqueue+148}
      <ffffffff8013db18>{worker_thread+0}
<ffffffff80140abd>{keventd_create_kthread+0}
      <ffffffff8013dc08>{worker_thread+240}
<ffffffff80125cdb>{default_wake_function+0}
      <ffffffff80140abd>{keventd_create_kthread+0}
<ffffffff80140abd>{keventd_create_kthread+0}
      <ffffffff80140d61>{kthread+212} <ffffffff8010b85e>{child_rip+8}
      <ffffffff80140abd>{keventd_create_kthread+0}
<ffffffff80140c8d>{kthread+0}
      <ffffffff8010b856>{child_rip+0}

Code: 0f 0b 68 c3 9b 2f 80 c2 ff 01 85 c0 74 07 31 c0 e9 09 01 00
RIP <ffffffff8018f8ee>{__aio_put_req+39} RSP <ffffffff803a89c8>
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Is it just my setup, or:
- does Xen not scale at all to 100+ machines?
- does blktap not scale at all?
- is blktap with qcow very unstable right now?

Thank you for any pointers,

--
Christoph Dwertmann
cdwertmann at gmx dot de

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
