WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-users

Re: [Xen-users] block-attach crashes domU

To: Tracy Reed <treed@xxxxxxxxxxxxxxx>
Subject: Re: [Xen-users] block-attach crashes domU
From: Pasi Kärkkäinen <pasik@xxxxxx>
Date: Fri, 22 Jan 2010 10:00:18 +0200
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 22 Jan 2010 00:01:59 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100122062126.GV8056@xxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <20100122062126.GV8056@xxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
On Thu, Jan 21, 2010 at 10:21:26PM -0800, Tracy Reed wrote:
> I have run into a strange situation where a domain will not boot with
> a certain disk specified in the config file, and trying to
> block-attach that disk after the domain starts results in the domain
> disappearing from the list, presumably because it crashed.
> 
> I am running CentOS 5.4 with kernel 2.6.18-160.el5xen x86_64
> 
> For months everything worked perfectly with these domains using
> an AoE SAN for the back-end. I have used this sort of setup for
> several years and it is great. But these domains in particular have
> been running for several months. Then 3 of the 4 domUs I run came
> under very heavy load and became unresponsive, and I ended up having
> to do an xm destroy on them. After that they refuse to come back up.
> The one domU that has not been rebooted continues to work great with
> all 4 disk devices attached.
> 
> Here is my domU config file:
> 
> name = "db2"
> uuid = "f253cab5-c3de-c1f7-e735-5d4f0bfcd3ff"
> maxmem = 16384
> memory = 2048
> vcpus = 4
> bootloader = "/usr/bin/pygrub"
> on_poweroff = "destroy"
> on_reboot = "restart"
> on_crash = "restart"
> vfb = [  ]
> disk = [ "phy:/dev/etherd/e1.12,xvda,w", "phy:/dev/etherd/e2.12,xvdb,w", 
> "phy:/dev/etherd/e3.1,xvdc,w", "phy:/dev/etherd/e4.1,xvdd,w" ]
> vif = [ "mac=00:16:3e:5b:5c:dd,bridge=dmz" ]
> 
> If I boot the domU with this config file I get the following on boot:
> 
> Red Hat nash version 5.1.19.6 starting
> Mounting proc filesystem
> Mounting sysfs filesystem
> Creating /dev
> Creating initial device nodes
> Setting up hotplug.
> Creating block device nodes.
> Loading ehci-hcd.ko module
> Loading ohci-hcd.ko module
> Loading uhci-hcd.ko module
> USB Universal Host Controller Interface driver v3.0
> Loading jbd.ko module
> Loading ext3.ko module
> Loading raid1.ko module
> md: raid1 personality registered for level 1
> Loading xenblk.ko module
> Registering block device major 202
>  xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 >
>  xvdb: xvdb1 xvdb2 xvdb3 xvdb4 < xvdb5 >
>  xvdc: xvdc1
> kobject_add failed for xvda with -EEXIST, don't try to register things with 
> the same name in the same directory.
> 
> Call Trace:
>  [<ffffffff803404ea>] kobject_add+0x170/0x19b
>  [<ffffffff8025cfd5>] exact_lock+0x0/0x14
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff802fb4e2>] register_disk+0x43/0x190
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80336c3a>] add_disk+0x34/0x3d
>  [<ffffffff88084ec9>] :xenblk:backend_changed+0x110/0x193
>  [<ffffffff803b32fa>] xenbus_read_driver_state+0x26/0x3b
>  [<ffffffff803b4bdb>] xenwatch_thread+0x0/0x135
>  [<ffffffff803b402d>] xenwatch_handle_callback+0x15/0x48
>  [<ffffffff803b4cf7>] xenwatch_thread+0x11c/0x135
>  [<ffffffff8029bb44>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80233bcd>] kthread+0xfe/0x132
>  [<ffffffff80260b2c>] child_rip+0xa/0x12
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80233acf>] kthread+0x0/0x132
>  [<ffffffff80260b22>] child_rip+0x0/0x12
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP: 
>  [<ffffffff802fe512>] create_dir+0x11/0x1cf
> PGD 7f1c9067 PUD 7f1ca067 PMD 0 
> Oops: 0000 [1] SMP 
> last sysfs file: /block/ram0/dev
> CPU 1 
> Modules linked in: xenblk raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Pid: 9, comm: xenwatch Not tainted 2.6.18-164.el5xen #1
> RIP: e030:[<ffffffff802fe512>]  [<ffffffff802fe512>] create_dir+0x11/0x1cf
> RSP: e02b:ffff880000fbfda0  EFLAGS: 00010282
> RAX: ffff88007f31b870 RBX: ffff88007f3cd4f0 RCX: ffff880000fbfdd8
> RDX: ffff88007f3cd4f8 RSI: 0000000000000000 RDI: ffff88007f3cd4f0
> RBP: ffff88007f3cd4f0 R08: 0000000000000001 R09: ffff88000114c000
> R10: ffffffff8029b92c R11: ffff880000fbfbb0 R12: ffff88007f3cd4f0
> R13: ffff880000fbfdd8 R14: 0000000000000000 R15: ffff88007f31b870
> FS:  0000000000000000(0000) GS:ffffffff805ca080(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000
> 
> The /dev/etherd/e4.1 backend to the xvdd device is present in the dom0
> and works perfectly. I can access it from within the dom0 with no
> problem.
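> 
> For example, a quick read test from dom0 (the block count here is
> arbitrary) completes without errors:
> 
>   dd if=/dev/etherd/e4.1 of=/dev/null bs=1M count=10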
> 
> Something is confused. I would really like to avoid rebooting the
> dom0s if at all possible.
> 
> I have found that if I remove the "phy:/dev/etherd/e4.1,xvdd,w" entry
> from the disk = line, the domU boots fine. But if I try to
> block-attach the missing device, the domU dies instantly.
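> 
> For reference, the attach command I run is the following (domain name
> and devices taken straight from the config above):
> 
>   xm block-attach db2 phy:/dev/etherd/e4.1 xvdd w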
> 
> I have been looking for logs that might explain why it died but I
> cannot find anything relevant. I have googled the "don't try to
> register things with the same name in the same directory" error and
> found a few references to it, but none in the context of Xen.
> 
> Any advice would be greatly appreciated.
> 

Does it work if you attach some local LVM volume or file image (non-AoE)
as xvdd?
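
For example, something like this (the path and size are just
placeholders):

  dd if=/dev/zero of=/var/tmp/xvdd-test.img bs=1M count=256
  xm block-attach db2 file:/var/tmp/xvdd-test.img xvdd w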

Do you get errors in dom0 "dmesg"? How about dom0 /var/log/messages? 
Do you get errors in dom0 "xm log"? How about "xm dmesg"?
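
For example, right after the domU dies:

  dmesg | tail -50
  tail -100 /var/log/messages
  xm log | tail -50
  xm dmesg | tail -50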

-- Pasi



_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
