WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] pv guests die after failed migration

To: Andreas Olsowski <andreas.olsowski@xxxxxxxxxxx>
Subject: Re: [Xen-devel] pv guests die after failed migration
From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date: Fri, 23 Sep 2011 08:47:25 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 23 Sep 2011 00:48:12 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4E7C37BD.2000706@xxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Citrix Systems, Inc.
References: <4E786015.80603@xxxxxxxxxxx> <1316546879.5182.26.camel@xxxxxxxxxxxxxxxxxxxx> <4E7C37BD.2000706@xxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Fri, 2011-09-23 at 08:39 +0100, Andreas Olsowski wrote:
> Here is the full procedure:
[...]

Thanks, I should be able to figure out a repro with this, although I may
not get to it straight away.

> root@xenturio1:/usr/src/linux-2.6-xen# xl console thiswillfail
> PM: freeze of devices complete after 0.207 msecs
> PM: late freeze of devices complete after 0.058 msecs
> ------------[ cut here ]------------
> kernel BUG at drivers/xen/events.c:1466!
> invalid opcode: 0000 [#1] SMP
> CPU 0
> Modules linked in:
> 
> Pid: 6, comm: migration/0 Not tainted 3.0.4-xenU #6
> RIP: e030:[<ffffffff8140d574>]  [<ffffffff8140d574>] 
> xen_irq_resume+0x224/0x370
> RSP: e02b:ffff88001f9fbce0  EFLAGS: 00010082
> RAX: ffffffffffffffef RBX: 0000000000000000 RCX: 0000000000000000
> RDX: ffff88001f809ea8 RSI: ffff88001f9fbd00 RDI: 0000000000000001
> RBP: 0000000000000010 R08: ffffffff81859a00 R09: 0000000000000000
> R10: 0000000000000000 R11: 09f911029d74e35b R12: 0000000000000000
> R13: 000000000000f0a0 R14: 0000000000000000 R15: ffff88001f9fbd00
> FS:  00007ff28f8c8700(0000) GS:ffff88001fec6000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fff02056048 CR3: 000000001e4d8000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process migration/0 (pid: 6, threadinfo ffff88001f9fa000, task 
> ffff88001f9f7170)
> Stack:
>   ffff88001f9fbd34 ffff88001f9fbd54 0000000000000003 000000000000f100
>   0000000000000000 0000000000000003 0000000000000000 0000000000000003
>   ffff88001fa6ddb0 ffffffff8140aa20 ffffffff81859a08 0000000000000000
> Call Trace:
>   [<ffffffff8140aa20>] ? gnttab_map+0x100/0x130
>   [<ffffffff815c2765>] ? _raw_spin_lock+0x5/0x10
>   [<ffffffff81083e01>] ? cpu_stopper_thread+0x101/0x190
>   [<ffffffff8140e1f5>] ? xen_suspend+0x75/0xa0
>   [<ffffffff81083f1b>] ? stop_machine_cpu_stop+0x8b/0xd0
>   [<ffffffff81083e90>] ? cpu_stopper_thread+0x190/0x190
>   [<ffffffff81083dd0>] ? cpu_stopper_thread+0xd0/0x190
>   [<ffffffff815c0870>] ? schedule+0x270/0x6c0
>   [<ffffffff81083d00>] ? copy_pid_ns+0x2a0/0x2a0
>   [<ffffffff81065846>] ? kthread+0x96/0xa0
>   [<ffffffff815c4024>] ? kernel_thread_helper+0x4/0x10
>   [<ffffffff815c3436>] ? int_ret_from_sys_call+0x7/0x1b
>   [<ffffffff815c2be1>] ? retint_restore_args+0x5/0x6
>   [<ffffffff815c4020>] ? gs_change+0x13/0x13
> Code: e8 f2 e9 ff ff 8b 44 24 10 44 89 e6 89 c7 e8 64 e8 ff ff ff c3 83 
> fb 04 0f 84 95 fe ff ff 4a 8b 14 f5 20 95 85 81 e9 68 ff ff ff <0f> 0b 
> eb fe 0f 0b eb fe 48 8b 1d fd 00 42 00 4c 8d 6c 24 20 eb
> RIP  [<ffffffff8140d574>] xen_irq_resume+0x224/0x370
>   RSP <ffff88001f9fbce0>
> ---[ end trace 82e2e97d58b5f835 ]---

This seems to be taking the non-cancelled resume path, does this patch
help at all:

diff -r d7b14b76f1eb tools/libxl/libxl.c
--- a/tools/libxl/libxl.c       Thu Sep 22 14:26:08 2011 +0100
+++ b/tools/libxl/libxl.c       Fri Sep 23 08:45:28 2011 +0100
@@ -246,7 +246,7 @@ int libxl_domain_resume(libxl_ctx *ctx, 
         rc = ERROR_NI;
         goto out;
     }
-    if (xc_domain_resume(ctx->xch, domid, 0)) {
+    if (xc_domain_resume(ctx->xch, domid, 1)) {
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
                         "xc_domain_resume failed for domain %u",
                         domid);

I don't think that's a solution but if this patch works then it may
indicate a problem with xc_domain_resume_any.

[...]
> I am not quite sure what you mean by "guest log".

The guest console log, which you provided above, thanks.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel