WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Arbitrary reboot with xen 3.4.x

To: Pasi Kärkkäinen <pasik@xxxxxx>
Subject: Re: [Xen-devel] Arbitrary reboot with xen 3.4.x
From: Guillaume Rousse <Guillaume.Rousse@xxxxxxxx>
Date: Fri, 20 Nov 2009 11:42:23 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 20 Nov 2009 02:42:48 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20091119180951.GF16033@xxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4B058940.2050009@xxxxxxxx> <20091119180951.GF16033@xxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.23 (X11/20091009)
Pasi Kärkkäinen wrote:
> On Thu, Nov 19, 2009 at 07:06:56PM +0100, Guillaume Rousse wrote:
>> Hello.
>>
>> I have a dom0 working perfectly under Xen 3.3.x, with about 15 HVM domUs. After migrating to Xen 3.4.1 with the same dom0 kernel (2.6.27.37), everything seems to be fine and I can launch the various hosts, but 5 to 10 minutes later the host abruptly reboots. I can't find any trace in the logs. I have a second host with the same configuration and setup, and the result is the same. It seems to be linked to domU activity: without any domU, or without any domU doing actual work, there is no reboot. I had to roll back to Xen 3.3.0.
>
> Did you try the new Xen 3.4.2?

I just did this morning. Without any changelog, it's a bit 'upgrade and pray'...

>> It looks like either a hardware issue (but it doesn't appear with 3.3.0) or a crash in the hypervisor that syslog is unable to catch when it happens. How can I try to get a trace?
>
> You should set up a serial console, so you can capture and
> log the full console (xen + dom0 kernel) output to another computer.

Indeed.

Here is the output. The first domU crash, caused by a memory ballooning issue, is not fatal. The second crash, however, is. I don't know whether that is because of an incorrect state left behind by the initial crash, or because of the additional domUs launched in the interim.

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) Domain 1 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-3.4.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: 0010:[<ffffffff811ed7ed>]
(XEN) RFLAGS: 0000000000010246 CONTEXT: hvm guest
(XEN) rax: 00000000007028b8 rbx: 0000000000001000 rcx: 0000000000000200
(XEN) rdx: 0000000000000000 rsi: 00000000007028b8 rdi: ffff8800123a0000
(XEN) rbp: ffff88001a119b68 rsp: ffff88001a119b50 r8: ffffea00003fcb00
(XEN) r9: 000000000001050f r10: 0000000000000000 r11: 0000000000000001
(XEN) r12: 0000000000001000 r13: 0000000000000000 r14: ffff88001796aea8
(XEN) r15: 0000000000001000 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 000000001a079000 cr2: 00007fc176c772e8
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0018 cs: 0010
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) Domain 2 reported crashed by domain 0 on cpu#0:
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) ----[ Xen-3.4.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c801aab29>] hash_foreach+0x59/0xe0
(XEN) RFLAGS: 0000000000010296 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff8284000c1780 rcx: 00000000000060bc
(XEN) rdx: ffff83041f98c000 rsi: 0000000000000336 rdi: ffff8300be7c0000
(XEN) rbp: 0000000000000336 rsp: ffff828c80257848 r8: 0000000000200c00
(XEN) r9: 0000000000000001 r10: ffff83041f98c000 r11: ffff828c801b10e0
(XEN) r12: 0000000000000001 r13: 0000000000000000 r14: 00000000000060bc
(XEN) r15: ffff828c80205f80 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000021759000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c80257848:
(XEN) 0000000000000000 ffff8300be7c0000 ffff83041f98c000 ffff8284000c1780
(XEN) ffff8300be7c0000 00000000000060bc 0000000000000000 00000000000144bc
(XEN) ffff8300be7c0000 ffff828c801aae4d ffff828c80257960 00000000000060bc
(XEN) ffff828c80257960 ffff83041f98c000 ffff83041f98c000 ffff828c801b13bf
(XEN) 00000000000144bc 0000000000200c00 ffff83041f4ed5e0 ffff83041f98d130
(XEN) ffff828c80284d24 ffff83041f4ed5e0 ffff828c80257960 ffff828c80257968
(XEN) ffff83041f98c000 00000000000144bc 0000000000000000 ffff828c801a96d4
(XEN) 0000000000000200 2000000000000000 ffff828c80257a80 000000061f98c000
(XEN) 0000000000000200 007fffffffffffff 0000000000000000 ffff83041f4ed000
(XEN) 000000000041f4ed 0000000000000001 0000000000000001 0000000000000200
(XEN) 00000000000144bc ffff83041f98c000 0000000000000006 ffff828c801a5991
(XEN) ffff828c80257abc 0000000000000001 ffff828c80257ba8 007fffffffffffff
(XEN) ffff828c802579f0 ffff83041f98c000 ffff828c80257a80 ffff828c801a6efb
(XEN) 0000000400000000 0000000000000000 ffff8300060bc000 ffff8300060bb000
(XEN) ffff8300060ba000 ffff8300060b9000 ffff8300060b8000 ffff8300060b7000
(XEN) ffff8300060b6000 ffff8300060b5000 ffff8300060b4000 ffff8300060b3000
(XEN) ffff8300060b2000 ffff8300060b1000 ffff8300060b0000 ffff8300060af000
(XEN) ffff8300060ae000 ffff828c801f16dc 0000000000000082 0000000100000001
(XEN) 0000000100000001 0000000100000001 0000000100000001 0000000100000001
(XEN) 0000000100000001 0000000100000001 0000000100000001 0000000000000286
(XEN) Xen call trace:
(XEN) [<ffff828c801aab29>] hash_foreach+0x59/0xe0
(XEN) [<ffff828c801aae4d>] sh_remove_all_mappings+0x8d/0x200
(XEN) [<ffff828c801b13bf>] shadow_write_p2m_entry+0x2df/0x330
(XEN) [<ffff828c801a96d4>] p2m_set_entry+0x344/0x430
(XEN) [<ffff828c801a5991>] set_p2m_entry+0x71/0xa0
(XEN) [<ffff828c801a6efb>] p2m_pod_zero_check+0x1db/0x310
(XEN) [<ffff828c801a8a20>] p2m_pod_demand_populate+0x830/0xa40
(XEN) [<ffff828c801a90b4>] p2m_gfn_to_mfn+0x224/0x260
(XEN) [<ffff828c80151fd5>] mod_l1_entry+0x6e5/0x7b0
(XEN) [<ffff828c80153067>] do_mmu_update+0x937/0x16e0
(XEN) [<ffff828c8014df0b>] get_page_type+0xb/0x20
(XEN) [<ffff828c801112b4>] do_multicall+0x164/0x370
(XEN) [<ffff828c801c8169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN) L4[0x000] = 000000001cb48067 00000000003d6ca9
(XEN) L3[0x000] = 000000000c58b067 00000000003e72ec
(XEN) L2[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
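
For anyone sifting through a similar serial capture, here is a minimal Python sketch that summarizes the events visible in the output above: populate-on-demand exhaustion messages, crashed domains, and hypervisor panics. The function name and the returned keys are hypothetical, and the patterns are taken only from the messages quoted in this mail, so other Xen versions may word them differently.

```python
import re

# Message texts copied from the console output in this mail.
POD_OOM = "p2m_pod_demand_populate: Out of populate-on-demand memory!"
CRASH_RE = re.compile(r"Domain (\d+) .*crashed")
PANIC_RE = re.compile(r"Panic on CPU (\d+):")


def summarize_xen_console(text):
    """Summarize PoD exhaustion, domain crashes and panics in a Xen console capture."""
    # Split on the "(XEN)" prefix rather than on newlines, so the function
    # also works on captures whose line breaks were lost.
    records = [r.strip() for r in text.split("(XEN)") if r.strip()]
    summary = {"pod_oom": 0, "crashed_domains": [], "panic_cpus": []}
    for record in records:
        if record.startswith(POD_OOM):
            summary["pod_oom"] += 1
            continue
        crash = CRASH_RE.match(record)
        if crash:
            summary["crashed_domains"].append(int(crash.group(1)))
            continue
        panic = PANIC_RE.match(record)
        if panic:
            summary["panic_cpus"].append(int(panic.group(1)))
    return summary
```

Feeding it the text of a saved serial log gives a quick view of whether the populate-on-demand messages precede the panic, as they do in the trace above.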


My domUs all have this configuration:
memory = 256
maxmem = 512

Some use different values, but always with the same ratio between memory and maxmem. This seems quite useless for HVM domUs anyway, since memory ballooning is not supported AFAIK unless the PV drivers are used (and I can't manage to build them).

With identical values for memory and maxmem, the issue doesn't appear.
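
Since xm domU configuration files use Python syntax, a short Python sketch can flag the guests that are affected. It assumes, based on the behaviour reported above, that the populate-on-demand path is only exercised when maxmem is larger than memory; the function name and return shape are hypothetical, and the config text should only come from files you trust because it is executed.

```python
def pod_settings(config_text):
    """Return (memory, maxmem, pod_expected) for an xm-style domU config."""
    settings = {}
    # xm config files are plain Python assignments; exec() runs them,
    # so only use this on configuration files you trust.
    exec(config_text, {}, settings)
    memory = settings["memory"]
    # Assumption: a missing maxmem means the same value as memory.
    maxmem = settings.get("maxmem", memory)
    # Assumption from this report: the crash only shows up when maxmem > memory.
    return memory, maxmem, maxmem > memory


# Values from the configuration quoted above.
print(pod_settings("memory = 256\nmaxmem = 512\n"))  # (256, 512, True)
# Workaround described above: identical values.
print(pod_settings("memory = 512\nmaxmem = 512\n"))  # (512, 512, False)
```

Setting both values to the same number, as in the second call, corresponds to the workaround that makes the issue disappear here.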

With Xen 3.4.2, the domUs still crash, but at least dom0 does not reboot. So it's just less bad :)
--
BOFH excuse #426:

internet is needed to catch the etherbunny

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel