xen-devel
RE: [Xen-devel] CPU hangs
My apologies.. it has come to my attention that the piece of code printing the
message about the VRAM being cleared is our own.. so this may be an in-house
bug.
Thanks for looking.
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Roger Cruz
Sent: Thursday, September 09, 2010 1:57 PM
To: Pasi Kärkkäinen
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] CPU hangs
Hi Pasi,
Thank you for answering so quickly.
> Have you tried changing the cpufreq/cpuidle settings?
No. We have had this work before. I can't recall exactly why we needed them.
> How about the watchdog?
The watchdog is new here; I added it in order to get a stack trace. Otherwise, it just
hangs and you can't tell what is going on.
> Also if you're using Xen 3.4.2 I believe you'll lose the dom0_mem=1024M parameter
> due to the grub2 bug.. so make sure to add dummy=dummy parameter before the
> dom0_mem.
I fixed this bug in the GRUB2 version we are using, so the parameter is
correctly passed to Xen now.
R.
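
When reading a watchdog trace like the one quoted further down, it can help to skip the console frames and see which caller issued the printk. The following is only a sketch: the CONSOLE_PATH set and both helper names are assumptions made for illustration, not something taken from the Xen sources.

```python
import re

# Matches frames such as "(XEN) [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Console-path functions seen in this particular trace. This set is an
# assumption for illustration, not a list derived from the Xen sources.
CONSOLE_PATH = {"ns16550_tx_empty", "__serial_putc", "serial_puts",
                "__putstr", "printk"}


def call_trace_symbols(log_text):
    """Return function names from the 'Xen call trace:' section, innermost first."""
    _, _, trace = log_text.partition("Xen call trace:")
    return [m.group(2) for m in FRAME_RE.finditer(trace)]


def first_non_console_frame(symbols):
    """Return the first frame that is not part of the console output path."""
    for name in symbols:
        if name not in CONSOLE_PATH:
            return name
    return None


sample = """\
(XEN) Xen call trace:
(XEN) [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
(XEN) [<ffff828c80127776>] __serial_putc+0x86/0x180
(XEN) [<ffff828c80127e00>] serial_puts+0x90/0x120
(XEN) [<ffff828c80126019>] __putstr+0x9/0xa0
(XEN) [<ffff828c8012662e>] printk+0xee/0x1d0
(XEN) [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
"""

print(first_non_console_frame(call_trace_symbols(sample)))  # shadow_set_l1e
```

On the trace in this thread, this points at shadow_set_l1e as the caller of the printk. That only says where the message was printed from, not that the hang originates there.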
-----Original Message-----
From: Pasi Kärkkäinen [mailto:pasik@xxxxxx]
Sent: Thursday, September 09, 2010 1:52 PM
To: Roger Cruz
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] CPU hangs
On Thu, Sep 09, 2010 at 12:48:55PM -0500, Roger Cruz wrote:
> In multicpu mode, it takes what appears to be a random amount of time to
> hang the whole host. So I make it happen faster by cutting down the # of
> CPUs to 1. When I do this, I usually can get it to happen in < 1hr. I
> believe a Windows HVM must be running but can't say that with 100%
> certainty at this time. I don't believe the serial port prints in the
> stack trace are what is hanging. I added a serial port to be able to
> debug the problem. I think the issue is with the shadow page table. Of
> interest may be the fact that these messages are being printed as well:
>
> > (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
>
> So my first inclination is to go research the area dealing with VRAM
> tracking. It may be getting into a loop, causing the crash.
>
>
> menuentry "Boot Entry 3: debug cpu1" {
> saved_entry=2
> save_env saved_entry
> set root=(NxVG-NxDisk1)
> multiboot /xen.gz dom0_mem=1024MB cpufreq=xen cpuidle
> crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog
> com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
> module /vmlinuz-2.6.32-orc root=/dev/mapper/NxVG-NxDisk5 ro
> console=ttyS0,115200,8n1 xencons=ttyS earlyprintk=xen initcall_debug debug
> nmi_watchdog=1
> module /initrd.img-2.6.32-orc
> }
>
Have you tried changing the cpufreq/cpuidle settings?
How about the watchdog?
Also if you're using Xen 3.4.2 I believe you'll lose the dom0_mem=1024M parameter
due to the grub2 bug.. so make sure to add dummy=dummy parameter before the
dom0_mem.
-- Pasi
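
The dummy=dummy workaround can be sketched as a small helper that rewrites a multiboot line. This assumes, as the workaround implies, that the argument lost is the first one after the hypervisor image path. The function name add_dummy_first_arg is hypothetical and the sample line is shortened from the menu entry above.

```python
def add_dummy_first_arg(multiboot_line, dummy="dummy=dummy"):
    """Insert a throwaway argument right after the hypervisor image path.

    Assumes (per the workaround above) that the first real argument is the
    one that gets lost, so a sacrificial one is placed in front of it.
    """
    parts = multiboot_line.split()
    if len(parts) < 2 or parts[0] != "multiboot":
        raise ValueError("expected a line of the form 'multiboot <image> [args...]'")
    image, args = parts[1], parts[2:]
    if args and args[0] == dummy:
        return " ".join(parts)  # already protected, leave unchanged
    return " ".join(["multiboot", image, dummy] + args)


line = "multiboot /xen.gz dom0_mem=1024M watchdog maxcpus=1"
print(add_dummy_first_arg(line))
# multiboot /xen.gz dummy=dummy dom0_mem=1024M watchdog maxcpus=1
```

With a GRUB2 build that already passes the command line correctly, the extra argument should not be needed.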
> --------------------------------------------------------------------------
>
> From: Pasi Kärkkäinen [mailto:pasik@xxxxxx]
> Sent: Thu 9/9/2010 12:13 PM
> To: Roger Cruz
> Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] CPU hangs
>
> On Thu, Sep 09, 2010 at 10:53:20AM -0500, Roger Cruz wrote:
> > I am experiencing host hangs with 3.4.2 so I turned on the watchdog and
> > finally got something useful to start tracking. Before I do, I always
> > like to make sure that this is not something that has already been
> > reported and fixed. Anyone know of any such CPU deadlocks and a fix?
> >
> > Thanks
> >
>
> Please paste your grub.conf entry.
> When does this hang happen? During startup, or during operation? After how
> much uptime?
>
> ns16550 sounds like a serial port to me..
>
> -- Pasi
>
> > (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115a (mfn 60191) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115b (mfn 60190) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115c (mfn 6018f) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115d (mfn 6018e) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115e (mfn 6018d) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
> > (XEN) ----[ Xen-3.4.2 x86_64 debug=n Tainted: C ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> > (XEN) RFLAGS: 0000000000000006 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff828c801ef260 rcx: 0000000000000001
> > (XEN) rdx: 0000000000002005 rsi: 0000000000000020 rdi: ffff828c801ef260
> > (XEN) rbp: 0000000000000020 rsp: ffff828c8024faa0 r8: 0000000000004000
> > (XEN) r9: 0000000000003fff r10: ffff828c80268360 r11: 0000000000000400
> > (XEN) r12: ffff828c801ef2dc r13: 0000000000000020 r14: ffff828c80267ecc
> > (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 00000000a17ea000 cr2: 0000000097a20000
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> > (XEN) Xen stack trace from rsp=ffff828c8024faa0:
> > (XEN) ffff828c80127776 ffff828c801ef260 0000000000000000 ffff828c801ef2dc
> > (XEN) ffff828c80127e00 0000000800000000 0000000000000086 0000000000000400
> > (XEN) ffff828c80267ea6 ffff828c80267edc ffff828c8024fb40 00000000000f1161
> > (XEN) 0000000000000000 ffff8300b781c000 ffff828c80126019 0000000000000286
> > (XEN) ffff828c8012662e 0000003000000030 ffff828c8024fc18 ffff828c8024fb48
> > (XEN) ffff828c80267ea6 0000000000000000 ffff828c801e3b9c 0000000000000435
> > (XEN) 0000000000000002 00000000000f1161 000000000006018a ffff8300b781c000
> > (XEN) ffff8300b75da000 0000000400000000 ffff8180006022b0 0000000078e31023
> > (XEN) 0000000078e31021 0000000000078e31 ffff8180006022b0 ffff8180006022b0
> > (XEN) ffff828c801b4870 0000000000000000 ffff828400c03160 0000000000000000
> > (XEN) ffff8300a08a4b08 0000000000000000 000000006018a023 ffff8300a08a4b08
> > (XEN) 0000000000000000 000000006018a023 ffff828c801b4839 ffff8300b75da000
> > (XEN) ffff828c00000001 ffffffffffffffff 000000000006018a 0000000000000000
> > (XEN) 00000001801b7221 00000000a08a4b08 00000000000a08a4 0000000078e32061
> > (XEN) ffff830078e32b08 ffff8300a08a4b10 ffff830078e32ff8 ffff8300b7801b08
> > (XEN) ffff828c8024fcd8 ffff8300b75da000 ffff828c801b6306 ffff828c80228740
> > (XEN) ffff8300b7801000 00000000000a08a4 ffff828c8024fcc8 ffff828c8024ff28
> > (XEN) ffff828c8024fce4 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000100000100 ffff828400f1c640 0000000000f1c640 0000000000078e32
> > (XEN) ffff8300b75da000 ffff828400f1c640 00000000000b7801 0000000000000000
> > (XEN) Xen call trace:
> > (XEN) [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> > (XEN) [<ffff828c80127776>] __serial_putc+0x86/0x180
> > (XEN) [<ffff828c80127e00>] serial_puts+0x90/0x120
> > (XEN) [<ffff828c80126019>] __putstr+0x9/0xa0
> > (XEN) [<ffff828c8012662e>] printk+0xee/0x1d0
> > (XEN) [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
> > (XEN) [<ffff828c801b4839>] shadow_set_l1e+0x459/0x4e0
> > (XEN) [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
> > (XEN) [<ffff828c801aacee>] _sh_resync+0x1be/0x1d0
> > (XEN) [<ffff828c801ac03c>] sh_resync_all+0x3bc/0x450
> > (XEN) [<ffff828c8019d254>] vmx_msr_write_intercept+0x134/0x550
> > (XEN) [<ffff828c801ad8a7>] sh_update_paging_modes+0xd7/0x390
> > (XEN) [<ffff828c801ae624>] shadow_update_paging_modes+0x74/0xd0
> > (XEN) [<ffff828c80182726>] hvm_set_cr4+0xa6/0xb0
> > (XEN) [<ffff828c8019f272>] vmx_vmexit_handler+0x11f2/0x18d0
> > (XEN) [<ffff828c80127500>] ns16550_poll+0x0/0xa0
> > (XEN) [<ffff828c80138f62>] reprogram_timer+0x62/0xa0
> > (XEN) [<ffff828c8018eedb>] pt_update_irq+0x7b/0x110
> > (XEN) [<ffff828c8018a507>] hvm_vcpu_has_pending_irq+0x37/0x60
> > (XEN) [<ffff828c80198715>] vmx_intr_assist+0x55/0x190
> > (XEN) [<ffff828c801984e3>] vmx_asm_do_vmentry+0x0/0xdd
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) FATAL TRAP: vector = 2 (nmi)
> > (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
> > (XEN) ****************************************
> > (XEN)
>
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > [2]http://lists.xensource.com/xen-devel
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 9.0.851 / Virus Database: 271.1.1/3119 - Release Date: 09/09/10
> 02:34:00
>
> References
>
> Visible links
> 1. mailto:crashkernel=128M@16m
> 2. http://lists.xensource.com/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel