In multi-CPU mode, it takes a seemingly random amount of time to hang the whole host, so I make it happen faster by cutting the number of CPUs down to 1 (maxcpus=1). When I do this, I can usually reproduce it in under an hour. I believe a Windows HVM guest must be running, but I can't say that with 100% certainty at this time. I don't believe the serial port output in the stack trace is what is hanging; I added the serial port only so I could debug the problem. I think the issue is with the shadow page tables. Of possible interest, these messages are being printed as well:
> (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
So my first inclination is to research the code that handles VRAM tracking. It may be getting stuck in a loop, which would cause the crash.
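One cheap way to test the loop theory against a longer capture is to pull the gfn/mfn pairs out of the "cleared vram pte" messages and check whether any gfn gets cleared twice. A minimal Python sketch, assuming the message format matches the multi.c:1077 lines quoted in this thread; the regex and helper names are illustrative, not part of Xen:

```python
import re
from typing import List, Set, Tuple

# Assumed message format, taken from the excerpt in this thread:
#   (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
VRAM_RE = re.compile(
    r"multi\.c:\d+:d(?P<dom>\d+) gfn (?P<gfn>[0-9a-f]+) "
    r"\(mfn (?P<mfn>[0-9a-f]+)\) cleared vram pte"
)

Entry = Tuple[int, int, int]  # (domid, gfn, mfn)


def parse_vram_clears(log: str) -> List[Entry]:
    """Return (domid, gfn, mfn) for every *complete* 'cleared vram pte' line.

    A line cut off mid-printk (like the last one before the watchdog
    message) does not match and is skipped.
    """
    return [
        (int(m["dom"]), int(m["gfn"], 16), int(m["mfn"], 16))
        for m in VRAM_RE.finditer(log)
    ]


def find_repeats(entries: List[Entry]) -> List[int]:
    """gfns cleared more than once, in the order the repeat was first seen."""
    seen: Set[int] = set()
    repeats: List[int] = []
    for _, gfn, _ in entries:
        if gfn in seen and gfn not in repeats:
            repeats.append(gfn)
        seen.add(gfn)
    return repeats


def strides(entries: List[Entry]) -> Set[Tuple[int, int]]:
    """Distinct (gfn delta, mfn delta) pairs between consecutive lines."""
    return {(b[1] - a[1], b[2] - a[2]) for a, b in zip(entries, entries[1:])}
```

On the eight complete lines quoted in this thread, each gfn is one higher and each mfn one lower than the previous line, with no repeats, so that excerpt on its own looks like a single linear pass. A loop over the same pages would show up as repeated gfns in a longer log.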
Here is the GRUB entry I am using:
menuentry "Boot Entry 3: debug cpu1" {
saved_entry=2
save_env saved_entry
set root=(NxVG-NxDisk1)
multiboot /xen.gz dom0_mem=1024MB cpufreq=xen cpuidle crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
module /vmlinuz-2.6.32-orc root=/dev/mapper/NxVG-NxDisk5 ro console=ttyS0,115200,8n1 xencons=ttyS earlyprintk=xen initcall_debug debug nmi_watchdog=1
module /initrd.img-2.6.32-orc
}
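To get a feel for how much time the serial console itself could be taking, here is a rough back-of-the-envelope sketch in Python. It parses the Xen command line from the multiboot line above and estimates how long one "cleared vram pte" message takes to drain at com1=115200,8n1. Assumptions: 10 bits per character on the wire for 8n1 (start bit, 8 data bits, stop bit), no FIFO or flow-control effects, and the line terminator is ignored. Because sync_console is set, console output is synchronous, and the call trace shows __serial_putc polling ns16550_tx_empty, so this per-line cost is paid while the CPU waits. The helper names are hypothetical, not Xen's.

```python
from typing import Dict, Optional

# Xen command line copied from the multiboot line of the GRUB entry.
XEN_CMDLINE = (
    "dom0_mem=1024MB cpufreq=xen cpuidle crashkernel=128M@16M "
    "vga=text-80x60,keep sync_console noreboot watchdog "
    "com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1"
)


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split 'key=value' tokens; bare flags map to None."""
    opts: Dict[str, Optional[str]] = {}
    for tok in cmdline.split():
        key, sep, val = tok.partition("=")
        opts[key] = val if sep else None
    return opts


def chars_per_second(com_spec: str) -> float:
    """Character rate for a comX spec like '115200,8n1,magic'.

    Assumes 1 start bit + data bits + 1 parity bit (unless 'n') + stop bits.
    """
    baud_s, fmt = com_spec.split(",")[:2]
    data_bits, parity, stop_bits = int(fmt[0]), fmt[1], int(fmt[2])
    bits = 1 + data_bits + (0 if parity == "n" else 1) + stop_bits
    return int(baud_s) / bits


LINE = "(XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte"

opts = parse_cmdline(XEN_CMDLINE)
rate = chars_per_second(opts["com1"])
print("sync_console" in opts, opts["maxcpus"])        # True 1
print(rate)                                            # 11520.0
print(f"{len(LINE) / rate * 1000:.2f} ms per line")    # 5.21 ms per line
```

At roughly 5 ms per line, N cleared VRAM PTEs cost about N × 5 ms of busy-waiting on the UART. The size of the tracked VRAM range is not shown in the log, so N is left open here.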
On Thu, Sep 09, 2010 at 10:53:20AM -0500, Roger Cruz wrote:
> I am experiencing host hangs with 3.4.2 so I turned on the watchdog and
> finally got something useful to start tracking. Before I do, I always
> like to make sure that this is not something that has already been
> reported and fixed. Anyone know of any such CPU deadlocks and a fix?
>
> Thanks
>
Please paste your grub.conf entry.
When does this hang happen? During startup, or during operation? After how much uptime?
ns16550 sounds like a serial port to me.
-- Pasi
> (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f115a (mfn 60191) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f115b (mfn 60190) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f115c (mfn 6018f) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f115d (mfn 6018e) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f115e (mfn 6018d) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
> (XEN) ----[ Xen-3.4.2 x86_64 debug=n Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> (XEN) RFLAGS: 0000000000000006 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: ffff828c801ef260 rcx: 0000000000000001
> (XEN) rdx: 0000000000002005 rsi: 0000000000000020 rdi: ffff828c801ef260
> (XEN) rbp: 0000000000000020 rsp: ffff828c8024faa0 r8: 0000000000004000
> (XEN) r9: 0000000000003fff r10: ffff828c80268360 r11: 0000000000000400
> (XEN) r12: ffff828c801ef2dc r13: 0000000000000020 r14: ffff828c80267ecc
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 00000000a17ea000 cr2: 0000000097a20000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff828c8024faa0:
> (XEN) ffff828c80127776 ffff828c801ef260 0000000000000000 ffff828c801ef2dc
> (XEN) ffff828c80127e00 0000000800000000 0000000000000086 0000000000000400
> (XEN) ffff828c80267ea6 ffff828c80267edc ffff828c8024fb40 00000000000f1161
> (XEN) 0000000000000000 ffff8300b781c000 ffff828c80126019 0000000000000286
> (XEN) ffff828c8012662e 0000003000000030 ffff828c8024fc18 ffff828c8024fb48
> (XEN) ffff828c80267ea6 0000000000000000 ffff828c801e3b9c 0000000000000435
> (XEN) 0000000000000002 00000000000f1161 000000000006018a ffff8300b781c000
> (XEN) ffff8300b75da000 0000000400000000 ffff8180006022b0 0000000078e31023
> (XEN) 0000000078e31021 0000000000078e31 ffff8180006022b0 ffff8180006022b0
> (XEN) ffff828c801b4870 0000000000000000 ffff828400c03160 0000000000000000
> (XEN) ffff8300a08a4b08 0000000000000000 000000006018a023 ffff8300a08a4b08
> (XEN) 0000000000000000 000000006018a023 ffff828c801b4839 ffff8300b75da000
> (XEN) ffff828c00000001 ffffffffffffffff 000000000006018a 0000000000000000
> (XEN) 00000001801b7221 00000000a08a4b08 00000000000a08a4 0000000078e32061
> (XEN) ffff830078e32b08 ffff8300a08a4b10 ffff830078e32ff8 ffff8300b7801b08
> (XEN) ffff828c8024fcd8 ffff8300b75da000 ffff828c801b6306 ffff828c80228740
> (XEN) ffff8300b7801000 00000000000a08a4 ffff828c8024fcc8 ffff828c8024ff28
> (XEN) ffff828c8024fce4 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000100000100 ffff828400f1c640 0000000000f1c640 0000000000078e32
> (XEN) ffff8300b75da000 ffff828400f1c640 00000000000b7801 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> (XEN) [<ffff828c80127776>] __serial_putc+0x86/0x180
> (XEN) [<ffff828c80127e00>] serial_puts+0x90/0x120
> (XEN) [<ffff828c80126019>] __putstr+0x9/0xa0
> (XEN) [<ffff828c8012662e>] printk+0xee/0x1d0
> (XEN) [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
> (XEN) [<ffff828c801b4839>] shadow_set_l1e+0x459/0x4e0
> (XEN) [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
> (XEN) [<ffff828c801aacee>] _sh_resync+0x1be/0x1d0
> (XEN) [<ffff828c801ac03c>] sh_resync_all+0x3bc/0x450
> (XEN) [<ffff828c8019d254>] vmx_msr_write_intercept+0x134/0x550
> (XEN) [<ffff828c801ad8a7>] sh_update_paging_modes+0xd7/0x390
> (XEN) [<ffff828c801ae624>] shadow_update_paging_modes+0x74/0xd0
> (XEN) [<ffff828c80182726>] hvm_set_cr4+0xa6/0xb0
> (XEN) [<ffff828c8019f272>] vmx_vmexit_handler+0x11f2/0x18d0
> (XEN) [<ffff828c80127500>] ns16550_poll+0x0/0xa0
> (XEN) [<ffff828c80138f62>] reprogram_timer+0x62/0xa0
> (XEN) [<ffff828c8018eedb>] pt_update_irq+0x7b/0x110
> (XEN) [<ffff828c8018a507>] hvm_vcpu_has_pending_irq+0x37/0x60
> (XEN) [<ffff828c80198715>] vmx_intr_assist+0x55/0x190
> (XEN) [<ffff828c801984e3>] vmx_asm_do_vmentry+0x0/0xdd
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL TRAP: vector = 2 (nmi)
> (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
> (XEN) ****************************************
> (XEN)
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
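To read the call trace quoted above as a single chain, the frames can be extracted and printed outermost first. A minimal Python sketch; the helper names are hypothetical. One assumption to flag: in a debug=n build, Xen may print any stack word that looks like a hypervisor text address, so some entries can be stale. Dropping +0x0 entries such as ns16550_poll+0x0/0xa0 is only a heuristic, based on the fact that a genuine return address normally points just past a call instruction.

```python
import re
from typing import List, Tuple

# One frame line, e.g. "(XEN) [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\] (?P<func>\w+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

Frame = Tuple[str, int, int]  # (function, offset, function size)


def parse_trace(text: str) -> List[Frame]:
    """Frames from a Xen call trace, innermost first.

    If a "Xen call trace:" header is present, only text after it is used,
    so the RIP line in the register dump is not counted as an extra frame.
    """
    _, header, tail = text.partition("Xen call trace:")
    body = tail if header else text
    return [
        (m["func"], int(m["off"], 16), int(m["size"], 16))
        for m in FRAME_RE.finditer(body)
    ]


def likely_return_addresses(frames: List[Frame]) -> List[Frame]:
    """Heuristic: drop +0x0 entries, which look like stored function
    pointers rather than return addresses."""
    return [f for f in frames if f[1] != 0]


def call_chain(frames: List[Frame]) -> str:
    """Outermost-first chain, collapsing consecutive duplicate names."""
    names: List[str] = []
    for func, _, _ in reversed(frames):
        if not names or names[-1] != func:
            names.append(func)
    return " -> ".join(names)
```

On the full trace above, the filtered chain still contains a few entries that may be stale (for example vmx_intr_assist, pt_update_irq and vmx_msr_write_intercept), so treat the output as a reading aid rather than a verified backtrace.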