Hi,
This issue can easily be reproduced by continuously and almost concurrently rebooting 12 Xen HVM VMs on a single physical server. The reproduction hit the backtrace 6 to 14 hours after it started. I have several similar Xen backtraces; please refer to the end of this mail. The first three backtraces are almost the same and happened in domain_kill, while the last one happened in do_multicall.
Going through the Xen code in /xen-4.0.0/xen/arch/x86/mm.c, I found that the author was aware of the race between domain_relinquish_resources and the code shown below. It occurred to me to simply move lines 2765 and 2766 before line 2764, that is, to move put_page_and_type(page) inside the spin lock to avoid the race.
2753         /* A page is dirtied when its pin status is set. */
2754         paging_mark_dirty(pg_owner, mfn);
2755
2756         /* We can race domain destruction (domain_relinquish_resources). */
2757         if ( unlikely(pg_owner != d) )
2758         {
2759             int drop_ref;
2760             spin_lock(&pg_owner->page_alloc_lock);
2761             drop_ref = (pg_owner->is_dying &&
2762                         test_and_clear_bit(_PGT_pinned,
2763                                            &page->u.inuse.type_info));
2764             spin_unlock(&pg_owner->page_alloc_lock);
2765             if ( drop_ref )
2766                 put_page_and_type(page);
2767         }
2768
2769         break;
2770     }
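To reason about the reordering, here is a minimal user-space model in C, assuming pthreads and C11 atomics. It is not Xen code: `model_page`, `pin_path`, `relinquish_path` and `run_trials` are hypothetical names, a mutex stands in for page_alloc_lock, and `atomic_exchange` stands in for test_and_clear_bit(). It checks a single property: that the reference owned by the pin is dropped exactly once, whichever side clears the pinned flag first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, NOT Xen code. "pinned" stands in for _PGT_pinned in
 * page->u.inuse.type_info; pinning owns exactly one reference. */
struct model_page {
    atomic_int refcount;
    atomic_bool pinned;
};

/* Stand-ins for pg_owner->page_alloc_lock and pg_owner->is_dying. */
static pthread_mutex_t page_alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static bool is_dying = true;

/* Models put_page_and_type(): drop one reference. */
static void model_put(struct model_page *pg)
{
    atomic_fetch_sub(&pg->refcount, 1);
}

/* Models the pin path quoted above. put_inside_lock selects the proposed
 * ordering (put before unlock) or the original one (put after unlock). */
static void pin_path(struct model_page *pg, bool put_inside_lock)
{
    pthread_mutex_lock(&page_alloc_lock);
    /* atomic_exchange plays the role of test_and_clear_bit(). */
    bool drop_ref = is_dying && atomic_exchange(&pg->pinned, false);
    if (drop_ref && put_inside_lock)
        model_put(pg);
    pthread_mutex_unlock(&page_alloc_lock);
    if (drop_ref && !put_inside_lock)
        model_put(pg);
}

/* Models the dying domain unpinning the same page concurrently; assumed
 * here to rely only on an atomic test-and-clear of the pinned flag. */
static void *relinquish_path(void *arg)
{
    struct model_page *pg = arg;
    if (atomic_exchange(&pg->pinned, false))
        model_put(pg);
    return NULL;
}

/* Races the two paths `trials` times and returns how many runs ended with
 * the pin reference dropped zero times or twice (refcount != 0). */
int run_trials(bool put_inside_lock, int trials)
{
    int anomalies = 0;
    for (int i = 0; i < trials; i++) {
        struct model_page pg;
        atomic_init(&pg.refcount, 1);   /* the reference held by the pin */
        atomic_init(&pg.pinned, true);
        pthread_t t;
        int rc = pthread_create(&t, NULL, relinquish_path, &pg);
        assert(rc == 0);
        pin_path(&pg, put_inside_lock);
        pthread_join(t, NULL);
        if (atomic_load(&pg.refcount) != 0)
            anomalies++;
    }
    return anomalies;
}
```

Both `run_trials(false, 200)` and `run_trials(true, 200)` return 0 in this model, because the atomic test-and-clear alone decides which side drops the pin reference. The model deliberately leaves out everything else that happens under page_alloc_lock in the real hypervisor (for example, what the final put frees and which locks that path takes), which is where the two orderings could actually differ, so it can neither confirm nor rule out side effects of the patch.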
Judging from reproduction runs on the patched code, the patch appears to work: the stress test ran for 48 hours without hitting the crash. But I am not sure what side effects it may have.
I would appreciate any further clues. Thanks in advance.
=============Trace 1: =============
(XEN) ----[ Xen-4.0.0 x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48011617c>] free_heap_pages+0x55a/0x575
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 0000001fffffffe0 rbx: ffff82f60b8bbfc0 rcx: ffff83063fe01a20
(XEN) rdx: ffff8315ffffffe0 rsi: ffff8315ffffffe0 rdi: 00000000ffffffff
(XEN) rbp: ffff82c48037fc98 rsp: ffff82c48037fc58 r8: 0000000000000000
(XEN) r9: ffffffffffffffff r10: ffff82c48020e770 r11: 0000000000000282
(XEN) r12: 00007d0a00000000 r13: 0000000000000000 r14: ffff82f60b8bbfe0
(XEN) r15: 0000000000000001 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000232914000 cr2: ffff8315ffffffe4
(XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff82c48037fc58:
(XEN) 0000000000000016 0000000000000000 00000000000001a2 ffff8304afc40000
(XEN) 0000000000000000 ffff82f60b8bbfe0 00000000000330fe ffff82f60b8bc000
(XEN) ffff82c48037fcd8 ffff82c48011647e 0000000100000000 ffff82f60b8bbfe0
(XEN) ffff8304afc40020 0000000000000000 ffff8304afc40000 0000000000000000
(XEN) ffff82c48037fcf8 ffff82c480160caf ffff8304afc40000 ffff82f60b8bbfe0
(XEN) ffff82c48037fd68 ffff82c48014deaf 0000000000000ca3 ffff8304afc40fd8
(XEN) ffff8304afc40fd8 ffff8304afc40fd8 4000000000000000 ffff82c48037ff28
(XEN) 0000000000000000 ffff8304afc40000 ffff8304afc40000 000000000099e000
(XEN) 00000000ffffffda 0000000000000001 ffff82c48037fd98 ffff82c4801504de
(XEN) ffff8304afc40000 0000000000000000 000000000099e000 00000000ffffffda
(XEN) ffff82c48037fdb8 ffff82c4801062ee 000000000099e000 fffffffffffffff3
(XEN) ffff82c48037ff08 ffff82c480104cd7 ffff82c40000f800 0000000000000286
(XEN) 0000000000000286 ffff8300bf76c000 000000ea864b1814 ffff8300bf76c030
(XEN) ffff83023ff1ded8 ffff83023ff1ded0 ffff82c48037fe38 ffff82c48011c9f5
(XEN) ffff82c48037ff08 ffff82c480272100 ffff8300bf76c000 ffff82c48037fe48
(XEN) ffff82c48011f557 ffff82c480272100 0000000600000002 000000004700000a
(XEN) 000000004700bf2c 0000000000000000 000000004700c158 0000000000000000
(XEN) 00002b3b59e7d050 0000000000000000 0000007f00b14140 00002b3b5f257a80
(XEN) 0000000000996380 00002aaaaaad0830 00002b3b5f257a80 00000000009bb690
(XEN) 00002aaaaaad0830 000000398905abf3 000000000078de60 00002b3b5f257aa4
(XEN) Xen call trace:
(XEN) [<ffff82c48011617c>] free_heap_pages+0x55a/0x575
(XEN) [<ffff82c48011647e>] free_domheap_pages+0x2e7/0x3ab
(XEN) [<ffff82c480160caf>] put_page+0x69/0x70
(XEN) [<ffff82c48014deaf>] relinquish_memory+0x36e/0x499
(XEN) [<ffff82c4801504de>] domain_relinquish_resources+0x1ac/0x24c
(XEN) [<ffff82c4801062ee>] domain_kill+0x93/0xe4
(XEN) [<ffff82c480104cd7>] do_domctl+0xa1c/0x1205
(XEN) [<ffff82c4801f71bf>] syscall_enter+0xef/0x149
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN) L4[0x106] = 00000000bf589027 5555555555555555
(XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
=============Trace 2: =============
(XEN) Xen call trace:
(XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
(XEN) [<ffff82c48014aa89>] relinquish_memory+0x169/0x500
(XEN) [<ffff82c48014b2cd>] domain_relinquish_resources+0x1ad/0x280
(XEN) [<ffff82c480105fe0>] domain_kill+0x80/0xf0
(XEN) [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
(XEN) [<ffff82c48010739b>] evtchn_set_pending+0xab/0x1b0
(XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN) L4[0x106] = 00000000bf569027 5555555555555555
(XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN) stdvga.c:147:d60 entering stdvga and caching modes
(XEN)
(XEN) ****************************************
(XEN) HVM60: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
=============Trace 3: =============
(XEN) Xen call trace:
(XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
(XEN) [<ffff82c48014aa89>] relinquish_memory+0x169/0x500
(XEN) [<ffff82c48014b2cd>] domain_relinquish_resources+0x1ad/0x280
(XEN) [<ffff82c480105fe0>] domain_kill+0x80/0xf0
(XEN) [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
(XEN) [<ffff82c480117804>] csched_acct+0x384/0x430
(XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN) L4[0x106] = 00000000bf569027 5555555555555555
(XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
=============Trace 4: =============
(XEN) Xen call trace:
(XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
(XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
(XEN) [<ffff82c48015b0e5>] free_page_type+0x4c5/0x670
(XEN) [<ffff82c48015a218>] get_page+0x28/0xf0
(XEN) [<ffff82c48015b439>] __put_page_type+0x1a9/0x290
(XEN) [<ffff82c48016211f>] do_mmuext_op+0xf3f/0x1320
(XEN) [<ffff82c480113d7e>] do_multicall+0x14e/0x340
(XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from ffff8315ffffffe4:
(XEN) L4[0x106] = 00000000bf569027 5555555555555555
(XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff8315ffffffe4
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
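The faulting address ffff8315ffffffe4 is identical in all four traces and in Mark's earlier report below, which suggests the same bad pointer each time. The following C sketch decodes it, assuming that the x86_64 Xen 4.0 frame table (the struct page_info array) starts at 0xffff82f600000000 and that sizeof(struct page_info) is 32 bytes. Both values are assumptions to check against the 4.0 headers, although Mark's trace does show 0xffff82f600000000 in rdi. `decode_fault` and `struct fault_decode` are hypothetical names; the error-code bits follow the standard x86 page-fault error code (bit 0: page present, bit 1: write access).

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTIONS (not taken from the traces): x86_64 Xen 4.0 frame-table
 * base and sizeof(struct page_info). Verify against the 4.0 source. */
#define ASSUMED_FRAME_TABLE_BASE 0xffff82f600000000ULL
#define ASSUMED_PAGE_INFO_SIZE   32ULL

struct fault_decode {
    uint64_t index;   /* which page_info entry the address falls in */
    uint64_t offset;  /* byte offset inside that entry */
    int is_write;     /* error-code bit 1: the access was a write */
    int was_present;  /* error-code bit 0: 0 means page not present */
};

/* Hypothetical helper: map a faulting frame-table address back to an
 * entry index and field offset, and split the x86 error code. */
struct fault_decode decode_fault(uint64_t addr, uint32_t error_code)
{
    assert(addr >= ASSUMED_FRAME_TABLE_BASE);
    uint64_t delta = addr - ASSUMED_FRAME_TABLE_BASE;
    struct fault_decode d = {
        .index = delta / ASSUMED_PAGE_INFO_SIZE,
        .offset = delta % ASSUMED_PAGE_INFO_SIZE,
        .is_write = (error_code >> 1) & 1u,
        .was_present = error_code & 1u,
    };
    return d;
}
```

With the reported values, `decode_fault(0xffff8315ffffffe4ULL, 0x0002)` yields index 0xffffffff and offset 4, from a write to a not-present page. Index 0xffffffff is the 32-bit all-ones value, and Trace 1 also shows 00000000ffffffff in rdi (rdx in Mark's trace). If, as assumed here, Xen 4.0 links page lists through 32-bit indices terminated by ~0U, this would be consistent with free_heap_pages writing through a null or corrupted list link.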
-----------------------------------------------------
Sun, 07 Feb 2010 11:56:26 +0000, Keir Fraser wrote:
>I'll have to decode the backtrace a bit, but I would guess most likely is
>some memory got corrupted along the way, which would be rather nasty. I
>already need to follow up on a report of apparent memory corruption in a
>domU userspace (testing with the 'memtester' utility), so with a bit of luck
>they could be manifestations of the same bug.
>-- Keir
On 06/02/2010 22:56, "Mark Hurenkamp" <mark.hurenkamp@xxxxxxxxx> wrote:
>> Hi,
>>
>> While playing with my xen server (which is running xen-unstable/linux pvops),
>> it suddenly crashed with the following messages on the serial port.
>> This is a recent version of xen-unstable, but i am a few updates behind.
>> I've seen this only once, so perhaps it is hard to reproduce. I hope this
>> info is still of use to someone.
>>
>> Regards,
>> Mark.
>>
>> (XEN) tmem: all pools frozen for all domains
>> (XEN) tmem: all pools frozen for all domains
>> (XEN) tmem: all pools thawed for all domains
>> (XEN) tmem: all pools thawed for all domains
>> (XEN) paging.c:170: paging_free_log_dirty_bitmap: used 19 pages for domain 3 dirty logging
>> (XEN) ----[ Xen-4.0.0-rc3-pre x86_64 debug=y Tainted: C ]----
>> (XEN) CPU: 2
>> (XEN) RIP: e008:[<ffff82c4801150c5>] free_heap_pages+0x53a/0x555
>> (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
>> (XEN) rax: ffff82c4803004c0 rbx: ffff82f600ae4b40 rcx: ffff8315ffffffe0
>> (XEN) rdx: 00000000ffffffff rsi: ffff8315ffffffe0 rdi: ffff82f600000000
>> (XEN) rbp: ffff83013ff27bc8 rsp: ffff83013ff27b68 r8: 0000000000000000
>> (XEN) r9: 0200000000000000 r10: 0000000000000001 r11: 0080000000000000
>> (XEN) r12: ffff82f600ae4b60 r13: 0000000000000000 r14: 00007d0a00000000
>> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
>> (XEN) cr3: 0000000101001000 cr2: ffff8315ffffffe4
>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
>> (XEN) Xen stack trace from rsp=ffff83013ff27b68:
>> (XEN) c2c2c2c2c2c2c2c2 0000000000000064 0000000000000000 0000000000000012
>> (XEN) 0000000000000297 000000000000017a ffff82c48011e1e3 0000000000000000
>> (XEN) ffff83010fc50000 ffff82f600ae4b60 0000000000069f65 ffff82f600ae4b80
>> (XEN) ffff83013ff27c18 ffff82c4801153ee 0000000000000001 0000000000000001
>> (XEN) ffff82f600ae49c8 ffff82f600ae4b60 0000000000800727 ffff83013fef0000
>> (XEN) ffff82f600ae4b60 ffff83010fc50000 ffff83013ff27c38 ffff82c48015d4d0
>> (XEN) 000000000000e010 800000005725b727 ffff83013ff27c78 ffff82c48015f8d8
>> (XEN) 80000000571bf727 ffff8300aae3ac60 ffff83013fef0000 ffff8300aae3b000
>> (XEN) ffff83013ff27f28 0000000000000000 ffff83013ff27cd8 ffff82c48015eaf4
>> (XEN) ffff83013ff27d08 ffff82c48015fe3d ffff83013ff27cf8 ffff82c48015d4fe
>> (XEN) ffff83013ff27cc8 1400000000000001 ffff82f60155c740 ffff82f60155c740
>> (XEN) ffff83013ff27f28 007fffffffffffff ffff83013ff27d28 ffff82c48015f11c
>> (XEN) 000000003fef0000 ffff82f60155c750 ffff83013ff27d38 ffff83013fef0000
>> (XEN) 0000000000000000 ffffc9000000c2b0 00000000000aae3a ffff83013ff27f28
>> (XEN) ffff83013ff27d38 ffff82c48015f2f8 ffff83013ff27e38 ffff82c480163a4f
>> (XEN) ffff83013fef0018 00007ff03fef0000 0000000000000000 ffff82c480264db0
>> (XEN) ffff82c480264db8 ffff83013ff27f28 ffff83013ff27f28 ffff83013fef0218
>> (XEN) ffff8300bf524000 ffff83013fef0000 ffff8300bf524000 ffff83013fef0000
>> (XEN) ffff83013fff3da8 0000000100000002 ffff830100000000 ffff82f60155c740
>> (XEN) 800000008eadf063 ffff880000000001 ffff83013ff27de8 000000003fff3d90
>> (XEN) Xen call trace:
>> (XEN) [<ffff82c4801150c5>] free_heap_pages+0x53a/0x555
>> (XEN) [<ffff82c4801153ee>] free_domheap_pages+0x30e/0x3cc
>> (XEN) [<ffff82c48015d4d0>] put_page+0x6c/0x73
>> (XEN) [<ffff82c48015f8d8>] put_page_from_l1e+0x19f/0x1b5
>> (XEN) [<ffff82c48015eaf4>] free_page_type+0x25c/0x7b0
>> (XEN) [<ffff82c48015f11c>] __put_page_type+0xd4/0x292
>> (XEN) [<ffff82c48015f2f8>] put_page_type+0xe/0x23
>> (XEN) [<ffff82c480163a4f>] do_mmuext_op+0x6ff/0x14b8
>> (XEN) [<ffff82c480114235>] do_multicall+0x285/0x410
>> (XEN) [<ffff82c4801f01bf>] syscall_enter+0xef/0x149
>> (XEN)
>> (XEN) Pagetable walk from ffff8315ffffffe4:
>> (XEN) L4[0x106] = 00000000bf4f5027 5555555555555555
>> (XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 2:
>> (XEN) FATAL PAGE FAULT
>> (XEN) [error_code=0002]
>> (XEN) Faulting linear address: ffff8315ffffffe4
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel