xen-devel
RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
Thank you for the details.
There is no "PFN compression on bits" line in the Xen boot output. I added some extra
logging and found that pfn_pdx_hole_setup() returns early at xen/arch/x86/x86_64/mm.c,
line 183. Please refer to the boot log below.
I could add some assertions on the page addresses after chunk merging.
Thank you for the mails you forwarded. I will go through all of them later.
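One possible shape for the assertions mentioned above, as a standalone sketch rather than Xen code. The compression layout (PFN bits 4..7 squeezed out of the frame-table index) and the names chunk_is_contiguous() and merge_with_successor() are hypothetical. Only the idea of checking that adjacent frame-table entries describe adjacent MFNs before merging comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: PFN bits 4..7 are a hole squeezed out of the index. */
#define BOTTOM_SHIFT 4
#define HOLE_SHIFT   4
#define BOTTOM_MASK  ((1UL << BOTTOM_SHIFT) - 1)

/* Frame-table index (pdx) -> machine frame number, re-inserting the hole. */
static unsigned long pdx_to_pfn(unsigned long pdx)
{
    return (pdx & BOTTOM_MASK) | ((pdx & ~BOTTOM_MASK) << HOLE_SHIFT);
}

/*
 * True if the 2^order frame-table entries starting at index pdx describe
 * 2^order consecutive machine frames.  Checking the two ends is enough in
 * this model because pdx_to_pfn() is increasing and can only add gaps.
 */
static bool chunk_is_contiguous(unsigned long pdx, unsigned int order)
{
    unsigned long last = (1UL << order) - 1;

    return pdx_to_pfn(pdx + last) == pdx_to_pfn(pdx) + last;
}

/* Merge a chunk with its successor only after asserting contiguity. */
static unsigned int merge_with_successor(unsigned long pdx, unsigned int order)
{
    assert(chunk_is_contiguous(pdx, order + 1));
    return order + 1;
}
```

In this model a 16-entry chunk starting at index 0 is contiguous, while one starting at index 8 straddles the hole and is not, so a merge across that boundary would trip the assertion.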
--------------------------pfn_pdx_hole_setup-----------------
164 void __init pfn_pdx_hole_setup(unsigned long mask)
165 {
166     unsigned int i, j, bottom_shift, hole_shift;
167     printk("-------in pfn\n");
168
169     for ( hole_shift = bottom_shift = j = 0; ; )
170     {
171         i = find_next_zero_bit(&mask, BITS_PER_LONG, j);
172         j = find_next_bit(&mask, BITS_PER_LONG, i);
173         if ( j >= BITS_PER_LONG )
174             break;
175         if ( j - i > hole_shift )
176         {
177             hole_shift = j - i;
178             bottom_shift = i;
179         }
180     }
181     if ( !hole_shift ){
182         printk("-------hole shift returned\n");
183         return;
184     }
185     printk("-------in pfn middle \n");
186
187     printk(KERN_INFO "PFN compression on bits %u...%u\n",
188            bottom_shift, bottom_shift + hole_shift - 1);
189     printk("----PFN compression on bits %u...%u\n",
190            bottom_shift, bottom_shift + hole_shift - 1);
191
192     pfn_pdx_hole_shift = hole_shift;
193     pfn_pdx_bottom_mask = (1UL << bottom_shift) - 1;
194     ma_va_bottom_mask = (PAGE_SIZE << bottom_shift) - 1;
195     pfn_hole_mask = ((1UL << hole_shift) - 1) << bottom_shift;
196     pfn_top_mask = ~(pfn_pdx_bottom_mask | pfn_hole_mask);
197     ma_top_mask = pfn_top_mask << PAGE_SHIFT;
198 }
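The early return at line 183 means the loop found no run of zero bits sitting below a set bit in mask. The sketch below reproduces that loop outside Xen so it can be tried on sample masks. next_set_bit(), next_zero_bit(), struct hole and find_hole() are stand-ins written for this illustration, and the sample masks are made up: the mask actually passed on this machine is not shown in the log.

```c
#include <assert.h>

#define BITS_PER_LONG ((unsigned int)(8 * sizeof(unsigned long)))

/* Stand-in for find_next_bit(): first set bit at or above 'from'. */
static unsigned int next_set_bit(unsigned long mask, unsigned int from)
{
    for ( ; from < BITS_PER_LONG; from++ )
        if ( mask & (1UL << from) )
            return from;
    return BITS_PER_LONG;
}

/* Stand-in for find_next_zero_bit(): first clear bit at or above 'from'. */
static unsigned int next_zero_bit(unsigned long mask, unsigned int from)
{
    for ( ; from < BITS_PER_LONG; from++ )
        if ( !(mask & (1UL << from)) )
            return from;
    return BITS_PER_LONG;
}

struct hole { unsigned int bottom_shift, hole_shift; };

/*
 * Same loop as pfn_pdx_hole_setup(), returning the result instead of
 * storing it in globals.  hole_shift == 0 means "no compression".
 */
static struct hole find_hole(unsigned long mask)
{
    struct hole h = { 0, 0 };
    unsigned int i, j;

    for ( j = 0; ; )
    {
        i = next_zero_bit(mask, j);
        j = next_set_bit(mask, i);
        if ( j >= BITS_PER_LONG )
            break;              /* zero run reaches the top: not a hole */
        if ( j - i > h.hole_shift )
        {
            h.hole_shift = j - i;
            h.bottom_shift = i;
        }
    }
    /* Any hole found ends below a set bit, so it fits inside the word. */
    assert(h.bottom_shift + h.hole_shift < BITS_PER_LONG);
    return h;
}
```

With the made-up mask 0x107 the loop reports a 5-bit hole starting at bit 3. With a mask whose set bits are all contiguous from bit 0, such as 0x7ff, hole_shift stays 0, which is the case that takes the early return.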
------------------------------------------xen boot log---------------------
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a800 (usable)
(XEN)  000000000009a800 - 00000000000a0000 (reserved)
(XEN)  00000000000e4bb0 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000bf790000 (usable)
(XEN)  00000000bf790000 - 00000000bf79e000 (ACPI data)
(XEN)  00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
(XEN)  00000000bf7d0000 - 00000000bf7e0000 (reserved)
(XEN)  00000000bf7ec000 - 00000000c0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000fff00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000640000000 (usable)
(XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
(XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97)
(XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97)
(XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117)
(XEN) ACPI: FACS BF79E000, 0040
(XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97)
(XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97)
(XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97)
(XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1)
(XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97)
(XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117)
(XEN) --------------844
(XEN) ---------srat enter
(XEN) ---------prepare enter into pfn
(XEN) -------in pfn
(XEN) -------hole shift returned
(XEN) --------------849
(XEN) System RAM: 24542MB (25131224kB)
(XEN) Domain heap initialised DMA width 31 bits

> Date: Tue, 31 Aug 2010 15:49:29 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@xxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> CC: JBeulich@xxxxxxxxxx
>
> Do you have a line in Xen boot output that starts "PFN compression on bits"?
> If so what does it say?
>
> My suspicion is that Jan Beulich's patches to implement a consolidated page
> array for sparse memory maps have broken the assumption in some Xen code
> that:
> page_to_mfn(mfn_to_page(x)+y) == x+y, for all valid mfns x, and all y up to
> some pretty big limit.
>
> Looking in free_heap_pages() I see we do a whole bunch of chunk merging in
> our buddy allocator, doing arithmetic on variable 'pg' to find neighbouring
> chunks. It's a bit dodgy I suspect.
>
> I'm cc'ing Jan to see what we can get away with in doing arithmetic on
> page_info pointers. What's the guaranteed smallest aligned contiguous ranges
> of mfn in the frame_table now, Jan? (i.e., ranges in which adjacent
> page_info structs relate to adjacent MFNs)
>
> If this is the problem I'm pretty sure we can come up with a patch quite
> easily, but depending on the answer to my above question to Jan, we may need
> to do some code auditing.
>
> -- Keir
>
> On 31/08/2010 14:49, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> > Hi Keir:
> >
> > Thank you for correcting my mistakes.
> > Here is the latest panic and its objdump.
> > I am not familiar with assembly language and register usage.
> > I will try to spend some more time to understand it better.
> > What's your opinion?
> > btw, the memtest is still running, so far so good, thanks.
> >
> > ------------------objdump------------------------------------------------
> > ffff82c480115396:  48 c1 e1 04           shl    $0x4,%rcx
> > ffff82c48011539a:  4a 03 0c f8           add    (%rax,%r15,8),%rcx
> > }
> > static inline void
> > page_list_del(struct page_info *page, struct page_list_head *head)
> > {
> >     struct page_info *next = pdx_to_page(page->list.next);
> > ffff82c48011539e:  8b 03                 mov    (%rbx),%eax
> > ffff82c4801153a0:  48 c1 e0 05           shl    $0x5,%rax
> > ffff82c4801153a4:  48 29 e8              sub    %rbp,%rax
> > ffff82c4801153a7:  48 3b 19              cmp    (%rcx),%rbx
> > ffff82c4801153aa:  0f 84 95 01 00 00     je     ffff82c480115545 <free_heap_pages+0x405>
> >     struct page_info *prev = pdx_to_page(page->list.prev);
> > ffff82c4801153b0:  89 f2                 mov    %esi,%edx
> > ffff82c4801153b2:  48 c1 e2 05           shl    $0x5,%rdx
> > ffff82c4801153b6:  48 29 ea              sub    %rbp,%rdx
> > ffff82c4801153b9:  48 3b 59 08           cmp    0x8(%rcx),%rbx
> > ffff82c4801153bd:  0f 84 bd 01 00 00     je     ffff82c480115580 <free_heap_pages+0x440>
> >
> >     if ( !__page_list_del_head(page, head, next, prev) )
> >     {
> >         next->list.prev = page->list.prev;
> > ffff82c4801153c3:  89 70 04              mov    %esi,0x4(%rax)
> >         prev->list.next = page->list.next;
> > ffff82c4801153c6:  8b 03                 mov    (%rbx),%eax
> > ffff82c4801153c8:  89 02                 mov    %eax,(%rdx)
> > ffff82c4801153ca:  49 89 dd              mov    %rbx,%r13
> > ffff82c4801153cd:  41 83 c4 01           add    $0x1,%r12d
> > ffff82c4801153d1:  41 83 fc 12           cmp    $0x12,%r12d
> > ffff82c4801153d5:  0f 84 e3 00 00 00     je     ffff82c4801154be <free_heap_pages+0x37e>
> > ffff82c4801153db:  48 bd 00 00 00 00 0a  mov    $0x7d0a00000000,%rbp
> > ffff82c4801153e2:  7d 00 00
> > ffff82c4801153e5:  44 89 e1              mov    %r12d,%ecx
> > ffff82c4801153e8:  be 01 00 00 00        mov    $0x1,%esi
> >
> > -------------------------------------------------------------------------
> > blktap_sysfs_create: adding attributes for dev ffff880239496c00
> > (XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
> > (XEN) CPU: 2
> > (XEN) RIP: e008:[<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> > (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> > (XEN) rax: ffff8315ffffffe0 rbx: ffff82f6093b0040 rcx: ffff83063fc01a20
> > (XEN) rdx: ffff8315ffffffe0 rsi: 00000000ffffffff rdi: 000000000049d802
> > (XEN) rbp: 00007d0a00000000 rsp: ffff83023ff37cb8 r8: 0000000000000000
> > (XEN) r9: ffffffffffffffff r10: ffff83060a3c0018 r11: 0000000000000282
> > (XEN) r12: 0000000000000000 r13: ffff82f6093b0060 r14: 00000000000001a2
> > (XEN) r15: 0000000000000001 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 000000008da54000 cr2: ffff8315ffffffe4
> > (XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff83023ff37cb8:
> > (XEN) ffff82f6093b7f80 00000000ffffffe0 00000000000001a2 ffff83060a3c0000
> > (XEN) 0000000000000000 0000000000000001 ffff82f6093b0060 0000000000000000
> > (XEN) ffff82f6093b0080 ffff82c480115732 00000001093b7cc0 ffff82f6093b0060
> > (XEN) ffff83060a3c0018 0000000000000000 ffff83060a3c0000 ffff83060a3c0fa8
> > (XEN) 0000000000000000 ffff82c48014aaa6 ffff83060a3c0fa8 ffff83060a3c0fa8
> > (XEN) ffff83060a3c0014 4000000000000000 ffff83023ff37f28 ffff83060a3c0018
> > (XEN) 0000000000000000 ffff83060a3c0000 0000000000305000 0000000000000009
> > (XEN) 0000000000000009 ffff82c48014b2fd 00ffffffffffffff ffff83060a3c0000
> > (XEN) 0000000000000000 ffff83023ff37e28 0000000000305000 ffff82c480105fe0
> > (XEN) ffff82c480255240 fffffffffffffff3 0000000002599000 ffff82c4801043ce
> > (XEN) ffff82c4801447da 0000000000000080 ffff83023ff37f28 0000000000000096
> > (XEN) ffff83023ff37f28 00000000000000fc 0000000600000002 00000000023c0031
> > (XEN) 0000000000000001 00000039890a8e2a 0000003000000018 000000004523af30
> > (XEN) 000000004523ae70 0000000000000000 00007fc608ea8a70 000000398903c8a4
> > (XEN) 000000004523af44 0000000000000000 000000004523b158 0000000000000000
> > (XEN) 0000007f024f6d20 00007fc60a094750 000000000255ff40 00007fc607be5ea8
> > (XEN) fffffffffffffff5 0000000000000246 00000039880cc557 0000000000000100
> > (XEN) 00000039880cc557 0000000000000033 0000000000000246 ffff8300bf562000
> > (XEN) ffff8801db8d3e78 000000004523aec0 0000000000305000 0000000000000009
> > (XEN) 0000000000000009 ffff82c4801e3169 0000000000000009 0000000000000009
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
> > (XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
> > (XEN) [<ffff82c48014aaa6>] relinquish_memory+0x186/0x530
> > (XEN) [<ffff82c48014b2fd>] domain_relinquish_resources+0x1ad/0x280
> > (XEN) [<ffff82c480105fe0>] domain_kill+0x80/0xf0
> > (XEN) [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
> > (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> > (XEN)
> > (XEN) Pagetable walk from ffff8315ffffffe4:
> > (XEN) L4[0x106] = 00000000bf569027 5555555555555555
> > (XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 2:
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0002]
> > (XEN) Faulting linear address: ffff8315ffffffe4
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> >
> > -------------------------------------------------------------------------
> >> Date: Mon, 30 Aug 2010 14:16:09 +0100
> >> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >> From: keir.fraser@xxxxxxxxxxxxx
> >> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>
> >> On 30/08/2010 14:03, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>
> >>> I appreciate the quick response.
> >>>
> >>> Actually I did some decoding of the backtrace last Friday.
> >>> According to the RIP ffff82c4801153c3, I cut the "objdump -dS xen-syms"
> >>> (please see below). It looks like the bug happened during the domain page list
> >>
> >> ffff82c4801153c3 isn't the start of an instruction in your below
> >> disassembly. Hence you didn't disassemble exactly the build of Xen which
> >> crashed. It needs to be exactly the same image.
> >>
> >> -- keir
> >>
> >>> traversal, which is beyond my understanding. As I understand it,
> >>> those domain pages come from the kernel memory zone, so they always
> >>> reside in physical memory, and their addresses shouldn't have a chance
> >>> to change, right?
> >>> If so, what is the relationship between all those panics and free_heap_pages?
> >>>
> >>> Several servers (at least 3) experienced the same panic on the same test.
> >>> Those servers have identical hardware, kernel and xen configuration.
> >>> Right now, on one server, memtest is running; it should finish in a few
> >>> hours.
> >>> (24G memory)
> >>>
> >>> -------------------------------------------------------------------------
> >>> static inline void
> >>> page_list_del(struct page_info *page, struct page_list_head *head)
> >>> {
> >>>     struct page_info *next = pdx_to_page(page->list.next);
> >>>     struct page_info *prev = pdx_to_page(page->list.prev);
> >>> ffff82c4801153b8:  8b 73 04              mov    0x4(%rbx),%esi
> >>> ffff82c4801153bb:  49 8d 0c 06           lea    (%r14,%rax,1),%rcx
> >>> ffff82c4801153bf:  48 8d 05 fa 10 26 00  lea    2494714(%rip),%rax   # ffff82c4803764c0 <_heap>
> >>> ffff82c4801153c6:  48 c1 e1 04           shl    $0x4,%rcx
> >>> ffff82c4801153ca:  4a 03 0c f8           add    (%rax,%r15,8),%rcx
> >>> }
> >>> static inline void
> >>> page_list_del(struct page_info *page, struct page_list_head *head)
> >>> {
> >>>     struct page_info *next = pdx_to_page(page->list.next);
> >>> ffff82c4801153ce:  8b 03                 mov    (%rbx),%eax
> >>> ffff82c4801153d0:  48 c1 e0 05           shl    $0x5,%rax
> >>> ffff82c4801153d4:  48 29 e8              sub    %rbp,%rax
> >>> ffff82c4801153d7:  48 3b 19              cmp    (%rcx),%rbx
> >>> ffff82c4801153da:  0f 84 95 01 00 00     je     ffff82c480115575 <free_heap_pages+0x405>
> >>>     struct page_info *prev = pdx_to_page(page->list.prev);
> >>> ffff82c4801153e0:  89 f2                 mov    %esi,%edx
> >>> ffff82c4801153e2:  48 c1 e2 05           shl    $0x5,%rdx
> >>> ffff82c4801153e6:  48 29 ea              sub    %rbp,%rdx
> >>> ffff82c4801153e9:  48 3b 59 08           cmp    0x8(%rcx),%rbx
> >>> ffff82c4801153ed:  0f 84 bd 01 00 00     je     ffff82c4801155b0 <free_heap_pages+0x440>
> >>>
> >>>     if ( !__page_list_del_head(page, head, next, prev) )
> >>>     {
> >>>
> >>> -------------------------------------------------------------------------
> >>>
> >>>> Date: Mon, 30 Aug 2010 10:02:05 +0100
> >>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >>>> From: keir.fraser@xxxxxxxxxxxxx
> >>>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>>
> >>>> On 30/08/2010 09:47, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>>>
> >>>>> 3) Every panic points to the same address: ffff8315ffffffe4, which is
> >>>>> not a valid page address.
> >>>>> I printed the pages of the domain in assign_pages, which all look like
> >>>>> ffff82f60bd64000, at least
> >>>>> ffff82f60 is the same.
> >>>>
> >>>> Yes, well you may not be crashing on a supposed page address. Certainly the
> >>>> page pointer that relinquish_memory() is working on, and passed to
> >>>> put_page->free_domheap_pages is valid enough to not cause any of those
> >>>> functions to crash when dereferencing it. At the moment you really have no
> >>>> idea what is causing free_heap_pages() to crash.
> >>>>
> >>>>> A bit of lost direction to go further. Thanks.
> >>>>
> >>>> You need to find out which line of code in free_heap_pages() is crashing,
> >>>> and what variable it is trying to dereference when it crashes. You have a
> >>>> nice backtrace with an EIP value, so you can 'objdump -d xen-syms' and
> >>>> search for the EIP in the disassembly. If you have a debug build of Xen you
> >>>> can even do 'objdump -S xen-syms' and have the disassembly annotated with
> >>>> corresponding source lines.
> >>>>
> >>>> Have you seen this on more than one physical machine? If not, have you run
> >>>> memtest on the offending machine?
> >>>>
> >>>> -- Keir
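The faulting address can be reproduced from the register dump and disassembly quoted above. In the matching disassembly, pdx_to_page() compiles to "shl $0x5; sub %rbp", and the crashing instruction at ffff82c4801153c3 stores to 0x4(%rax), which is next->list.prev. The sketch below redoes that arithmetic. The function names are made up for this illustration, and reading rbp as the negated frame-table base (0xffff82f600000000) is an inference from the dump, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* rbp from the register dump; it appears to be the negated frame-table base. */
#define RBP_VALUE       0x00007d0a00000000ULL
/* "shl $0x5" in the disassembly: 32 bytes per page_info entry. */
#define PAGE_INFO_SHIFT 5

/* Mirrors "shl $0x5,%rax; sub %rbp,%rax" from page_list_del(). */
static uint64_t pdx_to_page_addr(uint32_t pdx)
{
    return ((uint64_t)pdx << PAGE_INFO_SHIFT) - RBP_VALUE;
}

/* Address written by "mov %esi,0x4(%rax)", i.e. &next->list.prev. */
static uint64_t next_list_prev_addr(uint32_t list_next)
{
    return pdx_to_page_addr(list_next) + 4;
}

/* Self-check against rax and cr2 from the panic message. */
static void check_against_dump(void)
{
    assert(pdx_to_page_addr(0xffffffffu) == 0xffff8315ffffffe0ULL);
    assert(next_list_prev_addr(0xffffffffu) == 0xffff8315ffffffe4ULL);
}
```

A list.next value of 0xffffffff therefore yields exactly the rax and cr2 values in the dump, and rsi shows list.prev held the same value. That suggests the page being unlinked had both links set to what looks like the list-terminator value while not being recognised as the head or tail of the free list, which would fit a merge that picked the wrong neighbour.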
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel