Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

To:	MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
From:	Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date:	Mon, 30 Aug 2010 14:16:09 +0100
Cc:
Delivery-date:	Mon, 30 Aug 2010 06:17:56 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<BAY121-W8CFEA2EBC6CB60638CAA2DA890@xxxxxxx>

On 30/08/2010 14:03, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:

> Thanks for the quick response.
>  
> Actually, I decoded the backtrace last Friday.
> Based on the RIP ffff82c4801153c3, I cut the relevant part of the
> "objdump -dS xen-syms" output (please see below). It looks like the bug
> happened on the domain page list

ffff82c4801153c3 isn't the start of an instruction in the disassembly
below, so you didn't disassemble exactly the build of Xen that crashed.
It needs to be exactly the same image.

 -- keir

> traversal, which is beyond my understanding. As I understand it, those
> domain pages come from the kernel memory zone, so they always reside in
> physical memory and their addresses should never change, right?
> If so, what is the relationship between all these panics and free_heap_pages?
>  
> Several servers (at least 3) experienced the same panic on the same test.
> Those servers have identical hardware, kernel, and Xen configuration.
> Memtest is now running on one server (24G memory) and should finish in a
> few hours.
>  
> ------------------------------------------------------------------------------
> static inline void
> page_list_del(struct page_info *page, struct page_list_head *head)
> {
>     struct page_info *next = pdx_to_page(page->list.next);
>     struct page_info *prev = pdx_to_page(page->list.prev);
> ffff82c4801153b8:  8b 73 04              mov    0x4(%rbx),%esi
> ffff82c4801153bb:  49 8d 0c 06           lea    (%r14,%rax,1),%rcx
> ffff82c4801153bf:  48 8d 05 fa 10 26 00  lea    2494714(%rip),%rax        # ffff82c4803764c0 <_heap>
> ffff82c4801153c6:  48 c1 e1 04           shl    $0x4,%rcx
> ffff82c4801153ca:  4a 03 0c f8           add    (%rax,%r15,8),%rcx
> }
> static inline void
> page_list_del(struct page_info *page, struct page_list_head *head)
> {
>     struct page_info *next = pdx_to_page(page->list.next);
> ffff82c4801153ce:  8b 03                 mov    (%rbx),%eax
> ffff82c4801153d0:  48 c1 e0 05           shl    $0x5,%rax
> ffff82c4801153d4:  48 29 e8              sub    %rbp,%rax
> ffff82c4801153d7:  48 3b 19              cmp    (%rcx),%rbx
> ffff82c4801153da:  0f 84 95 01 00 00     je     ffff82c480115575 <free_heap_pages+0x405>
>     struct page_info *prev = pdx_to_page(page->list.prev);
> ffff82c4801153e0:  89 f2                 mov    %esi,%edx
> ffff82c4801153e2:  48 c1 e2 05           shl    $0x5,%rdx
> ffff82c4801153e6:  48 29 ea              sub    %rbp,%rdx
> ffff82c4801153e9:  48 3b 59 08           cmp    0x8(%rcx),%rbx
> ffff82c4801153ed:  0f 84 bd 01 00 00     je     ffff82c4801155b0 <free_heap_pages+0x440>
>
>     if ( !__page_list_del_head(page, head, next, prev) )
>     {
> ------------------------------------------------------------------------------
>  
>> Date: Mon, 30 Aug 2010 10:02:05 +0100
>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>> From: keir.fraser@xxxxxxxxxxxxx
>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>> 
>> On 30/08/2010 09:47, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>> 
>>> 3) Every panic points to the same address: ffff8315ffffffe4, which is
>>> not a valid page address.
>>> I printed the domain's pages in assign_pages, and they all look like
>>> ffff82f60bd64000; at least the ffff82f60 prefix is the same.
>> 
>> Yes, well you may not be crashing on a supposed page address. Certainly the
>> page pointer that relinquish_memory() is working on, and passed to
>> put_page->free_domheap_pages is valid enough to not cause any of those
>> functions to crash when dereferencing it. At the moment you really have no
>> idea what is causing free_heap_pages() to crash.
>> 
>>> I'm a bit lost on how to go further. Thanks.
>> 
>> You need to find out which line of code in free_heap_pages() is crashing,
>> and what variable it is trying to dereference when it crashes. You have a
>> nice backtrace with an EIP value, so you can 'objdump -d xen-syms' and
>> search for the EIP in the disassembly. If you have a debug build of Xen you
>> can even do 'objdump -S xen-syms' and have the disassembly annotated with
>> corresponding source lines.
>> 
>> Have you seen this on more than one physical machine? If not, have you run
>> memtest on the offending machine?
>> 
>> -- Keir
>> 
>> 
>        



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
