This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

To: <keir.fraser@xxxxxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Mon, 30 Aug 2010 21:03:43 +0800
Delivery-date: Mon, 30 Aug 2010 06:04:38 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
Importance: Normal
In-reply-to: <C8A1321D.21554%keir.fraser@xxxxxxxxxxxxx>
References: <BAY121-W47C2F18C8C8A31F98554B0DA890@xxxxxxx>, <C8A1321D.21554%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thanks for the quick response.
Actually, I did some decoding of the backtrace last Friday.
Based on the RIP ffff82c4801153c3, I extracted the relevant part of the
"objdump -dS xen-syms" output (please see below). It looks like the bug
happens while walking the domain page list, which is beyond my understanding.
As I understand it, those domain pages come from the kernel memory zone, so
they always reside in physical memory and their addresses should never
change, right?
If so, what is the relationship between these panics and free_heap_pages?
Several servers (at least 3) hit the same panic in the same test.
Those servers have identical hardware, kernel, and Xen configuration.
Right now memtest is running on one server (24G memory) and should finish
in a few hours.
static inline void
page_list_del(struct page_info *page, struct page_list_head *head)
{
    struct page_info *next = pdx_to_page(page->list.next);
    struct page_info *prev = pdx_to_page(page->list.prev);
ffff82c4801153b8:   8b 73 04                mov    0x4(%rbx),%esi
ffff82c4801153bb:   49 8d 0c 06             lea    (%r14,%rax,1),%rcx
ffff82c4801153bf:   48 8d 05 fa 10 26 00    lea    2494714(%rip),%rax        # ffff82c4803764c0 <_heap>
ffff82c4801153c6:   48 c1 e1 04             shl    $0x4,%rcx
ffff82c4801153ca:   4a 03 0c f8             add    (%rax,%r15,8),%rcx
}
static inline void
page_list_del(struct page_info *page, struct page_list_head *head)
{
    struct page_info *next = pdx_to_page(page->list.next);
ffff82c4801153ce:   8b 03                   mov    (%rbx),%eax
ffff82c4801153d0:   48 c1 e0 05             shl    $0x5,%rax
ffff82c4801153d4:   48 29 e8                sub    %rbp,%rax
ffff82c4801153d7:   48 3b 19                cmp    (%rcx),%rbx
ffff82c4801153da:   0f 84 95 01 00 00       je     ffff82c480115575 <free_heap_pages+0x405>
    struct page_info *prev = pdx_to_page(page->list.prev);
ffff82c4801153e0:   89 f2                   mov    %esi,%edx
ffff82c4801153e2:   48 c1 e2 05             shl    $0x5,%rdx
ffff82c4801153e6:   48 29 ea                sub    %rbp,%rdx
ffff82c4801153e9:   48 3b 59 08             cmp    0x8(%rcx),%rbx
ffff82c4801153ed:   0f 84 bd 01 00 00       je     ffff82c4801155b0 <free_heap_pages+0x440>
    if ( !__page_list_del_head(page, head, next, prev) )
    {
> Date: Mon, 30 Aug 2010 10:02:05 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@xxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> On 30/08/2010 09:47, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> > 3) Every panic points to the same address: ffff8315ffffffe4, which is
> > not a valid page address.
> > I printed the pages of the domain in assign_pages, and they all look like
> > ffff82f60bd64000; at least the
> > ffff82f60 prefix is the same.
> Yes, well you may not be crashing on a supposed page address. Certainly the
> page pointer that relinquish_memory() is working on, and passed to
> put_page->free_domheap_pages is valid enough to not cause any of those
> functions to crash when dereferencing it. At the moment you really have no
> idea what is causing free_heap_pages() to crash.
> > I am a bit lost on how to go further. Thanks.
> You need to find out which line of code in free_heap_pages() is crashing,
> and what variable it is trying to dereference when it crashes. You have a
> nice backtrace with an EIP value, so you can 'objdump -d xen-syms' and
> search for the EIP in the disassembly. If you have a debug build of Xen you
> can even do 'objdump -S xen-syms' and have the disassembly annotated with
> corresponding source lines.
> Have you seen this on more than one physical machine? If not, have you run
> memtest on the offending machine?
> -- Keir