Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, "jbeulich@xxxxxxxxxx" <jbeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Wed, 1 Sep 2010 08:40:55 +0100
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
On 01/09/2010 08:17, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:

> As I go through the chunk merge code in free_heap_pages, one thing I'd like
> to mention is that I previously printed out all domain pages when they were
> allocated, and I found that the order in assign_pages in
> /xen-4.0.0/xen/common/page_alloc.c:1087 is either 0 or 9; later I learned
> that this is because domain U populates the physmap 2MB at a time.
>  
> And here in the while statement, the order is compared with MAX_ORDER, which
> is 20. I wonder if this might offer some clues.
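
For reference, the order-9 figure follows directly from the 2MB populate
size: 2MB / 4KB = 512 = 2^9 pages. A minimal stand-alone sketch of that
arithmetic (assuming 4KB pages; illustrative, not Xen code):

    #include <stdio.h>

    int main(void)
    {
        unsigned long extent_bytes = 2UL << 20;          /* 2MB populate */
        unsigned long page_bytes   = 4096;               /* 4KB page */
        unsigned long pages = extent_bytes / page_bytes; /* 512 */
        unsigned int order = 0;

        /* Smallest order such that 2^order covers the extent. */
        while ((1UL << order) < pages)
            order++;
        printf("order = %u\n", order);                   /* prints 9 */
        return 0;
    }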

Xen's buddy allocator merges pairs of adjacent free chunks up to a maximum
size of 2**20 pages. That merging needs to be careful not to merge off the
end of RAM. I'm just guessing that there may be an issue with that on your
fairly large-memory system.
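
A minimal sketch of such a merge loop with an end-of-RAM guard (illustrative
only; max_page and the elided free-list checks are simplified stand-ins, not
Xen's actual free_heap_pages code):

    #define MAX_ORDER 20

    /* Merge a freed chunk starting at 'pfn' of size 2^order pages with
     * its buddies, without walking past the end of RAM at 'max_page'. */
    static void merge_chunk(unsigned long pfn, unsigned int order,
                            unsigned long max_page)
    {
        while (order < MAX_ORDER)
        {
            unsigned long mask = 1UL << order;
            unsigned long buddy_pfn = pfn ^ mask;  /* buddy's first pfn */

            /* The buddy must lie entirely within RAM. */
            if (buddy_pfn + mask > max_page)
                break;

            /* ... the buddy must also be free and of the same order;
             * otherwise stop merging (checks elided here) ... */

            pfn &= ~mask;  /* merged chunk starts at the lower buddy */
            order++;
        }
        /* queue (pfn, order) on the appropriate free list */
    }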

 -- Keir

> Thanks.
> -------------------------------
>     /* Merge chunks as far as possible. */
>     while ( order < MAX_ORDER )
>     {
>         mask = 1UL << order;
>  
>> Date: Tue, 31 Aug 2010 18:03:41 +0100
>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>> From: keir.fraser@xxxxxxxxxxxxx
>> To: JBeulich@xxxxxxxxxx
>> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>> 
>> On 31/08/2010 17:35, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
>> 
>>>> That's somewhat implicit: srat_parse_regions() gets passed an
>>>> address that is at least BOOTSTRAP_DIRECTMAP_END (i.e. 4G).
>>>> Thus srat_parse_regions() starts off with a mask with the lower
>>>> 32 bits all set (only more bits can get set subsequently). Thus
>>>> the earliest zero bit pfn_pdx_hole_setup() can find is bit 20
>>>> (due to the >> PAGE_SHIFT in the invocation). Consequently
>>>> the smallest chunk where arithmetic is valid really is 4GB, not
>>>> 256MB as I first wrote.
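
That bit argument can be checked mechanically; a small stand-alone sketch
(assuming a PAGE_SHIFT of 12; illustrative only):

    #include <stdio.h>

    #define PAGE_SHIFT 12

    int main(void)
    {
        unsigned long mask = 0xFFFFFFFFUL;            /* low 32 address bits set */
        unsigned long pfn_mask = mask >> PAGE_SHIFT;  /* low 20 bits set */
        unsigned int bit = 0;

        while (pfn_mask & (1UL << bit))               /* find first zero bit */
            bit++;
        printf("first zero bit = %u\n", bit);         /* prints 20 */
        return 0;
    }

So the first hole pfn_pdx_hole_setup() can find starts at bit 20, i.e. at a
granularity of 2^20 pages = 4GB.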
>>> 
>>> Well, that's a bit too implicit for me. How about we initialise 'j' to
>>> MAX_ORDER in pfn_pdx_hole_setup() with a comment about supporting page_info
>>> pointer arithmetic within allocatable multi-page regions?
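
A minimal illustration of that proposal (purely a sketch; the real
pfn_pdx_hole_setup() scans the mask differently):

    #define MAX_ORDER 20

    /* Start the zero-bit search at MAX_ORDER so any pdx hole is at
     * least 2^MAX_ORDER pages wide, keeping page_info pointer
     * arithmetic valid within any allocatable multi-page region. */
    static unsigned int first_hole_bit(unsigned long mask)
    {
        unsigned int j = MAX_ORDER;

        while (j < 8 * sizeof(mask) && (mask & (1UL << j)))
            j++;
        return j;
    }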
>> 
>> Well I agree with your logic anyway. So I don't see that this can be the
>> cause of MaoXiaoyun's bug. At least not directly. But then I'm stumped as to
>> why the page arithmetic and checks in free_heap_pages are (apparently)
>> resulting in a page pointer way outside the frame-table region and actually
>> in the directmap region.
>> 
>> I think an obvious next step would be to get your boot output, MaoXiaoyun.
>> Can you please post it? And you may as well stop your memtest if you haven't
>> already. If you've seen the issue on more than one machine then it certainly
>> isn't due to that kind of hardware failure.
>> 
>> -- Keir
>> 
>> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel