xen-devel

Re: [Xen-devel] Xen dom0 crash: "d0:v0: unhandled page fault (ec=0000)"

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] Xen dom0 crash: "d0:v0: unhandled page fault (ec=0000)"
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Mon, 01 Nov 2010 09:37:44 -0400
Cc: "Alan J. Wylie" <NDA5OWUy@xxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>, Stefan Kuhne <stefan.kuhne@xxxxxxx>, sven <ml@xxxxxxxxxxxxx>, Andreas Kinzler <ml-xen-devel@xxxxxx>
In-reply-to: <20101029161553.GA27408@xxxxxxxxxxxx>
References: <19629.39326.337589.71778@xxxxxxxxxxx> <1287498599.12843.2111.camel@xxxxxxxxxxxxxxxxxxxxxx> <4CBDB229.3030501@xxxxxxxxxxxxx> <1287503143.12843.2191.camel@xxxxxxxxxxxxxxxxxxxxxx> <4CBE2A43.70200@xxxxxx> <1287564863.12843.4194.camel@xxxxxxxxxxxxxxxxxxxxxx> <1288367063.23619.51.camel@xxxxxxxxxxxxxxxxxxxxxx> <20101029161553.GA27408@xxxxxxxxxxxx>
 On 10/29/2010 12:15 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Oct 29, 2010 at 04:44:23PM +0100, Gianni Tedesco wrote:
>> On Wed, 2010-10-20 at 09:54 +0100, Gianni Tedesco wrote:
>>> On Wed, 2010-10-20 at 00:31 +0100, Andreas Kinzler wrote:
>>>> On 19.10.2010 17:45, Gianni Tedesco wrote:
>>>>> ditto, I suspected a known bug in my gcc version which broke xchg
>>>>> because when I compiled with -O2 instead of -Os... the problem went away
>>>>> but then something else bad happened later (I forget the details, and it
>>>>> was too many config tweaks ago to get back to last time I had it working
>>>>> that well)
>>>> Jeremy, one user earlier reported that he found out that for him there 
>>>> seems to be a relation between kernel size and crash status. He just 
>>>> added/removed some options that could never influence the "crash status" 
>>>> (like adding/removing netfilter modules). With all the experiences here, 
>>>> it may be useful to check for code paths related to kernel size.
>>>>
>>>> Regards Andreas
>> I have dmesg output from 2.6.32.18-ge6b9b2c and the current broken
>> version.
>>
>> http://pastebin.com/3m0DpDdW - 2.6.32.24-gd0054d6-dirty - broken
> Gianni pointed out to me that he spotted this:
>
> [    0.000000] last_pfn = 0x2d0699 max_arch_pfn = 0x400000000
> [    0.000000] x86 PAT enabled: cpu 0, old 0x50100070406, new 0x7010600070106
> [    0.000000] last_pfn = 0x2f000 max_arch_pfn = 0x400000000
>
> I am not sure why "last_pfn" is being printed twice, but it could be
> Gianni's test patch.
>
> It looks as if the initial E820 is created with a max_pfn of
> 0x2d0699, which roughly translates to 11G of memory (at 4K per
> page) instead of the 752MB.
>
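
As a quick check of the sizes involved, the two last_pfn values in the dmesg lines above can be converted to memory sizes. This is only a sketch: it assumes 4 KiB pages (PAGE_SHIFT = 12, as on x86), and the helper names are made up for illustration. Under that assumption 0x2f000 comes to exactly 752 MiB, and 0x2d0699 comes to roughly 11.3 GiB.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86


def pfn_to_bytes(pfn: int) -> int:
    """Size in bytes of the address range [0, pfn) at 4 KiB per page."""
    return pfn << PAGE_SHIFT


def pfn_to_mib(pfn: int) -> float:
    """The same range expressed in MiB."""
    return pfn_to_bytes(pfn) / (1 << 20)


# The two last_pfn values from the dmesg lines quoted above.
for last_pfn in (0x2d0699, 0x2f000):
    print(f"last_pfn = {last_pfn:#x} -> {pfn_to_mib(last_pfn):.1f} MiB")
```
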
> There were a bunch of changes in arch/x86/xen/setup.c and mmu.c
> code that figures out the max_pfn. Actually, there is one
> (git commit 6c8e75f5e712e596ab138597e65aac426ff03382):
>
>  HYPERVISOR_shared_info->arch.max_pfn = xen_max_p2m_pfn

That sets the extent that the toolstack will look at the P2M for
migration; it has no direct effect on the domain itself.

> Which would set this to the highest PFN. But that number
> should not have been used by the E820 calculation which uses
> nr_pages entry to clamp the E820. Oh wait, it does not - it actually
> still parses the E820, but marks the area above the nr_pages
> as "XEN EXTRA" (git commit 8d0d6d6d275d4514780ba3d350e57d48e3b5b5e1)
> so they should not figure in the last_pfn calculation and instead
> lie unused. But the 'initial memory mapping' ignores that and
> still tries to set up mappings on _all_ E820_RAM regions, even
> if they are reserved by the early memory allocator. This would
> imply that the page table is actually being put right in the
> area that is reserved by the early memory allocator.
>
> Hmm, so Gianni, I think if you shortcircuited the setup.c code
> to not parse the E820_RAM regions above the nr_pages that might
> do it. And also try to figure out who or what resets the last_pfn.
>
> Or in the code that sets the 'XEN EXTRA', make it set that region
> of pages as E820_RESERVED and see what happens then.

The way this is supposed to work is:

   1. Xen gives the domain N pages
   2. There's an E820 which describes M pages (M > N)
   3. The kernel traverses the existing E820 and finds holes and adds
      the memory to a new E820_RAM region beyond M
   4. Set up P2M for pages up to N
   5. When the kernel maps all "RAM", the region from N-M is not
      present, and has no valid P2M mapping; in that case, xen_make_pte
      will return a non-present pte.
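
Steps 4 and 5 can be illustrated with a toy model. This is a sketch for illustration only, not the kernel code: the names build_p2m, make_pte, INVALID_P2M_ENTRY and the flat list used as the P2M are all hypothetical stand-ins, the machine frame numbers are arbitrary, and step 3 (moving hole memory beyond M) is left out.

```python
PAGE_SHIFT = 12           # assumption: 4 KiB pages
PTE_PRESENT = 0x1         # toy "present" bit
INVALID_P2M_ENTRY = None  # stand-in for "no machine frame behind this pfn"


def build_p2m(nr_pages, max_pfn, first_mfn=0x1000):
    """Step 4: fill in P2M entries for pfns below N (nr_pages).

    Entries from N up to M (max_pfn) are left invalid. The mfn values
    are arbitrary placeholders chosen for the example.
    """
    return [
        first_mfn + pfn if pfn < nr_pages else INVALID_P2M_ENTRY
        for pfn in range(max_pfn)
    ]


def make_pte(p2m, pfn):
    """Step 5: build a toy pte for pfn, or a non-present pte (0)."""
    mfn = p2m[pfn] if pfn < len(p2m) else INVALID_P2M_ENTRY
    if mfn is INVALID_P2M_ENTRY:
        return 0  # no valid P2M mapping, so the pte is not present
    return (mfn << PAGE_SHIFT) | PTE_PRESENT


# N = 4 pages actually given to the domain, E820 describes M = 8 pages.
p2m = build_p2m(nr_pages=4, max_pfn=8)
for pfn in range(8):
    state = "present" if make_pte(p2m, pfn) & PTE_PRESENT else "not present"
    print(f"pfn {pfn}: {state}")
```

In this model, mapping all "RAM" from 0 to M is harmless: pfns below N get present ptes, and everything from N to M simply comes back non-present.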

The important part of making XEN EXTRA E820_RAM is that the kernel will
allocate page structures for them, even if the pages are absent.  Making
it RESERVED will suppress that and make the exercise pointless.

    J
