xen-devel

Re: [Xen-devel] Xen dom0 crash: "d0:v0: unhandled page fault (ec=0000)"

On Fri, Oct 29, 2010 at 04:44:23PM +0100, Gianni Tedesco wrote:
> On Wed, 2010-10-20 at 09:54 +0100, Gianni Tedesco wrote:
> > On Wed, 2010-10-20 at 00:31 +0100, Andreas Kinzler wrote:
> > > On 19.10.2010 17:45, Gianni Tedesco wrote:
> > > > ditto, I suspected a known bug in my gcc version which broke xchg
> > > > because when I compiled with -O2 instead of -Os... the problem went away
> > > > but then something else bad happened later (I forget the details, and it
> > > > was too many config tweaks ago to get back to last time I had it working
> > > > that well)
> > > 
> > > Jeremy, one user earlier reported that he found out that for him there 
> > > seems to be a relation between kernel size and crash status. He just 
> > > added/removed some options that could never influence the "crash status" 
> > > (like adding/removing netfilter modules). With all the experiences here, 
> > > it may be useful to check for code paths related to kernel size.
> > > 
> > > Regards Andreas
> 
> I have dmesg output from 2.6.32.18-ge6b9b2c and the current broken
> version.
> 
> http://pastebin.com/3m0DpDdW - 2.6.32.24-gd0054d6-dirty - broken

Gianni pointed out to me that he spotted this:

[    0.000000] last_pfn = 0x2d0699 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x50100070406, new 0x7010600070106
[    0.000000] last_pfn = 0x2f000 max_arch_pfn = 0x400000000

I am not sure why "last_pfn" is being printed twice, but it could be
Gianni's test patch.

It looks as if the initial E820 is created with a max_pfn of
0x2d0699, which roughly translates to 11G of memory (with 4K pages)
instead of the 752MB.
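
To make the arithmetic explicit, here is a minimal sketch of the PFN-to-size conversion. It assumes 4 KiB pages (PAGE_SHIFT of 12), and the helper name pfn_to_mib is made up for illustration; it is not a kernel function.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12, as on x86. */
#define PAGE_SHIFT 12

/*
 * Hypothetical helper: convert a page frame number (a count of pages
 * from address 0) into whole MiB, rounding down.
 */
static uint64_t pfn_to_mib(uint64_t pfn)
{
    return (pfn << PAGE_SHIFT) >> 20;
}
```

Under that assumption the second last_pfn (0x2f000) works out to exactly 752 MiB, matching dom0_mem=752M, while the first one (0x2d0699) works out to about 11526 MiB.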

There were a bunch of changes in arch/x86/xen/setup.c and mmu.c
code that figures out the max_pfn. Actually, there is one
(git commit 6c8e75f5e712e596ab138597e65aac426ff03382):

 HYPERVISOR_shared_info->arch.max_pfn = xen_max_p2m_pfn

Which would set this to the highest PFN. But that number
should not have been used by the E820 calculation, which uses the
nr_pages entry to clamp the E820. Oh wait, it does not - it actually
still parses the E820, but marks the area above nr_pages
as "XEN EXTRA" (git commit 8d0d6d6d275d4514780ba3d350e57d48e3b5b5e1)
so those pages should not figure in the last_pfn calculation and should
instead lie unused. But the 'initial memory mapping' ignores that and
still tries to set up mappings on _all_ E820_RAM regions, even
if they are reserved by the early memory allocator. This would
imply that the page table is actually being put right in the
area that is reserved by the early memory allocator.

Hmm, so Gianni, I think if you short-circuited the setup.c code
so that it does not parse the E820_RAM regions above nr_pages, that
might do it. Also try to figure out who or what resets the last_pfn.

Or in the code that sets the 'XEN EXTRA', make it set that region
of pages as E820_RESERVED and see what happens then.
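
Roughly what I have in mind, as a standalone sketch and not the actual arch/x86/xen/setup.c code: the struct and function names below are made up for illustration, and 4 KiB pages are assumed. Only the E820_RAM and E820_RESERVED type values match the kernel's definitions. RAM above nr_pages is turned into reserved space, so a last_pfn computed from the RAM entries stops at nr_pages.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12   /* assumption: 4 KiB pages */
#define E820_RAM       1
#define E820_RESERVED  2

/* Hypothetical, simplified stand-in for an E820 map entry. */
struct e820_entry_sketch {
    uint64_t addr;   /* start address in bytes */
    uint64_t size;   /* length in bytes */
    uint32_t type;   /* E820_RAM, E820_RESERVED, ... */
};

/* Highest PFN (exclusive) covered by any E820_RAM entry. */
static uint64_t highest_ram_pfn(const struct e820_entry_sketch *map, size_t n)
{
    uint64_t max_pfn = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end_pfn = (map[i].addr + map[i].size) >> PAGE_SHIFT;

        if (map[i].type == E820_RAM && end_pfn > max_pfn)
            max_pfn = end_pfn;
    }
    return max_pfn;
}

/*
 * Mark all RAM above nr_pages as reserved. A RAM entry that straddles
 * the limit is split: the lower part stays RAM and the upper part is
 * appended as a new reserved entry. Returns the new entry count, or 0
 * if the map (capacity 'cap') has no room for the split.
 */
static size_t clamp_e820_to_nr_pages(struct e820_entry_sketch *map, size_t n,
                                     size_t cap, uint64_t nr_pages)
{
    uint64_t mem_end = nr_pages << PAGE_SHIFT;
    size_t count = n;

    for (size_t i = 0; i < n; i++) {
        struct e820_entry_sketch *e = &map[i];
        uint64_t end = e->addr + e->size;

        if (e->type != E820_RAM || end <= mem_end)
            continue;                     /* not RAM, or fully below limit */

        if (e->addr >= mem_end) {
            e->type = E820_RESERVED;      /* fully above the limit */
            continue;
        }

        if (count >= cap)
            return 0;                     /* no room to split the entry */

        map[count].addr = mem_end;        /* upper part becomes reserved */
        map[count].size = end - mem_end;
        map[count].type = E820_RESERVED;
        count++;

        e->size = mem_end - e->addr;      /* lower part stays RAM */
    }
    return count;
}
```

With a RAM region ending at PFN 0x2d0699 and nr_pages of 0x2f000, highest_ram_pfn() should go from 0x2d0699 before the clamp to 0x2f000 after it.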

> http://pastebin.com/yC4CzLaZ - 2.6.32.18-ge6b9b2c - working
> 
> In the panic you can see that %rax contains what the kernel thinks is
> xen_start_info (0xffffffff817d3000) and from the xen logs, that is
> correct.
> 
> From the kernel messages it looks like the e820 map is being calculated
> incorrectly, leading to bad memory mappings being set up which somehow
> end up nuking the pagetable entry for xen start info. The
> last_pfn/max_arch_pfn end up being an order of magnitude larger even
> though both are booted with identical xen and kernel commandline params.
> Xen has dom0_mem=752M.
> 
> Gianni
> 
