To: Thomas Schwinge <thomas@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Debian linux-image-2.6.32-4-xen-amd64 2.6.32-11 doesn't boot with > 4 GiB; resets immediatelly, no log messages
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Fri, 09 Apr 2010 11:20:52 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Ian Campbell <ijc@xxxxxxxxxxxxxx>
On 04/09/2010 11:00 AM, Thomas Schwinge wrote:
> Before we get to the backtrace, one further detail: this kernel *does*
> boot if one of the following has happened before: the BIOS memchecker has
> run, memtest86+ has run, some other kernel has run (though it doesn't
> always boot in this latter case).  Thus, I wildly guess that some
> uninitialized data structure (in memory) is dereferenced -- that happens
> to be in a sane state after memtest86+ et al.
>   

OK, I think I see what's happening here...

>     $ for ip in ffffffff814f6d88 ffffffff81433e38 ffffffff814f6d3d 
> ffffffff81433e60 ffffffff815a73ac ffffffff81433f98 ffffffff814f6f85 
> ffffffff8152b2d0 ffffffff814f95fb ffffffff814f8249 ffffffff813f3f5f 
> ffffffff813b4119 ffffffff81433f90 ffffffff811ff14f ffffffff8100e361 
> ffffffff8100e343 ffffffff813b4119 ffffffff813f3f5f ffffffff8152a7b0 
> ffffffff814f49d0 ffffffff81001000 ffffffff814f6aca; do echo "* $ip:" && 
> addr2line -fie debian/build/build_amd64_xen_amd64/vmlinux "$ip"; done > 
> ~/shared/tmp/tmp
>     * ffffffff814f6d88:
>     xen_release_chunk
>   

This is the code which goes through the gaps between the E820 table
entries looking for pages which Xen has assigned the kernel, but the
kernel can't use (because they're not covered by E820).  It does this with:

        for(pfn = start; pfn < end; pfn++) {
                unsigned long mfn = pfn_to_mfn(pfn);

                /* Make sure pfn exists to start with */
                if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
                        continue;
                ...
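
To make the "gaps between the E820 table entries" part concrete, here is a minimal user-space sketch of walking the holes in a sorted region map. It is not the kernel's code: `struct toy_region`, `toy_for_each_gap`, `toy_sum_gap_bytes` and the `limit` parameter are hypothetical names made up for illustration, and the map is assumed to be sorted by start address with no overlapping entries.

```c
#include <stddef.h>

/* Toy stand-in for one E820-style entry: the byte range [start, end). */
struct toy_region {
    unsigned long long start;
    unsigned long long end;
};

/* Callback invoked once per hole, with the hole's [gap_start, gap_end). */
typedef void (*toy_gap_fn)(unsigned long long gap_start,
                           unsigned long long gap_end, void *ctx);

/*
 * Visit every hole below `limit` that is not covered by `map`.
 * Assumes `map` is sorted by start address and has no overlaps.
 * Returns the number of holes visited.
 */
static size_t toy_for_each_gap(const struct toy_region *map, size_t n,
                               unsigned long long limit,
                               toy_gap_fn fn, void *ctx)
{
    unsigned long long last_end = 0;
    size_t gaps = 0;

    for (size_t i = 0; i < n && last_end < limit; i++) {
        /* Clamp the entry's start so a hole never extends past limit. */
        unsigned long long start =
            map[i].start < limit ? map[i].start : limit;

        if (start > last_end) {
            fn(last_end, start, ctx);
            gaps++;
        }
        if (map[i].end > last_end)
            last_end = map[i].end;
    }

    /* Trailing hole between the last entry and limit, if any. */
    if (last_end < limit) {
        fn(last_end, limit, ctx);
        gaps++;
    }
    return gaps;
}

/* Example callback: accumulate the total size of all holes in bytes. */
static void toy_sum_gap_bytes(unsigned long long gap_start,
                              unsigned long long gap_end, void *ctx)
{
    *(unsigned long long *)ctx += gap_end - gap_start;
}
```

Each hole reported this way is the kind of [start, end) range that the pfn loop above would then scan page by page.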


So in theory we're poking at the p2m and m2p tables for random pages
which may or may not be valid.  So if we do a pfn_to_mfn on a pfn which
is within the range of valid pfns, but not actually a valid pfn for our
domain, then the resulting mfn is undefined (and may depend on random
memory contents, which is why it is affected by what you've previously
booted).

We then pass that mfn back to mfn_to_pfn to see if it really does belong
to us (if it does, mfn_to_pfn will return the same pfn).  But the mfn
could be random garbage, which mfn_to_pfn then uses to index an array.
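
As an illustration of that round trip, here is a small user-space model, assuming two fixed-size toy tables in place of the real p2m and m2p. Everything prefixed `toy_` or `TOY_` is hypothetical and exists only for this sketch. The explicit bounds test is there to show where a garbage mfn would otherwise index outside the array; I'm not suggesting it is the right fix for the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PFN       8UL     /* hypothetical size of the toy p2m table */
#define TOY_MAX_MFN       16UL    /* hypothetical size of the toy m2p table */
#define TOY_INVALID_ENTRY (~0UL)  /* stand-in for INVALID_P2M_ENTRY */

static unsigned long toy_p2m[TOY_MAX_PFN];  /* pfn -> mfn */
static unsigned long toy_m2p[TOY_MAX_MFN];  /* mfn -> pfn */

/* Mark every slot in both tables as invalid. */
static void toy_reset(void)
{
    for (size_t i = 0; i < TOY_MAX_PFN; i++)
        toy_p2m[i] = TOY_INVALID_ENTRY;
    for (size_t i = 0; i < TOY_MAX_MFN; i++)
        toy_m2p[i] = TOY_INVALID_ENTRY;
}

/* Record that pfn is backed by mfn, in both directions. */
static void toy_map(unsigned long pfn, unsigned long mfn)
{
    assert(pfn < TOY_MAX_PFN && mfn < TOY_MAX_MFN);
    toy_p2m[pfn] = mfn;
    toy_m2p[mfn] = pfn;
}

/*
 * Round-trip check: pfn -> mfn -> pfn must come back to the same pfn.
 * The mfn read from the p2m may be garbage, so it is range-checked
 * before being used as an index into the m2p.
 */
static bool toy_pfn_is_ours(unsigned long pfn)
{
    if (pfn >= TOY_MAX_PFN)
        return false;

    unsigned long mfn = toy_p2m[pfn];

    if (mfn == TOY_INVALID_ENTRY)
        return false;
    if (mfn >= TOY_MAX_MFN)       /* garbage: would index outside toy_m2p */
        return false;

    return toy_m2p[mfn] == pfn;
}
```

In this model, a p2m slot holding leftover junk such as 0xdeadbeef is rejected by the range check instead of being dereferenced, and a slot pointing at somebody else's mfn is rejected by the round-trip comparison.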

Normally that would be OK, because it uses:

        __get_user(pfn, &machine_to_phys_mapping[mfn]);

to dereference the array.  But at this early stage, none of the kernel's
exception handlers have been set up, so this will just fault into Xen.

It would be interesting to confirm this by building your kernel with
CONFIG_DEBUG_INFO=y in the .config, and verifying that the faulting
instruction is actually on this line.

Thanks,
    J
