WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: [Xen-devel] Re: L1[0x1fb] = 0000000000000000 which faults on one type of machine but on another works?
From: Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
Date: Tue, 22 Mar 2011 13:10:31 +0000
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "andrew.thomas@xxxxxxxxxx" <andrew.thomas@xxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>, "keir.xen@xxxxxxxxx" <keir.xen@xxxxxxxxx>, "swente@xxxxxxxxxxxxx" <swente@xxxxxxxxxxxxx>
Delivery-date: Tue, 22 Mar 2011 06:12:07 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110316221912.GA13035@xxxxxxxxxxxx>
References: <20110316221912.GA13035@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Wed, 2011-03-16 at 22:19 +0000, Konrad Rzeszutek Wilk wrote:
> I am troubleshooting an issue where the Linux kernel tries
> to dereference a not present entry. I have a fix for this
> in for-2.6.32/bug-fixes .. but please read on.

I'll give it a shot, I'll try anything at this point ;P

> Specifically it tries to dereference the fixmapped value of
> APIC_BASE. The fixmapped value of APIC_BASE is actually not set
> due to git commit a1d8e2fa8325064338b2da1bcf0d7a0473883c284
> which adds this in arch/x86/kernel/acpi/boot.c:
> 
> static void __init acpi_register_lapic_address(unsigned long address)
> {
>         /* Xen dom0 doesn't have usable lapics */
>         if (xen_initial_domain())
>                 return;
>
>         mp_lapic_addr = address;
>
>         set_fixmap_nocache(FIX_APIC_BASE, address);
> 
> Later on we use 'native_apic_read', which tries to use APIC_BASE as an
> address (it is presumed to be at slot FIX_APIC_BASE of the fixmap
> API), and it fails (on some machines).
> 
> Since we don't call 'set_fixmap_nocache(FIX_APIC_BASE)', walking the
> pagetable gives this:
> 
> 
> [    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
> [    0.000000] mapped APIC to ffffffffff5fb000 (00000000)
> (XEN) d0:v0: unhandled page fault (ec=0000)
> (XEN) Pagetable walk from ffffffffff5fb020:
> (XEN)  L4[0x1ff] = 0000000221003067 0000000000001003
> (XEN)  L3[0x1ff] = 0000000221004067 0000000000001004
> (XEN)  L2[0x1fa] = 0000000221771067 0000000000001771 
> (XEN)  L1[0x1fb] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.1-110309  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e033:[<ffffffff8102b5d1>]
> (XEN) RFLAGS: 0000000000000292   EM: 1   CONTEXT: pv guest
> (XEN) rax: ffffffff8164cf50   rbx: 000000026ec00000   rcx: 00000000ffffdd85
> (XEN) rdx: 00000000ffffffff   rsi: 0000000000000000   rdi: 0000000000000020
> (XEN) rbp: ffffffff81643ea8   rsp: ffffffff81643e50   r8:  0000000000000002
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: ffff880013671800   r13: 00000000bff66000   r14: ffffffffffffffff
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
> (XEN) cr3: 0000000221001000   cr2: ffffffffff5fb020
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=ffffffff81643e50:
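
For anyone following along, here is a minimal sketch of how the faulting address in cr2 splits into the L4..L1 indices shown in the walk above. It is plain userspace C, not kernel code. It assumes standard x86-64 4-level paging with 4K pages, and the helper names (pt_index, pte_present, print_walk_indices) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86-64 4-level paging: each level consumes 9 bits of the virtual
 * address. level 1 = L1 (bits 12-20) ... level 4 = L4 (bits 39-47).
 * Hypothetical helper, not a kernel or Xen function. */
unsigned pt_index(uint64_t vaddr, int level)
{
	assert(level >= 1 && level <= 4);
	return (unsigned)((vaddr >> (12 + 9 * (level - 1))) & 0x1ff);
}

/* Bit 0 of a page-table entry is the Present bit. */
int pte_present(uint64_t entry)
{
	return (int)(entry & 1);
}

/* Print the indices for an address in the same L4..L1 order that the
 * Xen "Pagetable walk" output uses. */
void print_walk_indices(uint64_t vaddr)
{
	for (int level = 4; level >= 1; level--)
		printf("L%d[0x%03x]\n", level, pt_index(vaddr, level));
}
```

With cr2 = ffffffffff5fb020 this gives 0x1ff, 0x1ff, 0x1fa and 0x1fb, matching the walk, and the all-zero L1 entry has the Present bit clear.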
> 
> Which is to say that the L1 has this:
> 0000000115771fa0:  00000000 00000000 00000000 00000000
> 0000000115771fb0:  00000000 00000000 00000000 00000000
> 0000000115771fc0:  00000000 00000000 15770067 80100001
> 0000000115771fd0:  15770067 80100001 00000000 00000000
> 0000000115771fe0:  00000000 00000000 00000000 00000000
> 0000000115771ff0:  00000000 00000000 00000000 00000000
> 
> L1[0x1fb] is machine address 115771fd8, which has nothing in it.
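
The address arithmetic behind that can be sketched as follows. This is userspace C again, and pte_maddr and pte_from_words are hypothetical helper names. I am assuming the table base is 0x115771000 (the dump address rounded down to a page) and that the dump prints 32-bit words in ascending address order, so the low half of each little-endian entry comes first.

```c
#include <assert.h>
#include <stdint.h>

/* Machine address of entry `index` in a page table at `table_base`.
 * Entries are 8 bytes each on x86-64, 512 per table. */
uint64_t pte_maddr(uint64_t table_base, unsigned index)
{
	assert(index < 512);
	return table_base + (uint64_t)index * 8;
}

/* Rebuild a 64-bit entry from two consecutive 32-bit words of the
 * dump (assumption: low word is printed first). */
uint64_t pte_from_words(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

pte_maddr(0x115771000, 0x1fb) comes out as 0x115771fd8, i.e. the third and fourth words of the 115771fd0 row, and both of those are zero.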
> 
> OK, so I've come up with a fix that is a back-port of how 2.6.38 does it:
> it removes the check I mentioned above, and in xen_set_fixmap
> we set FIX_APIC_BASE to actually point to a dummy ioapic_mapping.
> It is 7cb068cf1ba90425e12f3a7b3caed9d018fa9b8c in for-2.6.32/bug-fixes.
> 
> Gianni, you might want to check this out in case it fixes the problem you
> are experiencing.

Not sure, mine happens a lot earlier, sort of just after the very early
memory initialisation. Also we're nowhere near trying to use APIC
anything as an address afaict - just trying to reach the xen info page.

The last thing I see is:
[    0.000000] kernel direct mapping tables up to 2f000000 @ 100000-27a000
[    0.000000] init_memory_mapping: 0000000100000000-00000002a7000000


> But one thing I can't understand is why on one machine (IBM x3850)
> I get this crash, while on another one with the same pagetable contents
> (L1 has nothing for 0x1fb) it works just fine? I added a panic and used
> the Xen hypervisor kdb to manually inspect the pagetable, and it has
> the same contents as the IBM x3850, but it boots fine with this invalid value.
> Any ideas?

A missing TLB flush? heh

> 
> FYI, it seems another user (Sven Sübert) with an IBM x3650 hits the same bug.
> And with this fix he is able to boot.

Very odd. If this isn't the bug I'm seeing, it might be tangentially
related.

I'll let you know

Gianni


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel