

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

To: M A Young <m.a.young@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Fri, 7 Jan 2011 16:23:59 -0500
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 07 Jan 2011 13:26:12 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <alpine.LFD.2.02.1101072034080.9613@xxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <alpine.LFD.2.02.1101072034080.9613@xxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.20 (2009-06-14)
On Fri, Jan 07, 2011 at 08:34:43PM +0000, M A Young wrote:
> On Fri, 7 Jan 2011, Konrad Rzeszutek Wilk wrote:
> >>BUG unable to handle kernel NULL pointer dereference at
> >>IP: [<ffffffff81b69b92>] setup_node_bootmem+0x16b/0x199
> >Hmmm, I did see something similar to this in 2.6.37-rc1, but we fixed
> >that quickly. It was triggered by having 4GB of memory or so and
> >the work-around was to use dom0_mem=max:2GB.
> >
> >Can you send the photo? Maybe the caller stack will shed some light.
> Here are two photos of the output at different times. The context is
>    0xffffffff81b69b6d <setup_node_bootmem+326>:       callq  0xffffffff81475ec9 <printk>
>    0xffffffff81b69b72 <setup_node_bootmem+331>:       movslq %ebx,%rdx
>    0xffffffff81b69b75 <setup_node_bootmem+334>:       xor    %eax,%eax
>    0xffffffff81b69b77 <setup_node_bootmem+336>:       mov    $0x4fc0,%ecx
>    0xffffffff81b69b7c <setup_node_bootmem+341>:       mov    -0x7e4cb750(,%rdx,8),%rsi
>    0xffffffff81b69b84 <setup_node_bootmem+349>:       shr    $0xc,%r13
>    0xffffffff81b69b88 <setup_node_bootmem+353>:       shr    $0xc,%r12
>    0xffffffff81b69b8c <setup_node_bootmem+357>:       sub    %r13,%r12
>    0xffffffff81b69b8f <setup_node_bootmem+360>:       mov    %rsi,%rdi
>    0xffffffff81b69b92 <setup_node_bootmem+363>:       rep stos %eax,%es:(%rdi)

That looks like:

        memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));

From the photo, %eax is zero, so this is exactly the code you would expect for filling the structure with zeros.
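
To make the mapping between the quoted instructions and the memset() explicit, here is a minimal user-space sketch in C. It only reuses the constants visible in the disassembly (0x4fc0, the 4-byte store width of "rep stos %eax", and the -0x7e4cb750 displacement). The names node_table, clear_node and MAX_NODES are hypothetical stand-ins invented for illustration, not kernel definitions, and the NULL test is an assumption added so the model can report the failure instead of faulting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values read off the quoted disassembly. */
#define STOS_COUNT 0x4fc0u        /* mov $0x4fc0,%ecx: number of stores     */
#define STOS_WIDTH 4u             /* rep stos %eax: 4 bytes per store        */
#define TABLE_DISP (-0x7e4cb750)  /* mov -0x7e4cb750(,%rdx,8),%rsi           */

/* Number of bytes the "rep stos" loop clears. */
static uint64_t stos_bytes(void)
{
    return (uint64_t)STOS_COUNT * STOS_WIDTH;
}

/* Address of the pointer table: the 32-bit displacement is sign-extended. */
static uint64_t table_address(void)
{
    return (uint64_t)(int64_t)(int32_t)TABLE_DISP;
}

/* Hypothetical stand-in for the per-node pointer table (not the kernel's). */
#define MAX_NODES 4
static void *node_table[MAX_NODES];

/*
 * Model of the quoted sequence. Returns 0 after clearing, or -1 where the
 * quoted instructions would store through a NULL pointer.
 */
static int clear_node(int nodeid, size_t size)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);

    void *p = node_table[nodeid];  /* mov table(,%rdx,8),%rsi ; mov %rsi,%rdi */
    if (p == NULL)
        return -1;                 /* no such test appears in the quoted
                                      instructions before the store */
    memset(p, 0, size);            /* xor %eax,%eax ; rep stos %eax,%es:(%rdi) */
    return 0;
}
```

Under this reading the loop clears 0x4fc0 * 4 = 0x13f00 (81664) bytes, which would be sizeof(pg_data_t) for this build, and the pointer is fetched from a table at 0xffffffff81b348b0 indexed by the value in %rdx. A NULL entry in that table would then make the store fault at a NULL address, consistent with the reported BUG.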

>    0xffffffff81b69b94 <setup_node_bootmem+365>:       mov    %ebx,%edi
>    0xffffffff81b69b96 <setup_node_bootmem+367>:       mov    -0x7e4cb750(,%rdx,8),%rax
> which is somewhere around line 224 in arch/x86/mm/numa_64.c
>         if (nid != nodeid)
>                 printk(KERN_INFO "    NODE_DATA(%d) on node %d\n",
>                        nodeid, nid);

Can you make sure that commit 419db274bed4269f475a8e78cbe9c917192cfe8b is in
your tree? That is the patch that fixed this issue last time.

However, the more I look at the code, the less it looks like that issue, and
that commit is the last fix in that file.

Do you see any messages about 'Cannot find 20 bytes in node X' (where X
I think is 0)?
