
xen-devel

Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present

To: Ian Campbell <ijc@xxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 23 Nov 2010 10:24:01 -0800
Cc: Vincent CARON <zerodeux@xxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Cris Daniluk <cris.daniluk@xxxxxxxxx>, 603632@xxxxxxxxxxxxxxx
Delivery-date: Tue, 23 Nov 2010 10:25:03 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1290513067.31507.7699.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20101115233253.11935.35707.reportbug@zerohal> <1290513067.31507.7699.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.1.6-1.fc13 Lightning/1.0b3pre Thunderbird/3.1.6
On 11/23/2010 03:51 AM, Ian Campbell wrote:
> I'm not sure but looking at the complete bootlog it looks as if the
> system may only have node==1 i.e. no 0 node which could plausibly lead
> to this sort of issue:
>         [    0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
>         [    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
>         [    0.000000]   bootmap [0000000000010000 -  0000000000017fff] pages 8
>         [    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0040000000]
>         [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
>         [    0.000000]   #1 [0003446000 - 0003465000]   XEN PAGETABLES ==> [0003446000 - 0003465000]
>         [    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
>         [    0.000000]   #3 [0001000000 - 0001694994]    TEXT DATA BSS ==> [0001000000 - 0001694994]
>         [    0.000000]   #4 [00016b5000 - 0003244e00]          RAMDISK ==> [00016b5000 - 0003244e00]
>         [    0.000000]   #5 [0003245000 - 0003446000]   XEN START INFO ==> [0003245000 - 0003446000]
>         [    0.000000]   #6 [0001695000 - 000169532d]              BRK ==> [0001695000 - 000169532d]
>         [    0.000000]   #7 [0000100000 - 00002e0000]          PGTABLE ==> [0000100000 - 00002e0000]
>         [    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
>         [    0.000000] Zone PFN ranges:
>         [    0.000000]   DMA      0x00000000 -> 0x00001000
>         [    0.000000]   DMA32    0x00001000 -> 0x00100000
>         [    0.000000]   Normal   0x00100000 -> 0x00100000
>         [    0.000000] Movable zone start PFN for each node
>         [    0.000000] early_node_map[2] active PFN ranges
>         [    0.000000]     1: 0x00000000 -> 0x000000a0
>         [    0.000000]     1: 0x00000100 -> 0x00040000
>         [    0.000000] On node 1 totalpages: 262048
>         [    0.000000]   DMA zone: 56 pages used for memmap
>         [    0.000000]   DMA zone: 483 pages reserved
>         [    0.000000]   DMA zone: 3461 pages, LIFO batch:0
>         [    0.000000]   DMA32 zone: 3528 pages used for memmap
>         [    0.000000]   DMA32 zone: 254520 pages, LIFO batch:31
>
> Perhaps we should be passing numa_node_id() (e.g. current node) instead
> of node 0? There doesn't seem to be another obvious alternative to
> passing in an explicit node number to this callchain (some places cope
> with -1 but not this path AFAICT).

Does booting native get the same configuration?
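For comparing against a native boot, the figures in the quoted log can be recomputed from the two early_node_map ranges. The following is only a sketch: the helper names (pages_per_node, pages_in_zone) are made up for illustration and are not kernel functions, and the PFN values are copied by hand from the log above.

```python
# Active PFN ranges copied from the quoted early_node_map lines:
# (node id, start PFN, end PFN). Both ranges belong to node 1.
EARLY_NODE_MAP = [
    (1, 0x00000000, 0x000000A0),
    (1, 0x00000100, 0x00040000),
]

# Zone windows copied from the quoted "Zone PFN ranges" lines.
ZONES = {
    "DMA": (0x00000000, 0x00001000),
    "DMA32": (0x00001000, 0x00100000),
    "Normal": (0x00100000, 0x00100000),
}


def pages_per_node(node_map):
    """Sum the pages of every active range, grouped by node id."""
    totals = {}
    for node, start, end in node_map:
        totals[node] = totals.get(node, 0) + (end - start)
    return totals


def pages_in_zone(node_map, zone):
    """Count the active pages that fall inside one zone's PFN window."""
    zone_start, zone_end = zone
    count = 0
    for _node, start, end in node_map:
        overlap = min(end, zone_end) - max(start, zone_start)
        count += max(overlap, 0)
    return count


if __name__ == "__main__":
    print(pages_per_node(EARLY_NODE_MAP))
    for name, window in ZONES.items():
        print(name, pages_in_zone(EARLY_NODE_MAP, window))
```

The per-node total comes out at 262048 pages, all on node 1, which matches the "On node 1 totalpages" line, and node 0 does not appear at all. The DMA and DMA32 counts equal the sums of the three DMA figures (56 + 483 + 3461) and the two DMA32 figures (3528 + 254520) in the log.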

> It's also not obvious if dom0 should be seeing the tables which describe
> the host's nodes anyway or if we should be clobbering something. Given
> that dom0 sees a pseudo-physical address map I'm not convinced seeing
> the real SRAT is in any way beneficial. Perhaps we should simply be
> clobbering NUMAness until actual PV understanding of NUMA is ready?

Yes, the host SRAT is meaningless in the domain and we really should
ignore it.  I'm not sure what happens if you boot on a really NUMA system.
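As a rough illustration of what "ignoring" the host topology would mean for the memory map in this report, here is a toy model in Python. It is not kernel code and not a proposed patch: flatten_to_single_node and nodes_with_memory are hypothetical names, and folding everything onto node 0 is an assumption about what a non-NUMA view would look like.

```python
# Toy model only: ranges are (node id, start PFN, end PFN), with the
# values taken from the early_node_map lines quoted in this thread.
HOST_VIEW = [
    (1, 0x00000000, 0x000000A0),
    (1, 0x00000100, 0x00040000),
]


def nodes_with_memory(node_map):
    """Return the sorted node ids that own at least one non-empty range."""
    return sorted({node for node, start, end in node_map if end > start})


def flatten_to_single_node(node_map, target_node=0):
    """Discard the reported node ids and assign every range to one node.

    target_node=0 is an assumption made for this sketch.
    """
    return [(target_node, start, end) for _node, start, end in node_map]


if __name__ == "__main__":
    print(nodes_with_memory(HOST_VIEW))
    print(nodes_with_memory(flatten_to_single_node(HOST_VIEW)))
```

With the host-derived view only node 1 has memory, so anything that assumes node 0 exists finds nothing there. After flattening, the same PFN ranges are kept but they all sit on node 0.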

> One thing I notice when googling R410 issues is that they apparently
> have a "Cores per CPU" BIOS option which might be worth playing with,
> since configuring a reduced number of cores might remove node 0 but not
> node 1 (odd but not invalid?). Presumably it is also worth making sure
> you have the latest BIOS etc.

Also, what's the DIMM configuration?  Are the slots fully populated?


    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel