WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
From: Vincent Caron <vcaron@xxxxxxxxxxxxx>
Date: Thu, 25 Nov 2010 13:51:57 +0100
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Vincent CARON <zerodeux@xxxxxxxxxxxx>, Cris Daniluk <cris.daniluk@xxxxxxxxx>, 603632@xxxxxxxxxxxxxxx, Ian Campbell <ijc@xxxxxxxxxxxxxx>
Delivery-date: Fri, 26 Nov 2010 03:22:07 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4CEC06C1.5010500@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Bearstech
References: <20101115233253.11935.35707.reportbug@zerohal> <1290513067.31507.7699.camel@xxxxxxxxxxxxxxxxxxxxxx> <4CEC06C1.5010500@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Tue, 2010-11-23 at 10:24 -0800, Jeremy Fitzhardinge wrote:
> On 11/23/2010 03:51 AM, Ian Campbell wrote:
> > I'm not sure but looking at the complete bootlog it looks as if the
> > system may only have node==1 i.e. no 0 node which could plausibly lead
> > to this sort of issue:
> >         [    0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
> >         [    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
> >         [    0.000000]   bootmap [0000000000010000 -  0000000000017fff] pages 8
> >         [    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0040000000]
> >         [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
> >         [    0.000000]   #1 [0003446000 - 0003465000]   XEN PAGETABLES ==> [0003446000 - 0003465000]
> >         [    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
> >         [    0.000000]   #3 [0001000000 - 0001694994]    TEXT DATA BSS ==> [0001000000 - 0001694994]
> >         [    0.000000]   #4 [00016b5000 - 0003244e00]          RAMDISK ==> [00016b5000 - 0003244e00]
> >         [    0.000000]   #5 [0003245000 - 0003446000]   XEN START INFO ==> [0003245000 - 0003446000]
> >         [    0.000000]   #6 [0001695000 - 000169532d]              BRK ==> [0001695000 - 000169532d]
> >         [    0.000000]   #7 [0000100000 - 00002e0000]          PGTABLE ==> [0000100000 - 00002e0000]
> >         [    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
> >         [    0.000000] Zone PFN ranges:
> >         [    0.000000]   DMA      0x00000000 -> 0x00001000
> >         [    0.000000]   DMA32    0x00001000 -> 0x00100000
> >         [    0.000000]   Normal   0x00100000 -> 0x00100000
> >         [    0.000000] Movable zone start PFN for each node
> >         [    0.000000] early_node_map[2] active PFN ranges
> >         [    0.000000]     1: 0x00000000 -> 0x000000a0
> >         [    0.000000]     1: 0x00000100 -> 0x00040000
> >         [    0.000000] On node 1 totalpages: 262048
> >         [    0.000000]   DMA zone: 56 pages used for memmap
> >         [    0.000000]   DMA zone: 483 pages reserved
> >         [    0.000000]   DMA zone: 3461 pages, LIFO batch:0
> >         [    0.000000]   DMA32 zone: 3528 pages used for memmap
> >         [    0.000000]   DMA32 zone: 254520 pages, LIFO batch:31
> >
> > Perhaps we should be passing numa_node_id() (e.g. current node) instead
> > of node 0? There doesn't seem to be another obvious alternative to
> > passing in an explicit node number to this callchain (some places cope
> > with -1 but not this path AFAICT).
> 
> Does booting native get the same configuration?

  Booting native with the same Xen-enabled kernel gives:

[    0.000000] Bootmem setup node 0 0000000130000000-0000000230000000
[    0.000000]   NODE_DATA [0000000130000000 - 0000000130007fff]
[    0.000000]   bootmap [0000000130008000 -  0000000130027fff] pages 20
[    0.000000] (8 early reservations) ==> bootmem [0130000000 - 0230000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE
[    0.000000]   #2 [0001000000 - 0001694994]    TEXT DATA BSS
[    0.000000]   #3 [0037656000 - 0037fefb18]          RAMDISK
[    0.000000]   #4 [000009ec00 - 0000100000]    BIOS reserved
[    0.000000]   #5 [0001695000 - 000169532d]              BRK
[    0.000000]   #6 [0000008000 - 000000c000]          PGTABLE
[    0.000000]   #7 [000000c000 - 0000011000]          PGTABLE
[    0.000000] Bootmem setup node 1 0000000000000000-0000000130000000
[    0.000000]   NODE_DATA [0000000000011000 - 0000000000018fff]
[    0.000000]   bootmap [0000000000019000 -  000000000003efff] pages 26
[    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0001000000 - 0001694994]    TEXT DATA BSS ==> [0001000000 - 0001694994]
[    0.000000]   #3 [0037656000 - 0037fefb18]          RAMDISK ==> [0037656000 - 0037fefb18]
[    0.000000]   #4 [000009ec00 - 0000100000]    BIOS reserved ==> [000009ec00 - 0000100000]
[    0.000000]   #5 [0001695000 - 000169532d]              BRK ==> [0001695000 - 000169532d]
[    0.000000]   #6 [0000008000 - 000000c000]          PGTABLE ==> [0000008000 - 000000c000]
[    0.000000]   #7 [000000c000 - 0000011000]          PGTABLE ==> [000000c000 - 0000011000]
[    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
[    0.000000] [ffffea0004280000-ffffea00043fffff] potential offnode page_structs
[    0.000000]  [ffffea0000000000-ffffea00043fffff] PMD -> [ffff880001800000-ffff8800051fffff] on node 1
[    0.000000]  [ffffea0004400000-ffffea0007bfffff] PMD -> [ffff880130200000-ffff8801339fffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00230000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     1: 0x00000000 -> 0x000000a0
[    0.000000]     1: 0x00000100 -> 0x000cf679
[    0.000000]     1: 0x00100000 -> 0x00130000
[    0.000000]     0: 0x00130000 -> 0x00230000
[    0.000000] On node 0 totalpages: 1048576
[    0.000000]   Normal zone: 14336 pages used for memmap
[    0.000000]   Normal zone: 1034240 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 1046041
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 109 pages reserved
[    0.000000]   DMA zone: 3835 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14280 pages used for memmap
[    0.000000]   DMA32 zone: 831153 pages, LIFO batch:31
[    0.000000]   Normal zone: 2688 pages used for memmap
[    0.000000]   Normal zone: 193920 pages, LIFO batch:31


> > It's also not obvious if dom0 should be seeing the tables which describe
> > the host's nodes anyway or if we should be clobbering something. Given
> > that dom0 sees a pseudo-physical address map I'm not convinced seeing
> > the real SRAT is in any way beneficial. Perhaps we should simply be
> > clobbering NUMAness until actual PV understanding of NUMA is ready?
> 
> Yes, the host SRAT is meaningless in the domain and we really should
> ignore it.  I'm not sure what happens if you boot on a really NUMA system.
> 
> > One thing I notice when googling R410 issues is that they apparently
> > have a "Cores per CPU" BIOS option which might be worth playing with,
> > since configuring a reduced number of cores might remove node 0 but not
> > node 1 (odd but not invalid?). Presumably it is also worth making sure
> > you have the latest BIOS etc.
> 
> Also, what's the DIMM configuration?  Are the slots fully populated?

  8 slots, 4 populated; slots #0, #1, #4 and #5 are populated with 2 GiB
DIMMs (according to lshw; set up by Dell).

  I switched off hyperthreading in the BIOS settings (the default is 'on');
I had issues with Xen 3.2 in this area (related to floating vCPUs, which I
had to pin to fix random crashes). I also don't think HT matters for my
workload. I'm used to seeing strange bugs as soon as I tweak Dell BIOSes,
so I thought I'd mention it.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
