xen-devel

Re: [Xen-devel] [PATCH] Fix hypervisor crash with unpopulated NUMA nodes

To: Jan Beulich <JBeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Fix hypervisor crash with unpopulated NUMA nodes
From: Andre Przywara <andre.przywara@xxxxxxx>
Date: Wed, 7 Oct 2009 14:13:06 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4ACC9D73020000780001879A@xxxxxxxxxxxxxxxxxx>
References: <4ACC6346.5080309@xxxxxxx> <4ACC9D73020000780001879A@xxxxxxxxxxxxxxxxxx>
Jan Beulich wrote:
> >>> Andre Przywara <andre.przywara@xxxxxxx> 07.10.09 11:45 >>>
>> on NUMA systems with memory-less nodes Xen crashes quite early in the hypervisor (while initializing the heaps). This is not an issue if this happens to be the last node, but "inner" nodes trigger this reliably.
>> On multi-node processors it is much more likely to leave a node unequipped.
>> The attached patch fixes this by enumerating the node via the node_online_map instead of counting from 0 to num_nodes.
>
> While I do not see anything wrong with the patch, I still wonder why it
> would be needed: It seems to indicate that node_online_map represents
> only nodes with memory, but imo should be representing nodes with
> memory or processors (leaving aside pure I/O nodes for the moment).
> So perhaps there's rather a problem with the setup of node_online_map somewhere?

Yes, because the map creation is callback-driven by the ACPI code.
The BIOS of my machine omits the memory entries for memory-less nodes, so no callback is triggered for these nodes. Nevertheless, Xen uses the node numbers provided by the SRAT, which creates the hole.
(My setup: 2 + 0 + 2 + 0 GB per node; Xen sees two nodes, named 0 and 2.)
I agree that this should be changed (that is what I meant by "will rework later"), not only because the "lonely" cores will simply be added to another node. But since I will not be in the office for the next two weeks, I would like to get this patch applied for the time being.
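
To illustrate the difference between the two enumeration strategies, here is a minimal standalone sketch. It is not the patch itself and not Xen code: online_map_t, sketch_set_online(), visit_by_count() and visit_by_map() are hypothetical stand-ins, and the one-word bitmap only approximates what node_online_map does. It assumes the 2 + 0 + 2 + 0 GB layout described above, where only nodes 0 and 2 are marked online.

```c
#include <assert.h>

#define SKETCH_MAX_NODES 8          /* hypothetical limit for this sketch */

/* One bit per node ID; a simplified stand-in for node_online_map. */
typedef unsigned int online_map_t;

/* Mark a node online, as an ACPI memory-affinity callback might do. */
static void sketch_set_online(online_map_t *map, unsigned int node)
{
    assert(node < SKETCH_MAX_NODES);
    *map |= 1u << node;
}

static int sketch_node_online(online_map_t map, unsigned int node)
{
    return node < SKETCH_MAX_NODES && ((map >> node) & 1u);
}

static unsigned int sketch_num_online(online_map_t map)
{
    unsigned int node, count = 0;

    for (node = 0; node < SKETCH_MAX_NODES; node++)
        count += sketch_node_online(map, node);
    return count;
}

/*
 * Counting from 0 to the number of online nodes: this assumes node IDs
 * are contiguous, so with a hole it visits an offline node and never
 * reaches the last online one.
 */
static unsigned int visit_by_count(online_map_t map, unsigned int *visited)
{
    unsigned int i, n = sketch_num_online(map);

    for (i = 0; i < n; i++)
        visited[i] = i;
    return n;
}

/* Walking the online map: only IDs whose bit is set are visited. */
static unsigned int visit_by_map(online_map_t map, unsigned int *visited)
{
    unsigned int node, k = 0;

    for (node = 0; node < SKETCH_MAX_NODES; node++)
        if (sketch_node_online(map, node))
            visited[k++] = node;
    return k;
}
```

With nodes 0 and 2 set online, visit_by_count() touches IDs 0 and 1 (node 1 has no memory, which is where heap initialization would go wrong), while visit_by_map() touches IDs 0 and 2.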

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

