xen-devel

Re: [Xen-devel] [PATCH 4/6] xen: export NUMA topology in physinfo hcall

To: Tristan Gingold <Tristan.Gingold@xxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 4/6] xen: export NUMA topology in physinfo hcall
From: Ryan Harper <ryanh@xxxxxxxxxx>
Date: Tue, 3 Oct 2006 15:37:28 -0500
Cc: Ryan Harper <ryanh@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, xen-ia64-devel <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 03 Oct 2006 13:39:24 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <200610031144.40820.Tristan.Gingold@xxxxxxxx>
References: <20060929185849.GE12702@xxxxxxxxxx> <200610031144.40820.Tristan.Gingold@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.6+20040907i
* Tristan Gingold <Tristan.Gingold@xxxxxxxx> [2006-10-03 04:40]:
> On Friday 29 September 2006 20:58, Ryan Harper wrote:
> > This patch modifies the physinfo hcall to export NUMA CPU and Memory
> > topology information.  The new physinfo hcall is integrated into libxc
> > and xend (xm info specifically).  Included in this patch is a minor
> > tweak to xm-test's xm info testcase.  The new fields in xm info are:
> >
> > nr_nodes               : 4
> > mem_chunks             : node0:0x0000000000000000-0x0000000190000000
> >                          node1:0x0000000190000000-0x0000000300000000
> >                          node2:0x0000000300000000-0x0000000470000000
> >                          node3:0x0000000470000000-0x0000000640000000
> > node_to_cpu            : node0:0-7
> >                          node1:8-15
> >                          node2:16-23
> >                          node3:24-31
> Hi,
> 
> I have successfully applied this patch on xen-ia64-unstable.  It requires a 
> small patch to fix issues.

Thanks for giving the patches a test.  

> I have tested it on a 4 node, 24 cpus system.
> 
> I have two suggestions for physinfo hcall:
> * We (Bull) already sell machines with more than 64 cpus (up to 128).  
> Unfortunately the physinfo interface works with at most 64 cpus.  May I 
> suggest replacing the node_to_cpu maps with a cpu_to_node map?

That is fine.  It shouldn't be too much trouble to pass up an array of
cpu_to_node and convert to node_to_cpu (I like the brevity of the above
display; based on number of nodes rather than number of cpus).  Does 
that sound reasonable?
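
A minimal sketch of that conversion, assuming a flat cpu_to_node array
with one node id per cpu.  format_node_cpus() is a hypothetical helper
for illustration, not code from the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the cpus of one node as comma-separated
 * ranges (e.g. "0-7" or "0,2") from a flat cpu_to_node array.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int format_node_cpus(const int *cpu_to_node, int nr_cpus, int node,
                            char *buf, size_t len)
{
    size_t used = 0;
    int cpu = 0;

    assert(cpu_to_node != NULL && buf != NULL && len > 0);
    buf[0] = '\0';

    while (cpu < nr_cpus) {
        int first, last, n;

        if (cpu_to_node[cpu] != node) {
            cpu++;
            continue;
        }
        /* Extend the run while consecutive cpus belong to this node. */
        first = cpu;
        while (cpu + 1 < nr_cpus && cpu_to_node[cpu + 1] == node)
            cpu++;
        last = cpu++;

        if (first == last)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", first);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", first, last);
        if (n < 0 || (size_t)n >= len - used)
            return -1;          /* output would not fit */
        used += (size_t)n;
    }
    return (int)used;
}
```

Calling it once per node keeps the display proportional to the number of
nodes, while the interface itself only carries one entry per cpu and so
is not tied to a 64-bit mask.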

> 
> * On ia64 memory can be sparsely populated.  There is no real relation between 
> number of nodes and number of memory chunks.  May I suggest adding a new 
> field (nr_mem_chunks) to physinfo?  It should be a read/write field: on 
> output it should return the number of mem chunks (which can be greater than 
> the input value if the buffer was too small).

Even if it is sparsely populated, won't each of the chunks "belong" to a
particular node?  The above list of 4 entries is not hard-coded, but a
result of the behavior of the SRAT table memory affinity parsing.

The current SRAT code from Linux x86_64 (specifically,
acpi_numa_memory_affinity_init()) merges each memory entry from
the SRAT table based on the entry's proximity value (a.k.a. node
number).

It will grow the node's memory range either down or up if the new
entry's start or end is outside the node's current range:

 if (!node_test_and_set(node, nodes_parsed)) {
         nd->start = start;
         nd->end = end;
 } else {
         if (start < nd->start)
                 nd->start = start;
         if (nd->end < end)
                 nd->end = end;
 }

The end result will be a mapping of any number of memory chunks to the
number of nodes in the system as each chunk must belong to one node. 
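
To make the merge concrete, here is a small standalone sketch of the
same logic.  The struct, the MAX_NODES limit, and the "parsed" flag
(standing in for the nodes_parsed bitmap) are assumptions made for
illustration, not the Linux or Xen definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8   /* illustrative limit, not a Xen or Linux constant */

struct node_range {
    uint64_t start;
    uint64_t end;
    int parsed;       /* stands in for the nodes_parsed bitmap */
};

/* Fold one memory-affinity entry into the range of the node it names. */
static void merge_chunk(struct node_range *nodes, int node,
                        uint64_t start, uint64_t end)
{
    struct node_range *nd;

    assert(nodes != NULL && node >= 0 && node < MAX_NODES);
    nd = &nodes[node];

    if (!nd->parsed) {
        /* First entry seen for this node: take its range as-is. */
        nd->parsed = 1;
        nd->start = start;
        nd->end = end;
    } else {
        /* Later entries only widen the existing range. */
        if (start < nd->start)
            nd->start = start;
        if (nd->end < end)
            nd->end = end;
    }
}
```

Feeding several disjoint chunks for the same node through merge_chunk()
leaves a single range running from the lowest start to the highest end,
so any gaps between the chunks fall inside the reported range.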

One of the goals for the NUMA patches was to avoid re-inventing this
parsing and its data structures, and to reuse what is available in Linux.
It may be that the x86_64 SRAT table parsing in Linux differs from ia64
in Linux.  Is there something that needs fixing here?
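
For reference, the read/write nr_mem_chunks behavior suggested in the
quoted text could look roughly like the sketch below.  The struct layout
and get_mem_chunks() are hypothetical and only illustrate the in/out
count convention; they are not the physinfo interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_chunk {          /* hypothetical layout */
    uint64_t start;
    uint64_t end;
    uint32_t node;
};

/*
 * Copy up to *nr_mem_chunks entries into buf.  On return, *nr_mem_chunks
 * holds the total number of chunks available, which may exceed the
 * capacity passed in.  Returns the number of entries actually copied.
 */
static uint32_t get_mem_chunks(const struct mem_chunk *all, uint32_t total,
                               struct mem_chunk *buf, uint32_t *nr_mem_chunks)
{
    uint32_t capacity, copied, i;

    assert(nr_mem_chunks != NULL);
    capacity = *nr_mem_chunks;
    assert(buf != NULL || capacity == 0);

    copied = capacity < total ? capacity : total;
    for (i = 0; i < copied; i++)
        buf[i] = all[i];

    /* Report the real count so the caller can retry with a larger buffer. */
    *nr_mem_chunks = total;
    return copied;
}
```

A caller that gets back a count larger than the capacity it passed in
knows the buffer was too small and can call again with more room.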

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@xxxxxxxxxx
