
Re: [Xen-devel] [PATCH 4/6] xen: export NUMA topology in physinfo hcall

* Tristan Gingold <Tristan.Gingold@xxxxxxxx> [2006-10-03 04:40]:
> On Friday, 29 September 2006 20:58, Ryan Harper wrote:
> > This patch modifies the physinfo hcall to export NUMA CPU and Memory
> > topology information.  The new physinfo hcall is integrated into libxc
> > and xend (xm info specifically).  Included in this patch is a minor
> > tweak to xm-test's xm info testcase.  The new fields in xm info are:
> >
> > nr_nodes               : 4
> > mem_chunks             : node0:0x0000000000000000-0x0000000190000000
> >                          node1:0x0000000190000000-0x0000000300000000
> >                          node2:0x0000000300000000-0x0000000470000000
> >                          node3:0x0000000470000000-0x0000000640000000
> > node_to_cpu            : node0:0-7
> >                          node1:8-15
> >                          node2:16-23
> >                          node3:24-31
> Hi,
> I have successfully applied this patch on xen-ia64-unstable.  It requires a 
> small patch to fix issues.

Thanks for giving the patches a test.  

> I have tested it on a 4 node, 24 cpus system.
> I have two suggestions for physinfo hcall:
> * We (Bull) already sell machines with more than 64 cpus (up to 128).  
> Unfortunately the physinfo interface works with at most 64 cpus.  May I 
> suggest to replace the node_to_cpu maps with a cpu_to_node map ?

That is fine.  It shouldn't be too much trouble to pass up an array of
cpu_to_node and convert to node_to_cpu (I like the brevity of the above
display; based on number of nodes rather than number of cpus).  Does 
that sound reasonable?
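
A minimal sketch of that conversion, assuming the hcall hands back a
plain array indexed by cpu number.  The helper name format_node_cpus()
and its signature are hypothetical and not part of the patch; it only
shows how the per-node range display can be rebuilt from cpu_to_node:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): write the cpus that belong to
 * `node` into `buf` as comma-separated ranges such as "0-7" or "0-1,4".
 * Returns the number of cpus on the node, or -1 if `buf` is too small.
 */
static int format_node_cpus(const unsigned int *cpu_to_node, int nr_cpus,
                            unsigned int node, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;
    int cpu = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    while (cpu < nr_cpus) {
        if (cpu_to_node[cpu] != node) {
            cpu++;
            continue;
        }

        /* Extend the run while the next cpu is on the same node. */
        int first = cpu;
        while (cpu + 1 < nr_cpus && cpu_to_node[cpu + 1] == node)
            cpu++;

        int n;
        if (first == cpu)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", first);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", first, cpu);
        if (n < 0 || (size_t)n >= len - used)
            return -1;          /* output would not fit in buf */

        used += (size_t)n;
        count += cpu - first + 1;
        cpu++;
    }
    return count;
}
```

Calling this once per node keeps the xm info output sized by the number
of nodes, while the interface itself is no longer limited by the width
of a per-node cpu bitmap.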

> * On ia64 memory can be sparsely populated.  There is no real relation between 
> number of nodes and number of memory chunks.  May I suggest to add a new 
> field (nr_mem_chunks) in physinfo ?  It should be a read/written field: it 
> should return the number of mem chunks at output (which can be greater than 
> the input value if the buffer was too small).

Even if it is sparsely populated, won't each of the chunks "belong" to a
particular node?  The above list of 4 entries is not hard-coded, but a
result of the behavior of the srat table memory affinity parsing.

The current srat code from Linux x86_64 (specifically,
acpi_numa_memory_affinity_init()) merges each memory entry from
the srat table based on the entry's proximity value (a.k.a. node).

It will grow the node's memory range either down or up if the new
entry's start or end is outside the node's current range:

 if (!node_test_and_set(node, nodes_parsed)) {
        nd->start = start;
        nd->end = end;
 } else {
        if (start < nd->start)
            nd->start = start;
        if (nd->end < end)
            nd->end = end;
 }

The end result will be a mapping of any number of memory chunks to the
number of nodes in the system as each chunk must belong to one node. 
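
To make the effect concrete, here is a standalone sketch of the same
merge rule.  It assumes a fixed-size table and a plain bool array in
place of the kernel's nodemask; MAX_NODES, struct node_table and
merge_chunk() are illustrative names, not the Linux or Xen ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8     /* illustrative limit, not the real one */

struct node_range {
    uint64_t start;
    uint64_t end;
};

struct node_table {
    bool parsed[MAX_NODES];              /* stands in for nodes_parsed */
    struct node_range range[MAX_NODES];
};

/* Fold one srat-style memory chunk into its node's single range. */
static void merge_chunk(struct node_table *t, unsigned int node,
                        uint64_t start, uint64_t end)
{
    assert(node < MAX_NODES && start < end);

    struct node_range *nd = &t->range[node];

    if (!t->parsed[node]) {
        /* First chunk seen for this node: take its bounds as-is. */
        t->parsed[node] = true;
        nd->start = start;
        nd->end = end;
    } else {
        /* Later chunks can only widen the range, down or up. */
        if (start < nd->start)
            nd->start = start;
        if (nd->end < end)
            nd->end = end;
    }
}
```

Note that with this rule two chunks on the same node separated by a hole
end up as one range that also spans the hole, so each node is reported
with exactly one start/end pair regardless of how many chunks it has.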

One of the goals for the NUMA patches was to not re-invent this parsing
and data structures all over, but to reuse what is available in Linux.
It may be that the x86_64 srat table parsing in Linux differs from ia64
in Linux.  Is there something that needs fixing here?

Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
