
Re: [Xen-devel] [PATCH 4/6] xen: export NUMA topology in physinfo hcall



On Tuesday 03 October 2006 22:37, Ryan Harper wrote:
> * Tristan Gingold <Tristan.Gingold@xxxxxxxx> [2006-10-03 04:40]:
> > On Friday 29 September 2006 20:58, Ryan Harper wrote:
> > > This patch modifies the physinfo hcall to export NUMA CPU and Memory
> > > topology information.  The new physinfo hcall is integrated into libxc
> > > and xend (xm info specifically).  Included in this patch is a minor
> > > tweak to xm-test's xm info testcase.  The new fields in xm info are:
> > >
> > > nr_nodes               : 4
> > > mem_chunks             : node0:0x0000000000000000-0x0000000190000000
> > >                          node1:0x0000000190000000-0x0000000300000000
> > >                          node2:0x0000000300000000-0x0000000470000000
> > >                          node3:0x0000000470000000-0x0000000640000000
> > > node_to_cpu            : node0:0-7
> > >                          node1:8-15
> > >                          node2:16-23
> > >                          node3:24-31
> >
> > Hi,
> >
> > I have successfully applied this patch on xen-ia64-unstable.  It requires
> > a small patch to fix issues.
>
> Thanks for giving the patches a test.
>
> > I have tested it on a 4-node, 24-cpu system.
> >
> > I have two suggestions for physinfo hcall:
> > * We (Bull) already sell machines with more than 64 cpus (up to 128).
> > Unfortunately the physinfo interface works with at most 64 cpus.  May I
> > suggest replacing the node_to_cpu maps with a cpu_to_node map?
>
> That is fine.  It shouldn't be too much trouble to pass up an array of
> cpu_to_node and convert to node_to_cpu (I like the brevity of the above
> display; based on number of nodes rather than number of cpus).  Does
> that sound reasonable?
I like the current display, and yes, it sounds reasonable.
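
For illustration, a per-cpu cpu_to_node array (one entry per cpu, so no
64-cpu limit) can be folded back into the per-node ranges shown above.
The sketch below is not part of the patch; cpu_to_node[], nr_cpus and
nr_nodes are placeholder names, not the actual physinfo fields:

    #include <stdio.h>

    /* Print "nodeN:a-b" style ranges from a per-cpu node array. */
    static void print_node_to_cpu(const int *cpu_to_node, int nr_cpus,
                                  int nr_nodes)
    {
        for (int node = 0; node < nr_nodes; node++) {
            int first = -1;

            printf("node%d:", node);
            for (int cpu = 0; cpu <= nr_cpus; cpu++) {
                if (cpu < nr_cpus && cpu_to_node[cpu] == node) {
                    if (first < 0)
                        first = cpu;                  /* run of cpus starts */
                } else if (first >= 0) {
                    printf("%d-%d ", first, cpu - 1); /* run just ended */
                    first = -1;
                }
            }
            printf("\n");
        }
    }

    int main(void)
    {
        /* 8 cpus split across 2 nodes, just to show the output format. */
        int cpu_to_node[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };
        print_node_to_cpu(cpu_to_node, 8, 2);
        return 0;
    }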

> > * On ia64 memory can be sparsely populated.  There is no real relation
> > between the number of nodes and the number of memory chunks.  May I suggest
> > adding a new field (nr_mem_chunks) to physinfo?  It should be a
> > read/write field: it should return the number of mem chunks at output
> > (which can be greater than the input value if the buffer was too small).
>
> Even if it is sparsely populated, won't each of the chunks "belong" to a
> particular node?  The above list of 4 entries is not hard-coded, but a
> result of the behavior of the srat table memory affinity parsing.
>
> The current srat code from Linux x86_64 (specifically,
> acpi_numa_memory_affinity_init()) merges each memory entry from
> the srat table based on the entry's proximity value (a.k.a. node
> number).
>
> It will grow the node's memory range either down or up if the new
> entry's start or end is outside the node's current range:
>
>  if (!node_test_and_set(node, nodes_parsed)) {
>         /* First chunk seen for this node: start its range here. */
>         nd->start = start;
>         nd->end = end;
>  } else {
>         /* Node already has a range: extend it to cover this chunk. */
>         if (start < nd->start)
>             nd->start = start;
>         if (nd->end < end)
>             nd->end = end;
>  }
>
> The end result will be a mapping of any number of memory chunks to the
> number of nodes in the system as each chunk must belong to one node.
>
> One of the goals for the NUMA patches was to not re-invent this parsing
> and these data structures all over again, but to reuse what is available
> in Linux.  It may be that the x86_64 srat table parsing in Linux differs
> from the ia64 parsing.  Is there something that needs fixing here?
On ia64 we have reused the ia64 code from Linux.  Therefore we don't share all
the srat parsing code.

I know that on my 4-node system there are 5 srat entries.  I have to check
whether the entries can be merged.
Stay tuned!

Tristan.
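
For the nr_mem_chunks suggestion above, a read/write count plus a
caller-allocated buffer might look roughly like the sketch below.  The
struct and field names are invented for illustration only and are not the
actual Xen physinfo layout:

    #include <stdint.h>

    /* Hypothetical layout; none of these names are real Xen definitions. */
    struct mem_chunk {
        uint64_t start;   /* first byte of the chunk */
        uint64_t end;     /* one byte past the end of the chunk */
        uint32_t node;    /* node this chunk belongs to */
    };

    struct numa_physinfo {
        /* IN:  number of entries the mem_chunks buffer can hold.
         * OUT: number of chunks the hypervisor knows about; if this comes
         *      back greater than the IN value, the buffer was too small
         *      and the caller should retry with a larger one. */
        uint32_t nr_mem_chunks;
        struct mem_chunk *mem_chunks;   /* caller-allocated buffer */
    };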

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

