xen-devel

RE: [Xen-devel] Host Numa informtion in dom0

To: Andre Przywara <andre.przywara@xxxxxxx>, Dulloor <dulloor@xxxxxxxxx>
Subject: RE: [Xen-devel] Host Numa informtion in dom0
From: "Kamble, Nitin A" <nitin.a.kamble@xxxxxxxxx>
Date: Mon, 1 Feb 2010 15:21:58 -0800
Accept-language: en-US
Cc: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 01 Feb 2010 15:22:22 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4B674A18.8010106@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <8EA2C2C4116BF44AB370468FBF85A7770123904A29@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <4B66AB88.6090208@xxxxxxx> <940bcfd21002010953y74e43db7h838f5021207bfa8f@xxxxxxxxxxxxxx> <4B674A18.8010106@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acqjhsm4Z/tTfYbcTautWESqNFCP6wADZKVA
Thread-topic: [Xen-devel] Host Numa informtion in dom0
Andre, Dulloor,
  Some of us are also busy cooking guest NUMA patches for Xen. I think we 
should sync up, so that both efforts work well together. 
  And sockets_per_node can be taken out if it is an issue for you. It was added 
to assist the user in specifying the NUMA topology for the guest. It is not 
strictly required, and can be taken out without any harm.

Thanks & Regards,
Nitin



-----Original Message-----
From: Andre Przywara [mailto:andre.przywara@xxxxxxx] 
Sent: Monday, February 01, 2010 1:40 PM
To: Dulloor
Cc: Kamble, Nitin A; xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser
Subject: Re: [Xen-devel] Host Numa informtion in dom0

Dulloor wrote:
>> Beside that I have to oppose the introduction of sockets_per_node again.
>> Future AMD processors will feature _two_ nodes on _one_ socket, so this
>> variable should hold 1/2, but this will be rounded to zero. I think this
>> information is pretty useless anyway, as the number of sockets is mostly
>> interesting for licensing purposes, where a single number is sufficient.
> 
> I sent a similar patch (I was using it to list pcpu-tuples and in
> vcpu-pin/unpin) and I didn't pursue it because of this same argument.
> When we talk of cpu topology, that's how it is currently :
> nodes-socket-cpu-core. Don't sockets also figure in the cache and
> interconnect hierarchy ?
Not necessarily. Think of Intel's Core2Quad: it has two separate L2 
caches, each associated with two of the four cores in one socket. If you 
move from core0 to core2 then AFAIK the cost would be very similar to 
moving to another processor socket. So in fact the term socket does not 
help here.
The situation is similar to the new AMD CPUs, just that it replaces "L2 
cache" with "node" (aka shared memory controller, which also matches 
shared L3 cache). In fact the cost of moving from one node to the 
neighbor in the same socket is exactly the same as moving to another 
socket.
> What would be the hierarchy in those future AMD processors ? Even Keir
> and Ian Pratt initially wanted the pcpu-tuples
> to be listed that way. So, it would be helpful to make a call and move ahead.
You could create variables like cores_per_socket and cores_per_node, 
this would solve this issue for now. Actually better would be an array 
mapping cores (or threads) to {nodes,sockets,L[123]_caches}, as this 
would allow asymmetrical configurations (useful for guests).
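To illustrate the array-mapping idea, here is a minimal sketch in Python. It is not Xen code: the record type CpuTopo, its field names, and the 8-CPU host layout are all made up for illustration. It shows one record per physical CPU, from which node, socket and cache groupings can be derived without any x_per_y ratio.

```python
from collections import defaultdict
from typing import NamedTuple


class CpuTopo(NamedTuple):
    """Per-CPU topology record; all field names are hypothetical."""
    node: int    # NUMA node (memory controller) the CPU belongs to
    socket: int  # physical package
    l3: int      # shared L3 cache domain


# Made-up host: 2 sockets, 2 nodes per socket, 2 cores per node.
# The list index is the physical CPU number.
CPU_TOPO = [
    CpuTopo(node=0, socket=0, l3=0),
    CpuTopo(node=0, socket=0, l3=0),
    CpuTopo(node=1, socket=0, l3=1),
    CpuTopo(node=1, socket=0, l3=1),
    CpuTopo(node=2, socket=1, l3=2),
    CpuTopo(node=2, socket=1, l3=2),
    CpuTopo(node=3, socket=1, l3=3),
    CpuTopo(node=3, socket=1, l3=3),
]


def cpus_by(table, level):
    """Group CPU numbers by one topology level ("node", "socket" or "l3")."""
    groups = defaultdict(list)
    for cpu, entry in enumerate(table):
        groups[getattr(entry, level)].append(cpu)
    return dict(groups)


def same_domain(table, cpu_a, cpu_b, level):
    """True if both CPUs share the given topology level."""
    return getattr(table[cpu_a], level) == getattr(table[cpu_b], level)
```

Because every CPU carries its own record, nothing forces the nodes or sockets to be the same size, so an asymmetrical layout is just a different table.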
In the past there once was a socket_per_node value in physinfo, but it 
has been removed. It was not used anywhere, and multiplying the whole 
chain of x_per_y sometimes ended up in wrong values anyway.
Anyway, if you insist on this value, it will hold bogus values for the 
upcoming processors. If it is zero, you end up in trouble when 
multiplying or dividing by it, and letting it be one is also wrong.
I am sorry to spoil this whole game, but that's how it is.
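The arithmetic behind this objection can be sketched as follows. This is a hypothetical illustration in Python, not the physinfo code: the function names and the example host (2 sockets, 4 nodes, 8 cores per socket) are assumptions, and floor division (//) stands in for C integer division on non-negative values.

```python
def sockets_per_node(nr_sockets, nr_nodes):
    """Integer ratio, as a C field of integer type would store it."""
    return nr_sockets // nr_nodes


def cores_via_chain(nr_nodes, sockets_per_node_value, cores_per_socket):
    """Total core count obtained by multiplying the x_per_y chain."""
    return nr_nodes * sockets_per_node_value * cores_per_socket


# Assumed host: 2 sockets, each holding 2 nodes, 8 cores per socket.
NR_SOCKETS = 2
NR_NODES = 4
CORES_PER_SOCKET = 8
ACTUAL_CORES = NR_SOCKETS * CORES_PER_SOCKET  # 16

# The true ratio is 1/2, which an integer cannot hold: it becomes 0.
ratio = sockets_per_node(NR_SOCKETS, NR_NODES)

# With 0 the chain collapses to 0 cores; forcing 1 overcounts instead.
cores_with_zero = cores_via_chain(NR_NODES, ratio, CORES_PER_SOCKET)
cores_with_one = cores_via_chain(NR_NODES, 1, CORES_PER_SOCKET)
```

Neither candidate value reproduces the real core count for this layout, and any code dividing by the ratio would divide by zero.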

If you or Nitin show me how the socket_per_node variable should be used, 
we can maybe find a pleasing solution.

Regards,
Andre.
> 
> On Mon, Feb 1, 2010 at 5:23 AM, Andre Przywara <andre.przywara@xxxxxxx> wrote:
>> Kamble, Nitin A wrote:
>>> Hi Keir,
>>>
>>>   Attached is the patch which exposes the host numa information to dom0.
>>> With the patch "xm info" command now also gives the cpu topology & host numa
>>> information. This will be later used to build guest numa support.
>> What information are you missing from the current physinfo? As far as I can
>> see, only the total amount of memory per node is not provided. But one could
>> get this info from parsing the SRAT table in Dom0, which is at least mapped
>> into Dom0's memory.
>> Or do you want to provide NUMA information to all PV guests (but then it
>> cannot be a sysctl)? This would be helpful, as it would avoid enabling
>> ACPI parsing in PV Linux for NUMA guest support.
>>
>> Beside that I have to oppose the introduction of sockets_per_node again.
>> Future AMD processors will feature _two_ nodes on _one_ socket, so this
>> variable should hold 1/2, but this will be rounded to zero. I think this
>> information is pretty useless anyway, as the number of sockets is mostly
>> interesting for licensing purposes, where a single number is sufficient.
>>  For scheduling purposes cache topology is more important.
>>
>> My NUMA guest patches (currently for HVM only) are doing fine, I will try to
>> send out RFC patches this week. I think they don't interfere with this
>> patch, but if you have other patches in development, we should sync on this.
>> The scope of my patches is to let the user (or xend) describe a guest's
>>  topology (either by specifying only the number of guest nodes in the config
>> file or by explicitly describing the whole NUMA topology). Some code will
>> assign host nodes to the guest nodes (I am not sure yet whether this really
>> belongs in xend as it currently does, or is better done in libxc, where
>> libxenlight would also benefit).
>> Then libxc's hvm_build_* will pass that info into the hvm_info_table, where
>> code in the hvmloader will generate an appropriate SRAT table.
>> An extension of this would be to let Xen automatically decide whether a
>> split of the resources is necessary (because there is not enough memory
>> available (anymore) on one node).
>>
>> Looking forward to comments...
>>
>> Regards,
>> Andre.
>>


