[Xen-devel] RE: [RFC] NUMA support

To: "Andre Przywara" <andre.przywara@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] RE: [RFC] NUMA support
From: "Duan, Ronghui" <ronghui.duan@xxxxxxxxx>
Date: Sat, 24 Nov 2007 23:57:31 +0800
Cc: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
Delivery-date: Sat, 24 Nov 2007 07:58:09 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <4746E24C.9010403@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acgt3G3JfGm5ZGhgRCGBOLd14Zbw0wAyrjvw
Thread-topic: [RFC] NUMA support

Hi all,
>thanks Ronghui for your patches and ideas. To make a more structured
>approach to a better NUMA support, I suggest to concentrate on
>one-node-guests first:

That is exactly what we want to do at first: no guest NUMA support for the moment.

>* introduce CPU affinity to memory allocation routines called from
>Dom0.
>This is basically my patch 2/4 from August. We should think about using
>a NUMA node number instead of a physical CPU, is there something to be
>said against this?

I think it is reasonable to bind a guest to a node rather than to a CPU.
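
To illustrate what binding to a node could mean on the tools side, here is a
rough sketch; node_to_cpumask() is a made-up helper and both prototypes below
are assumptions rather than the real libxc interface:

/* Illustrative sketch only: derive a CPU mask covering one NUMA node and
 * apply it to every VCPU of a domain.  node_to_cpumask() is an assumed
 * helper and the xc_vcpu_setaffinity() prototype is an assumption too. */
#include <stdint.h>

extern uint64_t node_to_cpumask(int xc_handle, int node);          /* assumed */
extern int xc_vcpu_setaffinity(int xc_handle, uint32_t domid,
                               int vcpu, uint64_t cpumap);          /* assumed */

static int pin_domain_to_node(int xc_handle, uint32_t domid,
                              int node, int max_vcpu_id)
{
    uint64_t cpumap = node_to_cpumask(xc_handle, node);
    int i, rc = 0;

    for (i = 0; i <= max_vcpu_id; i++)
        if ((rc = xc_vcpu_setaffinity(xc_handle, domid, i, cpumap)) != 0)
            break;

    return rc;
}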

>* find _some_ method of load balancing when creating guests. The method
>1 from Ronghui is a start, but a real decision based on each node's
>utilization (or free memory) would be more reasonable.

Yes, it is only a starting point for load balancing.
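
For the "real decision" part, here is a rough sketch of what choosing a node
by free memory could look like; num_nodes() and node_free_pages() are made-up
helpers, since Xen would first have to expose per-node free page counts to
the tools:

/* Illustrative sketch only: choose the node with the most free memory that
 * can still hold the whole guest, instead of plain round-robin.  Both
 * helpers below are assumptions, not existing interfaces. */
#include <stdint.h>

extern int      num_nodes(void);             /* assumed */
extern uint64_t node_free_pages(int node);   /* assumed */

static int pick_node_for_new_guest(uint64_t pages_needed)
{
    int node, best = -1;
    uint64_t best_free = 0;

    for (node = 0; node < num_nodes(); node++) {
        uint64_t free_pages = node_free_pages(node);
        if (free_pages >= pages_needed && free_pages > best_free) {
            best_free = free_pages;
            best = node;
        }
    }

    return best;   /* -1 if no single node can hold the guest */
}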

>* patch the guest memory allocation routines to allocate memory from
>that specific node only (based on my patch 3/4)

Considering performance, we should do it.
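
As a rough sketch of this idea, the domain builder would just pass a node hint
down with every allocation; the MEMF_node() encoding and the populate call
below are assumptions, not an existing interface:

/* Illustrative sketch only: a node-aware wrapper around the guest memory
 * populate path.  The MEMF_node() encoding and populate_physmap_flags()
 * are assumptions; the real patch would thread a node hint through
 * whatever allocation call the domain builder already uses. */
#include <stdint.h>

#define MEMF_NODE_SHIFT 8                               /* assumed encoding */
#define MEMF_node(n)    (((unsigned int)(n) + 1) << MEMF_NODE_SHIFT)

extern int populate_physmap_flags(int xc_handle, uint32_t domid,   /* assumed */
                                  unsigned long nr_pages,
                                  unsigned int mem_flags,
                                  unsigned long *pfn_list);

static int populate_on_node(int xc_handle, uint32_t domid,
                            unsigned long nr_pages, int node,
                            unsigned long *pfn_list)
{
    /* Ask the hypervisor to satisfy this allocation from `node` only. */
    return populate_physmap_flags(xc_handle, domid, nr_pages,
                                  MEMF_node(node), pfn_list);
}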

>* use live migration to local host to allow node migration. Assuming
>that localhost live migration works reliably (is that really true?) it
>shouldn't be too hard to implement this (basically just using node
>affinity while allocating guest memory). Since this is a rather
>expensive operation (takes twice the memory temporarily and quite some
>time), I'd suggest to trigger that explicitly from the admin via a xm
>command, maybe as an addition to migrate:
># xm migrate --live --node 1 <domid> localhost
>There could be some Dom0 daemon based re-balancer to do this somewhat
>automatically later on.
>
>I would take care of the memory allocation patch and would look into
>node migration. It would be great if Ronghui or Anthony would help to
>improve the "load balancing" algorithm.

I have no ideas on this right now.

>Meanwhile I will continue to patch that d*** Linux kernel to accept
>both
>CONFIG_NUMA and CONFIG_XEN without crashing that early ;-), this should
>allow both HVM and PV guests to support multiple NUMA nodes within one
>guest.
>
>Also we should start a discussion on the config file options to add:
>Shall we use "numanodes=<nr of nodes>", something like "numa=on" (for
>one-node-guests only), or something like "numanode=0,1" to explicitly
>specify certain nodes?

Since we do not support guest NUMA for now, we do not need these
configuration options yet. If we do need to support guest NUMA, I think
users may even want to configure each node's layout, i.e. how many CPUs or
how much memory that node has. I think that would be too complicated. ^_^
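
Just for reference, the three proposals from the quoted mail would look
roughly like this in a guest config file (illustrative only, none of these
options exists yet):

# numanodes = 2        # spread the guest across 2 NUMA nodes
# numa      = "on"     # one-node guest, node chosen automatically
# numanode  = [0, 1]   # pin the guest to explicitly named nodes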

>Any comments are appreciated.
>
>> I read your patches and Anthony's comments. Write a patch based on
>>
>> 1:   If guest set numanodes=n (default it will be 1, means that this
>>      guest will be restricted in one node); hypervisor will choose the
>>      begin node to pin for this guest use round robin. But the method
>>      I use need a spin_lock to prevent create domain at same time.
>>      Are there any more good methods, hope for your suggestion.
>That's a good start, thank you. Maybe Keir has some comments on the
>spinlock issue.
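
For clarity, this is roughly how I imagine the round-robin choice with the
spin_lock; the lock and counter names are made up for illustration and this
is not the code from my patch:

/* Illustrative sketch only: serialize the round-robin start-node choice so
 * that concurrent domain creations do not race on the counter.  The names
 * below are made up; num_online_nodes() is assumed to exist on the
 * hypervisor side. */
#include <xen/spinlock.h>

static DEFINE_SPINLOCK(numa_rr_lock);
static unsigned int next_node;

extern unsigned int num_online_nodes(void);   /* assumed */

/* Pick the first node for a new domain that asked for `numanodes` nodes. */
static unsigned int pick_first_node(unsigned int numanodes)
{
    unsigned int node;

    spin_lock(&numa_rr_lock);
    node = next_node;
    next_node = (next_node + numanodes) % num_online_nodes();
    spin_unlock(&numa_rr_lock);

    return node;
}
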
>> 2:   pass node parameter use higher bits in flags when create domain.
>>      At this time, domain can record node information in domain struct
>>      for further use, i.e. show which node to pin when setup_guest.
>>      If use this method, in your patch, can simply balance nodes just
>>      like below;
>>
>>> +    for (i=0;i<=dominfo.max_vcpu_id;i++)
>>> +    {
>>> +        node = ( i * numanodes ) / (dominfo.max_vcpu_id+1) +
>>> +               dominfo.first_node;
>>> +        xc_vcpu_setaffinity (xc_handle, dom, i, nodemasks[node]);
>>> +    }
>How many bits do you want to use? Maybe it's not a good idea to abuse
>some variable to hold a limited number of nodes only ("640K ought to be
>enough for anybody" ;-) But the general idea is good.
Actually, if there is no need to support guest NUMA, no parameter needs to
be passed down.
It seems that one node per guest is a good approach. ^_^

Best regards,
Ronghui

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel