Hi all,
>thanks Ronghui for your patches and ideas. To make a more structured
>approach to a better NUMA support, I suggest to concentrate on
>one-node-guests first:
That is exactly what we want to do first: not supporting guest NUMA for now.
>* introduce CPU affinity to memory allocation routines called from Dom0.
>This is basically my patch 2/4 from August. We should think about using
>a NUMA node number instead of a physical CPU, is there something to be
>said against this?
I think it is reasonable to bind a guest to a node rather than to a single CPU.
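Roughly what I have in mind for the tools side, as a sketch only: node_cpumask
would come from whatever node-to-CPU information we end up exposing, and I am
assuming the uint64_t cpumap flavour of xc_vcpu_setaffinity in this libxc.

#include <stdint.h>
#include <xenctrl.h>

/* Pin every vcpu of a domain to all CPUs of one node instead of to a
 * single CPU. The caller supplies node_cpumask, the mask of all
 * physical CPUs belonging to the chosen node. */
int bind_domain_to_node(int xc_handle, uint32_t domid,
                        int max_vcpu_id, uint64_t node_cpumask)
{
    int i, rc;

    for ( i = 0; i <= max_vcpu_id; i++ )
    {
        rc = xc_vcpu_setaffinity(xc_handle, domid, i, node_cpumask);
        if ( rc != 0 )
            return rc;
    }
    return 0;
}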
>* find _some_ method of load balancing when creating guests. The method
>1 from Ronghui is a start, but a real decision based on each node's
>utilization (or free memory) would be more reasonable.
Yes, it is only a starting point for balancing.
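For the "real decision", the policy could be as simple as picking the node
with the most free memory. A sketch only; node_free_pages() stands in for
whatever interface ends up exposing per-node free memory (it does not exist
today):

/* Pick the node with the most free memory. nr_nodes and the
 * node_free_pages() callback are placeholders for data that would
 * have to come from an extended physinfo interface. */
static unsigned int pick_node(unsigned int nr_nodes,
                              unsigned long (*node_free_pages)(unsigned int))
{
    unsigned int node, best = 0;
    unsigned long best_free = 0, avail;

    for ( node = 0; node < nr_nodes; node++ )
    {
        avail = node_free_pages(node);
        if ( avail > best_free )
        {
            best_free = avail;
            best = node;
        }
    }
    return best;
}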
>* patch the guest memory allocation routines to allocate memory from
>that specific node only (based on my patch 3/4)
Considering performance, we should do this.
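The allocation policy I would expect is "home node first, other nodes only as
a fallback", roughly like the sketch below; alloc_on_node() is an abstract
stand-in for the patched per-node page allocator, not an existing function.

#include <stddef.h>

/* Try the guest's home node first, then fall back to any other node
 * rather than failing the allocation outright. */
void *alloc_guest_pages(unsigned int home_node, unsigned int nr_nodes,
                        unsigned int order,
                        void *(*alloc_on_node)(unsigned int node,
                                               unsigned int order))
{
    unsigned int node;
    void *pg = alloc_on_node(home_node, order);

    for ( node = 0; pg == NULL && node < nr_nodes; node++ )
        if ( node != home_node )
            pg = alloc_on_node(node, order);

    return pg;
}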
>* use live migration to local host to allow node migration. Assuming
>that localhost live migration works reliably (is that really true?) it
>shouldn't be too hard to implement this (basically just using node
>affinity while allocating guest memory). Since this is a rather
>expensive operation (takes twice the memory temporarily and quite some
>time), I'd suggest to trigger that explicitly from the admin via a xm
>command, maybe as an addition to migrate:
># xm migrate --live --node 1 <domid> localhost
>There could be some Dom0 daemon based re-balancer to do this somewhat
>automatically later on.
>
>I would take care of the memory allocation patch and would look into
>node migration. It would be great if Ronghui or Anthony would help to
>improve the "load balancing" algorithm.
I don't have any ideas on this at the moment.
>Meanwhile I will continue to patch that d*** Linux kernel to accept both
>CONFIG_NUMA and CONFIG_XEN without crashing that early ;-), this should
>allow both HVM and PV guests to support multiple NUMA nodes within one
>guest.
>
>Also we should start a discussion on the config file options to add:
>Shall we use "numanodes=<nr of nodes>", something like "numa=on" (for
>one-node-guests only), or something like "numanode=0,1" to explicitly
>specify certain nodes?
Since we don't support guest NUMA for now, we don't need these config
options yet. If we do need to support guest NUMA later, I think users may
even want to configure each node's topology, i.e. how many CPUs and how
much memory are in that node. I think that would be too complicated. ^_^
>Any comments are appreciated.
>
>> I read your patches and Anthony's comments, and wrote a patch based on them:
>>
>> 1: If the guest sets numanodes=n (the default is 1, meaning that the
>> guest will be restricted to one node), the hypervisor will choose the
>> starting node to pin this guest to using round robin. But the method I
>> use needs a spin_lock to prevent domains from being created at the same
>> time. Are there any better methods? I hope for your suggestions.
>That's a good start, thank you. Maybe Keir has some comments on the
>spinlock issue.
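For reference, the round-robin selection I have in mind is basically the
sketch below (hypervisor side; names such as numa_rr_lock and
pick_start_node are illustrative, and the lock only serialises the choice
of the starting node):

#include <xen/spinlock.h>

static spinlock_t numa_rr_lock = SPIN_LOCK_UNLOCKED;
static unsigned int next_node;

/* Return the node the next new domain should start on, advancing the
 * round-robin counter under the lock so that two simultaneous domain
 * creations cannot pick the same node by accident. */
static unsigned int pick_start_node(unsigned int nr_nodes)
{
    unsigned int node;

    spin_lock(&numa_rr_lock);
    node = next_node;
    next_node = (next_node + 1) % nr_nodes;
    spin_unlock(&numa_rr_lock);

    return node;
}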
>> 2: Pass the node parameter in the higher bits of the flags when
>> creating the domain. The domain can then record the node information
>> in the domain struct for further use, e.g. to know which node to pin
>> to in setup_guest. With this method, your patch can balance across
>> nodes simply like below:
>>
>>> +    for ( i = 0; i <= dominfo.max_vcpu_id; i++ )
>>> +    {
>>> +        node = (i * numanodes) / (dominfo.max_vcpu_id + 1) +
>>> +               dominfo.first_node;
>>> +        xc_vcpu_setaffinity(xc_handle, dom, i, nodemasks[node]);
>>> +    }
>How many bits do you want to use? Maybe it's not a good idea to abuse
>some variable to hold a limited number of nodes only ("640K ought to be
>enough for anybody" ;-) But the general idea is good.
Actually, if there is no need to support guest NUMA, no parameter needs
to be passed down at all. Restricting each guest to one node seems like a
good approach. ^_^
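That said, if we ever do need to pass a node down in the creation flags,
something like the following would answer the "how many bits" question; the
shift and width below are arbitrary, purely an illustration (a dedicated
field would of course be cleaner):

#include <stdint.h>

/* Illustrative encoding: 8 bits at the top of the 32-bit creation
 * flags, allowing up to 256 nodes. */
#define NODE_SHIFT 24
#define NODE_MASK  0xffu

static inline uint32_t encode_node(uint32_t flags, unsigned int node)
{
    return flags | ((node & NODE_MASK) << NODE_SHIFT);
}

static inline unsigned int decode_node(uint32_t flags)
{
    return (flags >> NODE_SHIFT) & NODE_MASK;
}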
Best regards,
Ronghui