Hi all,
>thanks Ronghui for your patches and ideas. To make a more structured
>approach to a better NUMA support, I suggest to concentrate on
>one-node-guests first:
That is exactly what we want to do first: not supporting guest NUMA for now.
>* introduce CPU affinity to memory allocation routines called from Dom0.
>This is basically my patch 2/4 from August. We should think about using
>a NUMA node number instead of a physical CPU, is there something to be
>said against this?
I think it is reasonable to bind a guest to a node rather than to a single CPU.
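Roughly what I have in mind for the tools side, as a sketch only: node_cpumask
would come from whatever node-to-CPU information we end up exposing, and I am
assuming the uint64_t cpumap flavour of xc_vcpu_setaffinity in this libxc.

#include <stdint.h>
#include <xenctrl.h>

/* Pin every vcpu of a domain to all CPUs of one node instead of to a
 * single CPU. The caller supplies node_cpumask, the mask of all
 * physical CPUs belonging to the chosen node. */
int bind_domain_to_node(int xc_handle, uint32_t domid,
                        int max_vcpu_id, uint64_t node_cpumask)
{
    int i, rc;

    for ( i = 0; i <= max_vcpu_id; i++ )
    {
        rc = xc_vcpu_setaffinity(xc_handle, domid, i, node_cpumask);
        if ( rc != 0 )
            return rc;
    }
    return 0;
}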
>* find _some_ method of load balancing when creating guests. The method
>1 from Ronghui is a start, but a real decision based on each node's
>utilization (or free memory) would be more reasonable.
Yes, it is only a starting point for balancing.
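For the "real decision", the policy could be as simple as picking the node
with the most free memory. A sketch only; node_free_pages() stands in for
whatever interface ends up exposing per-node free memory (it does not exist
today):

/* Pick the node with the most free memory. nr_nodes and the
 * node_free_pages() callback are placeholders for data that would
 * have to come from an extended physinfo interface. */
static unsigned int pick_node(unsigned int nr_nodes,
                              unsigned long (*node_free_pages)(unsigned int))
{
    unsigned int node, best = 0;
    unsigned long best_free = 0, avail;

    for ( node = 0; node < nr_nodes; node++ )
    {
        avail = node_free_pages(node);
        if ( avail > best_free )
        {
            best_free = avail;
            best = node;
        }
    }
    return best;
}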
>* patch the guest memory allocation routines to allocate memory from
>that specific node only (based on my patch 3/4)
Considering performance, we should do this.
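The allocation policy I would expect is "home node first, other nodes only as
a fallback", roughly like the sketch below; alloc_on_node() is an abstract
stand-in for the patched per-node page allocator, not an existing function.

#include <stddef.h>

/* Try the guest's home node first, then fall back to any other node
 * rather than failing the allocation outright. */
void *alloc_guest_pages(unsigned int home_node, unsigned int nr_nodes,
                        unsigned int order,
                        void *(*alloc_on_node)(unsigned int node,
                                               unsigned int order))
{
    unsigned int node;
    void *pg = alloc_on_node(home_node, order);

    for ( node = 0; pg == NULL && node < nr_nodes; node++ )
        if ( node != home_node )
            pg = alloc_on_node(node, order);

    return pg;
}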
>* use live migration to local host to allow node migration. Assuming
>that localhost live migration works reliably (is that really true?) it
>shouldn't be too hard to implement this (basically just using node
>affinity while allocating guest memory). Since this is a rather
>expensive operation (takes twice the memory temporarily and quite some
>time), I'd suggest to trigger that explicitly from the admin via a xm
>command, maybe as an addition to migrate:
># xm migrate --live --node 1 <domid> localhost
>There could be some Dom0 daemon based re-balancer to do this somewhat
>automatically later on.
>
>I would take care of the memory allocation patch and would look into
>node migration. It would be great if Ronghui or Anthony would help to
>improve the "load balancing" algorithm.
I don't have any ideas on this at the moment.
>Meanwhile I will continue to patch that d*** Linux kernel to accept both
>CONFIG_NUMA and CONFIG_XEN without crashing that early ;-), this should
>allow both HVM and PV guests to support multiple NUMA nodes within one
>guest.
>
>Also we should start a discussion on the config file options to add:
>Shall we use "numanodes=<nr of nodes>", something like "numa=on" (for
>one-node-guests only), or something like "numanode=0,1" to explicitly
>specify certain nodes?
Since we don't support guest NUMA for now, we don't need these config
options yet. If we do need to support guest NUMA later, I think users may
even want to configure each node's topology, i.e. how many CPUs and how
much memory are in that node. I think that would be too complicated. ^_^
>Any comments are appreciated.
>
>> I read your patches and Anthony's comments, and wrote a patch based on them:
>>
>> 1: If the guest sets numanodes=n (the default is 1, meaning that the
>> guest will be restricted to one node), the hypervisor will choose the
>> starting node to pin this guest to using round robin. But the method I
>> use needs a spin_lock to prevent domains from being created at the same
>> time. Are there any better methods? I hope for your suggestions.
>That's a good start, thank you. Maybe Keir has some comments on the
>spinlock issue.
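For reference, the round-robin selection I have in mind is basically the
sketch below (hypervisor side; names such as numa_rr_lock and
pick_start_node are illustrative, and the lock only serialises the choice
of the starting node):

#include <xen/spinlock.h>

static spinlock_t numa_rr_lock = SPIN_LOCK_UNLOCKED;
static unsigned int next_node;

/* Return the node the next new domain should start on, advancing the
 * round-robin counter under the lock so that two simultaneous domain
 * creations cannot pick the same node by accident. */
static unsigned int pick_start_node(unsigned int nr_nodes)
{
    unsigned int node;

    spin_lock(&numa_rr_lock);
    node = next_node;
    next_node = (next_node + 1) % nr_nodes;
    spin_unlock(&numa_rr_lock);

    return node;
}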
>> 2: Pass the node parameter in the higher bits of the flags when
>> creating the domain. The domain can then record the node information
>> in the domain struct for further use, e.g. to know which node to pin
>> to in setup_guest. With this method, your patch can balance across
>> nodes simply like below:
>>
>>> +    for ( i = 0; i <= dominfo.max_vcpu_id; i++ )
>>> +    {
>>> +        node = (i * numanodes) / (dominfo.max_vcpu_id + 1) +
>>> +               dominfo.first_node;
>>> +        xc_vcpu_setaffinity(xc_handle, dom, i, nodemasks[node]);
>>> +    }
>How many bits do you want to use? Maybe it's not a good idea to abuse
>some variable to hold a limited number of nodes only ("640K ought to be
>enough for anybody" ;-) But the general idea is good.
Actually, if there is no need to support guest NUMA, no parameter needs
to be passed down at all. Restricting each guest to one node seems like a
good approach. ^_^
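That said, if we ever do need to pass a node down in the creation flags,
something like the following would answer the "how many bits" question; the
shift and width below are arbitrary, purely an illustration (a dedicated
field would of course be cleaner):

#include <stdint.h>

/* Illustrative encoding: 8 bits at the top of the 32-bit creation
 * flags, allowing up to 256 nodes. */
#define NODE_SHIFT 24
#define NODE_MASK  0xffu

static inline uint32_t encode_node(uint32_t flags, unsigned int node)
{
    return flags | ((node & NODE_MASK) << NODE_SHIFT);
}

static inline unsigned int decode_node(uint32_t flags)
{
    return (flags >> NODE_SHIFT) & NODE_MASK;
}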
Best regards,
Ronghui