[Xen-devel] [RFC] NUMA support

To: "Duan, Ronghui" <ronghui.duan@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [RFC] NUMA support
From: "Andre Przywara" <andre.przywara@xxxxxxx>
Date: Fri, 23 Nov 2007 15:23:08 +0100
Cc: Anthony.Xu@xxxxxxxxx
Delivery-date: Fri, 23 Nov 2007 06:24:12 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <82C666AA63DC75449C51EAD62E8B2BEC337773@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <82C666AA63DC75449C51EAD62E8B2BEC337773@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.6 (X11/20070728)
All,
thanks, Ronghui, for your patches and ideas. To make a more structured approach to better NUMA support, I suggest concentrating on one-node guests first:

* introduce CPU affinity in the memory allocation routines called from Dom0. This is basically my patch 2/4 from August. We should think about using a NUMA node number instead of a physical CPU; is there anything to be said against this?

* find _some_ method of load balancing when creating guests. Ronghui's method 1 is a start, but a real decision based on each node's utilization (or free memory) would be more reasonable (see the sketch after this list).

* patch the guest memory allocation routines to allocate memory from that specific node only (based on my patch 3/4).

* use live migration to localhost to allow node migration. Assuming that localhost live migration works reliably (is that really true?), it shouldn't be too hard to implement this (basically just using node affinity while allocating guest memory). Since this is a rather expensive operation (it temporarily takes twice the memory and quite some time), I'd suggest triggering it explicitly from the admin via an xm command, maybe as an addition to migrate:
# xm migrate --live --node 1 <domid> localhost
There could be some Dom0-daemon-based re-balancer to do this somewhat automatically later on.
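
As a starting point for the balancing discussion in the second point above, here is a minimal sketch of a free-memory-based decision. node_free_memory() is a purely hypothetical helper; in reality this information would have to come from per-node heap data in the hypervisor:

    #include <stdint.h>

    /* Hypothetical helper: free memory on a NUMA node, in bytes. */
    extern uint64_t node_free_memory(int node);

    /* Sketch only: pick the node with the most free memory that can
     * still hold the whole guest; return -1 if no single node fits. */
    static int pick_node_for_guest(int nr_nodes, uint64_t guest_mem)
    {
        int node, best = -1;
        uint64_t best_free = 0;

        for ( node = 0; node < nr_nodes; node++ )
        {
            uint64_t free = node_free_memory(node);
            if ( free >= guest_mem && free > best_free )
            {
                best_free = free;
                best = node;
            }
        }
        return best;
    }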

I would take care of the memory allocation patch and would look into node migration. It would be great if Ronghui or Anthony would help to improve the "load balancing" algorithm.

Meanwhile I will continue to patch that d*** Linux kernel to accept both CONFIG_NUMA and CONFIG_XEN without crashing that early ;-). This should allow both HVM and PV guests to support multiple NUMA nodes within one guest.

Also we should start a discussion on the config file options to add:
Shall we use "numanodes=<nr of nodes>", something like "numa=on" (for one-node-guests only), or something like "numanode=0,1" to explicitly specify certain nodes?
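
To make the alternatives concrete, the variants could look roughly like this in a guest config file (the syntax is only a proposal, none of it exists yet):

    numanodes = 2        # spread the guest over two NUMA nodes
    numa = "on"          # one-node guest, node chosen by the balancer
    numanode = "0,1"     # explicitly restrict the guest to nodes 0 and 1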

Any comments are appreciated.

> I read your patches and Anthony's comments. I wrote a patch based on:
>
> 1: If the guest sets numanodes=n (by default it will be 1, meaning this
> guest will be restricted to one node), the hypervisor will choose the
> begin node to pin this guest to using round robin. But the method I use
> needs a spin_lock to prevent creating domains at the same time. Are
> there any better methods? I hope for your suggestions.

That's a good start, thank you. Maybe Keir has some comments on the spinlock issue.
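
Just to have something concrete to discuss, a minimal sketch of what I have in mind, written against the hypervisor's spinlock interface as I remember it; the names and the locking scope are placeholders, not necessarily what your patch does:

    #include <xen/spinlock.h>

    /* Sketch only: round-robin start node for newly created domains.
     * The lock merely serialises access to the counter; whether a
     * global lock is acceptable here is exactly the open question. */
    static unsigned int next_node;
    static spinlock_t node_alloc_lock = SPIN_LOCK_UNLOCKED;

    static unsigned int pick_start_node(unsigned int nr_nodes)
    {
        unsigned int node;

        spin_lock(&node_alloc_lock);
        node = next_node;
        next_node = (next_node + 1) % nr_nodes;
        spin_unlock(&node_alloc_lock);

        return node;
    }
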
> 2: Pass the node parameter in the higher bits of the flags when creating
> the domain. The domain can then record the node information in the
> domain struct for further use, i.e. to show which node to pin to in
> setup_guest. If we use this method, your patch can simply balance the
> nodes just like below:

> +    for ( i = 0; i <= dominfo.max_vcpu_id; i++ )
> +    {
> +        /* spread the VCPUs evenly over the guest's nodes */
> +        node = ( i * numanodes ) / ( dominfo.max_vcpu_id + 1 ) +
> +               dominfo.first_node;
> +        xc_vcpu_setaffinity(xc_handle, dom, i, nodemasks[node]);
> +    }

How many bits do you want to use? Maybe it's not a good idea to abuse a few bits of some variable that can only hold a limited number of nodes ("640K ought to be enough for anybody" ;-)). But the general idea is good.
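
To illustrate the bit question: if the node really goes into the flags word, the encoding would be something along these lines. The names, the bit position and the width are completely made up for the example; the fixed width is exactly the limit I am worried about:

    /* Sketch only: pack a node number into the upper bits of the
     * domain-creation flags.  DOMCRF_NODE_* are hypothetical names;
     * 4 bits at bit 16 are arbitrary choices for the example,
     * limiting us to 16 nodes. */
    #define DOMCRF_NODE_SHIFT  16
    #define DOMCRF_NODE_MASK   (0xfu << DOMCRF_NODE_SHIFT)

    #define DOMCRF_set_node(flags, node) \
        ((flags) | (((unsigned int)(node) << DOMCRF_NODE_SHIFT) & \
                    DOMCRF_NODE_MASK))
    #define DOMCRF_get_node(flags) \
        (((flags) & DOMCRF_NODE_MASK) >> DOMCRF_NODE_SHIFT)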

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent: AMD Saxony LLC (registered office: Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel