Hi Andre,
This is a good start for supporting guest NUMA.
I have some comments.
+ for ( i = 0; i <= dominfo.max_vcpu_id; i++ )
+ {
+     node = (i * numanodes) / (dominfo.max_vcpu_id + 1);
+     xc_vcpu_setaffinity(xc_handle, dom, i, nodemasks[node]);
+ }
This always starts from node 0, which may make node 0 very busy while the other
nodes have little work. It would be better to start pinning from the least
loaded node.
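Just to illustrate the idea (this is only a sketch; get_node_load(), num_pnodes
and a per-pnode nodemasks[] are assumptions, not existing interfaces):

/* Hypothetical helper, not in libxc: load metric for physical node n. */
extern unsigned int get_node_load(int n);

int i, vnode, pnode, start = 0;

/* Start the vnode->pnode mapping at the least loaded physical node. */
for ( i = 1; i < num_pnodes; i++ )
    if ( get_node_load(i) < get_node_load(start) )
        start = i;

for ( i = 0; i <= dominfo.max_vcpu_id; i++ )
{
    vnode = (i * numanodes) / (dominfo.max_vcpu_id + 1);
    pnode = (start + vnode) % num_pnodes;
    xc_vcpu_setaffinity(xc_handle, dom, i, nodemasks[pnode]);
}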
We also need to add some limitations on numanodes: the number of vcpus on a
vnode should not be larger than the number of pcpus on the corresponding pnode.
Otherwise several vcpus belonging to the same domain end up running on the same
pcpu, which is not what we want.
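Something along these lines could be checked at domain creation time; this is
only a sketch, pcpus_on_node() is a hypothetical helper and the even vcpu split
mirrors the loop quoted above:

#include <errno.h>
#include <stdio.h>

/* Hypothetical helper, not part of libxc: number of pcpus on pnode n. */
extern int pcpus_on_node(int node);

/* Reject a numanodes value that would oversubscribe a pnode. */
static int check_numanodes(int numanodes, int max_vcpu_id)
{
    /* vcpus are split evenly over the vnodes; round up for the worst case. */
    int vcpus_per_vnode = (max_vcpu_id + 1 + numanodes - 1) / numanodes;
    int n;

    for ( n = 0; n < numanodes; n++ )
        if ( vcpus_per_vnode > pcpus_on_node(n) )
        {
            fprintf(stderr, "vnode %d needs %d vcpus but its pnode only "
                    "has %d pcpus\n", n, vcpus_per_vnode, pcpus_on_node(n));
            return -EINVAL;
        }

    return 0;
}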
In setup_numa_mem, every node gets an equal share of memory, and if that
allocation fails, domain creation fails. This may be too strict. I think we
can support guest NUMA where the nodes have different memory sizes, or where
some node even has no memory at all; what we need to guarantee is that the
guest sees the physical topology.
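One possible shape for such a fallback (a sketch only; alloc_from_node() is a
hypothetical stand-in for the per-node populate_physmap call in the patch):

#include <errno.h>

/* Hypothetical wrapper: tries to get up to 'pages' pages from physical
 * node 'node' and returns the number of pages actually obtained. */
extern unsigned long alloc_from_node(int node, unsigned long pages);

/* node_pages[n] records what vnode n really got, so the SRAT can be
 * built from the actual (possibly uneven) sizes. */
static int allocate_numa_memory(unsigned long total, int numanodes,
                                unsigned long *node_pages)
{
    unsigned long share = total / numanodes, missing = 0;
    int n;

    /* First pass: try an even split (node 0 takes the rounding remainder). */
    for ( n = 0; n < numanodes; n++ )
    {
        unsigned long want = share + (n == 0 ? total % numanodes : 0);
        node_pages[n] = alloc_from_node(n, want);
        missing += want - node_pages[n];
    }

    /* Second pass: nodes with spare memory absorb the shortfall. */
    for ( n = 0; n < numanodes && missing; n++ )
    {
        unsigned long extra = alloc_from_node(n, missing);
        node_pages[n] += extra;
        missing -= extra;
    }

    return missing ? -ENOMEM : 0;
}

The SRAT table would then be filled from node_pages[] instead of assuming an
even split.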
In your patch, when a NUMA guest is created, each vnode is pinned to a pnode.
After a number of domain creations and destructions, the workload on the
platform may become very imbalanced, so we also need a way to rebalance it
dynamically.
There are two methods IMO:
1. Implement a NUMA-aware scheduler plus page migration.
2. Run a daemon in dom0 that monitors the workload and uses live migration to
rebalance it when necessary (see the sketch after this list).
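For 2., the daemon's main loop could look roughly like the sketch below. All
the helpers (node_load, busiest_node, idlest_node, pick_domain_on_node,
relocate_domain) are hypothetical placeholders; relocate_domain() would sit on
top of the existing live-migration machinery. This only shows the control flow:

#include <unistd.h>

/* Hypothetical helpers, not existing interfaces: */
extern unsigned int node_load(int node);         /* load metric for a pnode  */
extern int busiest_node(void);                   /* pnode with highest load  */
extern int idlest_node(void);                    /* pnode with lowest load   */
extern int pick_domain_on_node(int node);        /* candidate domain to move */
extern int relocate_domain(int domid, int node); /* local live migration     */

#define IMBALANCE_FACTOR 2   /* arbitrary: act when busiest >= 2x idlest */

int main(void)
{
    for ( ;; )
    {
        int from = busiest_node(), to = idlest_node();

        if ( node_load(from) >= IMBALANCE_FACTOR * node_load(to) )
        {
            int domid = pick_domain_on_node(from);
            if ( domid > 0 )
                relocate_domain(domid, to);
        }
        sleep(30);   /* re-evaluate periodically */
    }
}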
Regards
-Anthony
>-----Original Message-----
>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-
>bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Andre Przywara
>Sent: Monday, August 13, 2007 6:01 PM
>To: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: [Xen-devel] [PATCH 0/4] [HVM] NUMA support in HVM guests
>
>Hi,
>
>these four patches allow to forward NUMA characteristics into HVM
>guests. This works by allocating memory explicitly from different NUMA
>nodes and create an appropriate SRAT-ACPI table which describes the
>topology. Needs a decent guest kernel which uses the SRAT table to
>discover the NUMA topology.
>This allows to break the current de-facto limitation of guests to one
>NUMA node, one can use more memory and/or more VCPUs than there are
>available on one node.
>
> Patch 1/4: introduce numanodes=n config file option.
>this states how many NUMA nodes the guest should see, the default is
>0,
>which means to turn off most parts of the code.
> Patch 2/4: introduce CPU affinity for allocate_physmap call.
>currently
>the correct NUMA node to take the memory from is chosen by simply using
>the currently scheduled CPU, this patch allows to explicitly specify a
>CPU and provides XENMEM_DEFAULT_CPU for the old behavior
> Patch 3/4: allocate memory with NUMA in mind.
>actually look at the numanodes=n option to split the memory request up
>into n parts and allocate it from different nodes. Also change the VCPUs
>affinity to match the nodes.
> Patch 4/4: inject created SRAT table into the guest.
>create a SRAT table, fill it up with the desired NUMA topology and
>inject it into the guest
>
>Applies against staging c/s #15719.
>
>Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
>
>Regards,
>Andre.
>
>--
>Andre Przywara
>AMD-Operating System Research Center (OSRC), Dresden, Germany
>Tel: +49 351 277-84917
>----to satisfy European Law for business letters:
>AMD Saxony Limited Liability Company & Co. KG
>Sitz (Geschäftsanschrift): Wilschdorfer Landstr. 101, 01109 Dresden,
>Deutschland
>Registergericht Dresden: HRA 4896
>vertretungsberechtigter Komplementär: AMD Saxony LLC (Sitz Wilmington,
>Delaware, USA)
>Geschäftsführer der AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
>
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel