This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] [PATCH 0/4] [HVM] NUMA support in HVM guests

To: "Andre Przywara" <andre.przywara@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH 0/4] [HVM] NUMA support in HVM guests
From: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
Date: Fri, 7 Sep 2007 16:42:57 +0800
Delivery-date: Fri, 07 Sep 2007 01:43:38 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <46C02BE0.2070400@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <46C02BE0.2070400@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcfdkRL4xMzMV8OkQC2gkOLD+e+x6QTlsjPw
Thread-topic: [Xen-devel] [PATCH 0/4] [HVM] NUMA support in HVM guests
Hi Andre,

This is a good start for supporting guest NUMA.

I have some comments.

+    for (i = 0; i <= dominfo.max_vcpu_id; i++)
+    {
+        node = (i * numanodes) / (dominfo.max_vcpu_id + 1);
+        xc_vcpu_setaffinity(xc_handle, dom, i, nodemasks[node]);
+    }

This always starts from node 0, which may make node 0 very busy while the other 
nodes have little work.
It would be better to start pinning from the node with the lightest load.
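As a minimal sketch of that idea (not from the patch): pick the least loaded node first, then rotate the patch's proportional VCPU-to-node mapping around it. Here node_load[] is an assumed per-node load metric collected elsewhere (lower means lighter), and both function names are illustrative:

```c
#include <assert.h>

/* Hypothetical: return the index of the least loaded node. */
static int pick_start_node(const unsigned int *node_load, int numanodes)
{
    int lightest = 0;
    for (int n = 1; n < numanodes; n++)
        if (node_load[n] < node_load[lightest])
            lightest = n;
    return lightest;
}

/* Same proportional split as the patch, shifted so that
 * assignment begins at 'start' instead of node 0. */
static int vcpu_to_node(int vcpu, int max_vcpu_id, int numanodes, int start)
{
    return ((vcpu * numanodes) / (max_vcpu_id + 1) + start) % numanodes;
}
```

The pinning loop would then call xc_vcpu_setaffinity() with nodemasks[vcpu_to_node(i, ...)] instead of the node computed from 0.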

We also need to add some limitations on numanodes. The number of VCPUs on a 
vnode should not be larger than the number of pcpus on the corresponding pnode; 
otherwise several VCPUs belonging to the same domain run on the same pcpu, 
which is not what we want.
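That constraint could be checked before pinning. A minimal sketch, assuming VCPUs are split evenly across vnodes (numanodes_ok and its parameters are illustrative, not from the patch):

```c
#include <assert.h>

/* Hypothetical validity check: reject a numanodes value that would put
 * more VCPUs on a vnode than the pnode has physical CPUs.  VCPUs are
 * assumed to be split evenly; round up to size the largest vnode. */
static int numanodes_ok(int nr_vcpus, int numanodes, int pcpus_per_pnode)
{
    int vcpus_per_vnode = (nr_vcpus + numanodes - 1) / numanodes;
    return vcpus_per_vnode <= pcpus_per_pnode;
}
```

Domain creation could fail early with a clear error when this returns 0, rather than silently stacking VCPUs on one pcpu.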

In setup_numa_mem, every node gets the same memory size, and if the memory 
allocation fails, domain creation fails. This may be too rigid; I think we can 
support guest NUMA where each node has a different memory size, or even where 
some node has no memory at all. What we need to guarantee is that the guest 
sees the physical topology.
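One possible relaxation, as a sketch: take an even share from each node, but shift any shortfall onto nodes that still have free memory instead of failing domain creation. Here alloc_on_node() and the node_free[] array are stand-ins for the real per-node allocator, and sizes are in pages:

```c
#include <assert.h>

/* Stand-in allocator: take up to 'want' pages from 'node',
 * return how many were actually taken. */
static unsigned long alloc_on_node(unsigned long node_free[], int node,
                                   unsigned long want)
{
    unsigned long got = want < node_free[node] ? want : node_free[node];
    node_free[node] -= got;
    return got;
}

/* Distribute 'total' pages over 'numanodes' nodes.  First pass takes an
 * even share (the last node absorbs the remainder); a second pass pushes
 * any shortfall onto nodes with memory left.  Returns pages still
 * unallocated (0 on success). */
static unsigned long split_memory(unsigned long node_free[],
                                  unsigned long total, int numanodes,
                                  unsigned long per_node[])
{
    unsigned long share = total / numanodes, remaining = total;
    for (int n = 0; n < numanodes && remaining; n++) {
        unsigned long want = (n == numanodes - 1) ? remaining :
                             (share > remaining ? remaining : share);
        per_node[n] = alloc_on_node(node_free, n, want);
        remaining -= per_node[n];
    }
    for (int n = 0; n < numanodes && remaining; n++) {
        unsigned long extra = alloc_on_node(node_free, n, remaining);
        per_node[n] += extra;
        remaining -= extra;
    }
    return remaining;
}
```

The SRAT table would then simply report whatever per-node sizes were actually obtained, keeping the guest view consistent with the physical layout.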

In your patch, when a NUMA guest is created, each vnode is pinned to a pnode. 
But after a number of domain creations and destructions, the workload on the 
platform may become very imbalanced, so we need a method to rebalance it 
dynamically.
There are two methods, IMO:
1. Implement a NUMA-aware scheduler and page migration.
2. Run a daemon in dom0 that monitors the workload and uses live migration to 
rebalance it when necessary.
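For method 2, the daemon's trigger condition might look like the following sketch. Everything here is hypothetical (needs_rebalance, the load metric, and the threshold are illustrative); the real daemon would act on the result by driving live migration:

```c
#include <assert.h>

/* Hypothetical decision rule: sample per-node load, and request a
 * migration from the busiest node to the lightest one when the gap
 * exceeds 'threshold'.  Returns 1 and fills *from/*to if a migration
 * is warranted, 0 otherwise. */
static int needs_rebalance(const unsigned int *node_load, int nodes,
                           unsigned int threshold, int *from, int *to)
{
    int hi = 0, lo = 0;
    for (int n = 1; n < nodes; n++) {
        if (node_load[n] > node_load[hi]) hi = n;
        if (node_load[n] < node_load[lo]) lo = n;
    }
    if (node_load[hi] - node_load[lo] <= threshold)
        return 0;
    *from = hi;
    *to = lo;
    return 1;
}
```

The daemon would run this in a loop over periodically sampled load figures and, on a hit, pick a domain pinned to *from* and migrate it toward *to*.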


>-----Original Message-----
>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-
>bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Andre Przywara
>Sent: Monday, August 13, 2007 6:01 PM
>To: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: [Xen-devel] [PATCH 0/4] [HVM] NUMA support in HVM guests
>these four patches allow to forward NUMA characteristics into HVM
>guests. This works by allocating memory explicitly from different NUMA
>nodes and create an appropriate SRAT-ACPI table which describes the
>topology. Needs a decent guest kernel which uses the SRAT table to
>discover the NUMA topology.
>This allows to break the current de-facto limitation of guests to one
>NUMA node, one can use more memory and/or more VCPUs than there are
>available on one node.
>       Patch 1/4: introduce numanodes=n config file option.
>this states how many NUMA nodes the guest should see; the default is 0,
>which means to turn off most parts of the code.
>       Patch 2/4: introduce CPU affinity for allocate_physmap call.
>the correct NUMA node to take the memory from is chosen by simply using
>the currently scheduled CPU, this patch allows to explicitly specify a
>CPU and provides XENMEM_DEFAULT_CPU for the old behavior
>       Patch 3/4: allocate memory with NUMA in mind.
>actually look at the numanodes=n option to split the memory request up
>into n parts and allocate it from different nodes. Also change the VCPUs
>affinity to match the nodes.
>       Patch 4/4: inject created SRAT table into the guest.
>create a SRAT table, fill it up with the desired NUMA topology and
>inject it into the guest
>Applies against staging c/s #15719.
>Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
>Andre Przywara
>AMD-Operating System Research Center (OSRC), Dresden, Germany
>Tel: +49 351 277-84917
>----to satisfy European Law for business letters:
>AMD Saxony Limited Liability Company & Co. KG
>Sitz (Geschäftsanschrift): Wilschdorfer Landstr. 101, 01109 Dresden,
>Registergericht Dresden: HRA 4896
>vertretungsberechtigter Komplementär: AMD Saxony LLC (Sitz Wilmington,
>Delaware, USA)
>Geschäftsführer der AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
>Xen-devel mailing list
