This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] [PATCH 0/4] [HVM] NUMA support in HVM guests

To: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 0/4] [HVM] NUMA support in HVM guests
From: "Andre Przywara" <andre.przywara@xxxxxxx>
Date: Fri, 07 Sep 2007 14:49:07 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 07 Sep 2007 06:00:18 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <51CFAB8CB6883745AE7B93B3E084EBE2010B74F5@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <46C02BE0.2070400@xxxxxxx> <51CFAB8CB6883745AE7B93B3E084EBE2010B74F5@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird (X11/20070409)

Thanks for looking into the patches, I appreciate your comments.

> +    for (i=0;i<=dominfo.max_vcpu_id;i++)
> +    {
> +        node= ( i * numanodes ) / (dominfo.max_vcpu_id+1);
> +        xc_vcpu_setaffinity (xc_handle, dom, i, nodemasks[node]);
> +    }

> This always starts from node0, which may make node0 very busy while other
> nodes have little work.
This is true. I ran into this before, but didn't want to delay sending out the patches any longer. Actually, the "numanodes=n" config file option shouldn't specify the number of nodes, but rather a list of specific nodes to use, like "numanodes=0,2" to pin the domain to the first and the third node.
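
To make the node-list idea a bit more concrete, here is a minimal sketch of how a string like "0,2" could be parsed and then used with the same distribution formula as the loop in the patch. The names parse_node_list, node_for_vcpu and MAX_NODES are made up for illustration and are not part of the posted patches; the real code would build a cpumask per listed node and pass it to xc_vcpu_setaffinity.

```c
#include <stdlib.h>

#define MAX_NODES 64   /* arbitrary limit chosen for this sketch */

/* Parse a comma-separated node list such as "0,2" into nodes[].
 * Returns the number of entries parsed, or -1 on malformed input.
 * Hypothetical helper, not part of the posted patches. */
static int parse_node_list(const char *s, int *nodes, int max_nodes)
{
    int count = 0;

    while (*s != '\0') {
        char *end;
        long val = strtol(s, &end, 10);

        /* Reject non-numeric input, negative nodes and overlong lists. */
        if (end == s || val < 0 || count >= max_nodes)
            return -1;
        nodes[count++] = (int)val;

        if (*end == ',')
            end++;              /* skip the separator */
        else if (*end != '\0')
            return -1;          /* unexpected character */
        s = end;
    }
    return count;
}

/* Same distribution as in the patch loop, but the computed value is an
 * index into the user-supplied list rather than a node number itself. */
static int node_for_vcpu(int vcpu, int nr_vcpus,
                         const int *nodes, int nr_nodes)
{
    return nodes[(vcpu * nr_nodes) / nr_vcpus];
}
```

With "numanodes=0,2" and four vcpus, vcpus 0 and 1 would land on node 0 and vcpus 2 and 3 on node 2.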
> It may be nice to pin nodes starting from the most lightly loaded node.
This sounds interesting. It shouldn't be that hard to do in libxc, but we should think about semantics for specifying this behavior in the config file (if we change the semantics from a node count to a list of specific nodes, as I described above).
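
As a rough sketch of what "start from the most lightly loaded node" could mean: given some per-node load figure, pick the requested number of nodes in order of increasing load. How the load is measured is left open here; the load[] array (for example the number of vcpus already pinned to each node) and the function name pick_lightest_nodes are assumptions for illustration only.

```c
#define MAX_NODES 64   /* arbitrary limit chosen for this sketch */

/* Select the 'want' least-loaded nodes out of nr_nodes.
 * load[i] is a hypothetical per-node load figure; chosen[] receives the
 * node numbers in order of increasing load (ties go to the lower node
 * number). Returns 0 on success, -1 on invalid arguments. */
static int pick_lightest_nodes(const unsigned int *load, int nr_nodes,
                               int want, int *chosen)
{
    int taken[MAX_NODES] = { 0 };
    int k, node;

    if (nr_nodes <= 0 || nr_nodes > MAX_NODES || want < 0 || want > nr_nodes)
        return -1;

    for (k = 0; k < want; k++) {
        int best = -1;

        /* Simple selection: find the lightest node not yet taken. */
        for (node = 0; node < nr_nodes; node++) {
            if (taken[node])
                continue;
            if (best < 0 || load[node] < load[best])
                best = node;
        }
        taken[best] = 1;
        chosen[k] = best;
    }
    return 0;
}
```

The resulting chosen[] array could then play the role of the explicit node list, so both config semantics could share the same pinning code.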
> We also need to add some limitations for numanodes. The number of vcpus on
> a vnode should not be larger than the number of pcpus on the pnode.
> Otherwise vcpus belonging to the same domain run on the same pcpu, which
> is not what we want.
Would be nice, but at the moment I would leave this as the sysadmin's responsibility.
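
If such a limitation were added later, it could be a simple sanity check before pinning. The sketch below counts how many vcpus the distribution formula from the patch puts on each node and compares that with the number of pcpus there. The pcpus_on_node[] array and the function name vcpus_fit_on_nodes are hypothetical; the real numbers would have to come from the physical topology information.

```c
/* Returns 1 if no vnode receives more vcpus than its pnode has pcpus,
 * 0 otherwise. Uses the same vcpu-to-node formula as the patch loop:
 * node = (vcpu * nr_nodes) / nr_vcpus. */
static int vcpus_fit_on_nodes(int nr_vcpus, int nr_nodes,
                              const int *pcpus_on_node)
{
    int node;

    for (node = 0; node < nr_nodes; node++) {
        int vcpu, count = 0;

        /* Count the vcpus that the formula assigns to this node. */
        for (vcpu = 0; vcpu < nr_vcpus; vcpu++)
            if ((vcpu * nr_nodes) / nr_vcpus == node)
                count++;

        if (count > pcpus_on_node[node])
            return 0;
    }
    return 1;
}
```

Whether a failed check should abort domain creation or only print a warning is a policy question that this sketch does not answer.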
> In setup_numa_mem, each node gets an equal share of the memory; if the
> memory allocation fails, the domain creation fails. This may be too "rude".
> I think we can support guest NUMA where each node has a different memory
> size, and maybe even where some node has no memory at all. What we need
> to guarantee is that the guest sees the physical topology.
Sounds reasonable. I will look into this.
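
One possible shape for such a relaxed split, as a sketch only: start from the equal share, cap it at what each node has free, and then hand the remainder to nodes that still have spare memory. The function name split_guest_memory and the free_mb[] input are assumptions for illustration; this is not how setup_numa_mem currently works.

```c
/* Split total_mb across nr_nodes (nr_nodes must be > 0).
 * Each node first gets an equal share, capped at free_mb[node]; whatever
 * is left over is then given to nodes that still have spare memory.
 * A node with no free memory simply ends up with 0.
 * Returns 0 on success, -1 if the nodes together lack enough memory. */
static int split_guest_memory(unsigned long total_mb,
                              const unsigned long *free_mb, int nr_nodes,
                              unsigned long *assigned_mb)
{
    unsigned long share = total_mb / nr_nodes;
    unsigned long left = total_mb;
    int node;

    /* First pass: equal share, limited by what the node can provide. */
    for (node = 0; node < nr_nodes; node++) {
        assigned_mb[node] = free_mb[node] < share ? free_mb[node] : share;
        left -= assigned_mb[node];
    }

    /* Second pass: place the remainder wherever there is spare memory. */
    for (node = 0; node < nr_nodes && left > 0; node++) {
        unsigned long spare = free_mb[node] - assigned_mb[node];
        unsigned long extra = spare < left ? spare : left;

        assigned_mb[node] += extra;
        left -= extra;
    }

    return left == 0 ? 0 : -1;
}
```

The per-node sizes computed this way would then have to be reflected in the topology information presented to the guest, so that the guest still sees a layout matching the physical one.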
> In your patch, when a NUMA guest is created, each vnode is pinned to a
> pnode. After some domain create and destroy operations, the workload on
> the platform may become very imbalanced, so we need a method to balance
> the workload dynamically.
> There are two methods IMO.
> 1. Implement a NUMA-aware scheduler and page migration.
> 2. Run a daemon in dom0 that monitors the workload and uses live migration
> to balance the workload if necessary.
You are right, this may become a problem. I think the second solution is easier to implement. A NUMA-aware scheduler would be nice, but my idea was that the guest OS can schedule things better (more fine-grained, on a per-process basis rather than a per-machine basis). Changing the processing node without moving the memory along should be an exception (as it changes the NUMA topology, and at the moment I don't see a way to propagate this nicely to the (HVM) guest), so I think a kind of "real-emergency balancer" that includes page migration (quite expensive with bigger memory sizes!) would be more appropriate.

After all, my patches were more a basis for discussion than a final solution, so I see there is more work to do. At the moment I am working on including PV guests.


Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
