Hi,
yesterday Dulloor, Jun and I had a discussion about the NUMA guest
configuration scheme and came to the following conclusions:
1. The configuration would be the same for HVM and PV guests, only the
internal method of propagation would differ.
2. We want to make it as easy as possible to use, with good out-of-the-box
performance as the design goal. Another goal is predictable performance.
3. We (at least for now) omit more sophisticated tuning options (an exact
user-driven description of the guest's topology), so the guest's
resources are split equally across the guest nodes (e.g. a guest with
4 GB and 4 VCPUs on two guest nodes gets 2 GB and 2 VCPUs per node).
4. We have three basic strategies:
- CONFINE: let the guest use only one node. If that is not possible, fail.
- SPLIT: allocate resources from multiple nodes and inject a NUMA
topology into the guest (for PV guests this includes querying via
hypercall). If the guest is paravirtualized and does not know about
NUMA (missing ELF hint): fail.
- STRIPE: allocate the memory in an interleaved way from multiple
nodes and don't tell the guest about NUMA at all.
If any one of the above strategies is explicitly specified in the config
file and cannot be met, then guest creation will fail.
A fourth option would be the default: AUTOMATIC. This will try the three
strategies one after another (order: CONFINE, SPLIT, STRIPE). If one
fails, the next will be tried. (Striping will never be used for HVM
guests, since injecting a NUMA topology always works for them.)
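To make this concrete, a hypothetical config file fragment could look
like the following (the option name "numa" and its string values are
made up for illustration; the actual syntax is still open):

    # request one strategy explicitly; guest creation fails if it
    # cannot be satisfied
    numa = "split"          # or "confine" / "stripe"

    # or rely on the default, which tries CONFINE, SPLIT and STRIPE
    # in turn
    numa = "automatic"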
5. The number of guest nodes is internally specified via a min/max pair.
By default min is 1 and max is the number of system nodes. The algorithm
will try to use the smallest possible number of nodes.
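A minimal sketch of that selection loop in Python (can_allocate() is a
made-up placeholder for the real allocation check):

    def can_allocate(num_nodes):
        # Placeholder: in reality this would ask the hypervisor whether
        # the guest's memory and VCPUs fit on num_nodes nodes.
        return False

    def pick_node_count(min_nodes, max_nodes):
        # Walk upwards from the minimum, so the smallest workable
        # number of nodes wins.
        for num_nodes in range(min_nodes, max_nodes + 1):
            if can_allocate(num_nodes):
                return num_nodes
        raise RuntimeError("no node count in [min, max] fits the guest")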
The question remaining is whether we want to expose this pair to the user:
- For predictable performance we want to specify an exact number of
guest nodes, so set min=max=<number of nodes>.
- For best performance, the number of nodes should be as small as
possible, so min is always 1. For the explicit CONFINE strategy, max
would also be 1; for AUTOMATIC it should be as few as possible, which
is already built into the algorithm.
So it is not clear whether "max nodes" is a useful option on its own. If
it merely served as an upper bound, it is questionable whether
"failing-if-not-possible" is a useful result.
So maybe we can get by with just one (optional) value: guestnodes.
This will be useful in the SPLIT case, where it specifies the number of
nodes the guest sees (for predictable performance). CONFINE internally
overrides this value with 1. To impose a limit on the number of nodes,
one would choose AUTOMATIC and set guestnodes to this number. If the
single-node allocation fails, as few nodes as possible will be used,
without exceeding the specified number.
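In config file terms (again with made-up syntax), those two uses would
look like:

    # predictable performance: exactly two guest nodes
    numa = "split"
    guestnodes = 2

    # upper bound only: as few nodes as possible, but at most two
    numa = "automatic"
    guestnodes = 2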
Please comment on this.
Thanks and regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12