
RE: [Xen-devel] [PATCH 00/11] PV NUMA Guests

In general, I am of the opinion that in a virtualized world
one gets the best flexibility or the best performance, but not
both.  There may be a couple of reasonable points on this
"slider selector", but I'm not sure a huge time investment
will be worthwhile in general, as real users will not
understand the subtleties of their workloads well enough to
choose from a large number of (perhaps more than two) points
on the performance/flexibility spectrum.

So customers that want the highest performance should be
prepared to pin their guests and not use ballooning.  And those
that want the flexibility of migration, ballooning, etc. should
expect to see a performance hit (including NUMA consequences).

But since I don't get to make that decision, let's look
at the combination of NUMA + dynamic memory utilization...

> Please refer to my previously submitted patch for this
> (http://old.nabble.com/Xen-devel--XEN-PATCH---Linux-PVOPS--ballooning-
> on-numa-domains-td26262334.html).
> I intend to send out a refreshed patch once the basic guest numa is
> checked in.

OK, will wait and take a look at that later.
 
> We first try to CONFINE a domain and only then proceed to STRIPE or
> SPLIT (if capable) the domain. So, in this (automatic) global domain
> memory allocation scheme, there is no possibility of starvation from
> memory pov. Hope I got your question right.

The example I'm concerned with is:
1) Domain A is CONFINE'd to node A and domain B/C/D/etc are not
   CONFINE'd
2) Domain A uses less than the total memory on node A and/or
   balloons down so it uses even less than when launched.
3) Domains B/C/D have an increasing memory need, and semi-randomly
   absorb memory from all nodes, including node A.

After (3), free memory is somewhat randomly distributed across
all nodes.  Then:

4) Domain A suddenly has an increasing memory need... but there's
   not enough free memory remaining on node A (in fact possibly
   there is none at all) to serve its need.   But by definition
   of CONFINE, domain A is not allowed to use memory other than
   on node A.

What happens now?  It appears to me that other domains have
(perhaps even maliciously) starved domain A.

I think this is a dynamic bin-packing problem which is
unsolvable in its general form.  So the choice of heuristics
is going to be important.
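
To make this concrete, here is a tiny standalone sketch of the
kind of reservation heuristic that would avoid the starvation.
All names and numbers below are made up (this is not the
proposed allocator): reserve a CONFINE'd domain's maximum
footprint on its node up front, and never let other domains'
allocations dip into that reservation.

/* Standalone sketch (made-up numbers and names, not Xen code)
 * showing why a CONFINE'd domain needs some kind of reservation:
 * if other domains may allocate freely from every node, the
 * confined domain's only node can be drained before it balloons
 * back up. */
#include <stdio.h>

#define NR_NODES        4
#define PAGES_PER_NODE  1000

static long node_free[NR_NODES];
static long node_reserved[NR_NODES]; /* promised to confined domains */

/* Allocation on behalf of a non-confined domain: never dip into
 * memory promised to confined domains. */
static int alloc_from_node(int node, long pages)
{
    if (node_free[node] - node_reserved[node] < pages)
        return -1;
    node_free[node] -= pages;
    return 0;
}

int main(void)
{
    for (int n = 0; n < NR_NODES; n++)
        node_free[n] = PAGES_PER_NODE;

    /* Domain A is CONFINE'd to node 0 with a maximum of 800
     * pages, currently ballooned down to 300.  Reserve its
     * worst case up front. */
    long domA_max = 800, domA_current = 300;
    node_free[0] -= domA_current;
    node_reserved[0] = domA_max - domA_current;

    /* Domains B/C/D grow and semi-randomly absorb memory,
     * including from node 0; the reservation caps how much
     * they can take. */
    for (int i = 0; i < 20; i++)
        alloc_from_node(i % NR_NODES, 150);

    /* Domain A balloons back up to its maximum: the reserved
     * pages are still there, so this always succeeds. */
    long need = domA_max - domA_current;
    node_reserved[0] -= need;
    printf("node0 free=%ld, domain A needs %ld -> %s\n",
           node_free[0], need,
           node_free[0] >= need ? "satisfied" : "starved");
    node_free[0] -= need;
    return 0;
}

Without the node_reserved[] bookkeeping the same run ends with
node 0 drained and domain A starved, which is exactly the
scenario above.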
 
> For the tmem, I was thinking of the ability to specify a set of nodes
> from which the tmem-space memory is preferred which could be derived
> from the domain's numa enlightenment, but as you mentioned the
> full-page copy overhead is less noticeable (at least on my smaller
> NUMA machine). But, the rate would determine if we should do this to
> reduce inter-node traffic. What do you suggest ?  I was looking at the
> data structures too.

Since tmem allocates individual xmalloc-tlsf memory pools per domain,
it should be possible to inform tmem of node preferences, but I don't
know that it will be feasible to truly CONFINE a domain's tmem.
On the other hand, because of the page copying, affinity by itself
may be sufficient.
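
As a strawman, something along these lines is what I mean by
affinity rather than confinement for tmem pages.  Note this is
a standalone sketch with invented names (tmem_pick_node,
pref_mask), not actual tmem code: try the preferred node(s)
first and fall back to any node rather than fail, on the theory
that the full-page copy already hides much of the remote-access
cost.

/* Sketch only (invented names): an "affinity, not confinement"
 * policy for tmem pool pages -- try the preferred node(s) first,
 * but fall back to any node rather than failing. */
#include <stdio.h>

#define NR_NODES 4

static long node_free[NR_NODES] = { 0, 50, 0, 200 };

/* Returns the node a page would come from, or -1 if memory is
 * exhausted everywhere.  'pref_mask' is a bitmask of preferred
 * nodes derived from the domain's NUMA enlightenment. */
static int tmem_pick_node(unsigned int pref_mask)
{
    /* First pass: preferred nodes only. */
    for (int n = 0; n < NR_NODES; n++)
        if ((pref_mask & (1u << n)) && node_free[n] > 0)
            return n;
    /* Second pass: anything goes (affinity, not CONFINE). */
    for (int n = 0; n < NR_NODES; n++)
        if (node_free[n] > 0)
            return n;
    return -1;
}

int main(void)
{
    /* Domain enlightened to nodes 0 and 2, both currently empty:
     * the page falls back to node 1 instead of failing. */
    printf("allocated from node %d\n",
           tmem_pick_node((1u << 0) | (1u << 2)));
    return 0;
}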

> > Also, I will be looking into adding some page-sharing
> > techniques into tmem in the near future.  This (and the
> > existing page sharing feature just added to 4.0) may
> > create some other interesting challenges for NUMA-awareness.
> I have just started reading up on the memsharing feature of Xen. I
> would be glad to get your input on NUMA challenges over there.

Note that the tmem patch that does sharing (tmem calls it "page
deduplication") was just accepted into xen-unstable.  Basically
some memory may belong to more than one domain, so NUMA effects
and performance/memory tradeoffs may get very complicated.
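
For example (again a standalone sketch with invented names, not
the actual sharing code): once a page is deduplicated, its
sharers may live on different nodes, so there is no single
"right" node for the one physical copy.  A naive heuristic is
to place it where the majority of sharers live and let the rest
pay the remote-access cost.

/* Sketch with invented names: pick a home node for a shared
 * (deduplicated) page by majority vote of its sharers' home
 * nodes. */
#include <stdio.h>

#define NR_NODES 4

static int pick_node_for_shared_page(const int *sharer_home, int nr)
{
    int votes[NR_NODES] = { 0 };
    int best = 0;

    for (int i = 0; i < nr; i++)
        votes[sharer_home[i]]++;
    for (int n = 1; n < NR_NODES; n++)
        if (votes[n] > votes[best])
            best = n;
    return best; /* sharers on other nodes pay remote access */
}

int main(void)
{
    /* Three sharers: two live on node 2, one on node 0. */
    int homes[] = { 2, 0, 2 };
    printf("shared copy placed on node %d\n",
           pick_node_for_shared_page(homes, 3));
    return 0;
}

Of course, as soon as a fourth sharer on node 0 shows up the
"right" answer changes, which is the kind of moving target I
mean by complicated.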

Dan
