This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [xen-devel][vNUMA v2][PATCH 2/8] public interface

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: Re: [xen-devel][vNUMA v2][PATCH 2/8] public interface
From: Dulloor <dulloor@xxxxxxxxx>
Date: Tue, 3 Aug 2010 10:24:58 -0700
Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 03 Aug 2010 10:25:46 -0700
In-reply-to: <C87DF9E6.1C973%keir.fraser@xxxxxxxxxxxxx>
References: <AANLkTimKJogS0m2HN53KK-6_c-CnzBqqF0Udp8BFsRCh@xxxxxxxxxxxxxx> <C87DF9E6.1C973%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Tue, Aug 3, 2010 at 8:52 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 03/08/2010 16:43, "Dulloor" <dulloor@xxxxxxxxx> wrote:
>>> I would expect guest would see nodes 0 to nr_vnodes-1, and the mnode_id
>>> could go away.
>> mnode_id maps the vnode to a particular physical node. This will be
>> used by balloon driver in
>> the VMs when the structure is passed as NUMA enlightenment to PVs and
>> PV on HVMs.
>> I have a patch ready for that (once we are done with this series).
> So what happens when the guest is migrated to another system with different
> physical node ids? Is that never to be supported? I'm not sure why you
> wouldn't hide the vnode-to-mnode translation in the hypervisor.

Right now, migration is not supported when a NUMA strategy is set.
This is on my TODO list (along with PoD support).

There are a few open questions w.r.t. migration:
- What if the destination host is not NUMA, but the guest is? Do we fake
  those nodes, or should we avoid selecting such a destination host to begin with?
- What if the destination host is not NUMA, but the guest has asked to be
  striped across a specific number of nodes (possibly for higher aggregate
  memory bandwidth)?
- What if the guest has asked for a particular memory strategy, but the
  destination host can't guarantee it (because of the distribution of free
  memory across its nodes)?
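The first and third questions above amount to a placement-feasibility check on the destination host. A rough toolstack-side sketch of such a check (not part of the posted series; all names here are illustrative, and a real placer would also optimise by node distance rather than use first-fit):

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_MNODES 64

/* Illustrative view of one physical node on the destination host. */
struct host_node {
    uint64_t free_pages;
};

/*
 * Hypothetical check: can the destination honour the guest's NUMA layout
 * at all?  This sketch requires one distinct physical node per virtual
 * node, assigned first-fit; a complete check would solve a matching
 * problem and consider node distances.
 */
static bool can_place_guest(const struct host_node *host,
                            unsigned int nr_mnodes,
                            const uint64_t *vnode_pages,
                            unsigned int nr_vnodes)
{
    bool used[MAX_MNODES] = { false };

    if (nr_mnodes > MAX_MNODES || nr_mnodes < nr_vnodes)
        return false;   /* can't give each vnode its own physical node */

    for (unsigned int v = 0; v < nr_vnodes; v++) {
        bool placed = false;
        for (unsigned int m = 0; m < nr_mnodes; m++) {
            if (!used[m] && host[m].free_pages >= vnode_pages[v]) {
                used[m] = true;   /* one vnode per node in this sketch */
                placed = true;
                break;
            }
        }
        if (!placed)
            return false;   /* some vnode cannot be backed as requested */
    }
    return true;
}
```

If this check fails, the migration candidate would be rejected up front, which answers the "should we not select such a destination host" alternative.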
Once we answer these questions, we will know whether the vnode-to-mnode
translation is better exposed or not. And, if it is exposed, we could just
renegotiate the vnode-to-mnode translation at the destination host. I have
started working on this, but I have some other patches ready to go which we
might want to check in first: the PV/Dom0 NUMA patches and ballooning
support (see below).

As such, the purpose of the vnode-to-mnode translation is for enlightened
guests to know where their underlying memory comes from, so that
over-provisioning features like ballooning have a chance to maintain this
distribution. That way, all the hypervisor has to do is sanity-check
increase/exchange reservation requests from the guests, and the guest can
decide whether to make an exact_node_request or not.
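To make the ballooning interaction concrete, a minimal sketch of the per-vnode record an enlightened guest might keep (the structure and field names here are illustrative, not the definitions from the posted patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-vnode record as seen by the enlightened guest:
 * each virtual node carries the id of the physical (machine) node
 * currently backing it, so the balloon driver can target memory
 * reservation requests at that node and preserve the distribution
 * the guest was started with.
 */
struct vnode_info {
    uint32_t vnode_id;   /* guest-visible node number, 0 .. nr_vnodes-1 */
    uint32_t mnode_id;   /* backing physical node on the current host   */
    uint64_t nr_pages;   /* pages currently allocated from mnode_id     */
};

/*
 * Illustrative balloon helper: when growing vnode v, return the machine
 * node the hypercall should request pages from (i.e. what would go into
 * an exact-node reservation request).
 */
static uint32_t balloon_target_node(const struct vnode_info *nodes,
                                    unsigned int nr_vnodes,
                                    unsigned int v)
{
    assert(v < nr_vnodes);
    return nodes[v].mnode_id;
}
```

Under this scheme the hypervisor only needs to validate that the requested node matches the guest's assigned backing nodes, which is the sanity check described above.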
Other options, which would allow us to discard this translation, are:
- Ballooning at your own risk: leave ballooning as it is, even when guests
  use a NUMA strategy (particularly split/confined).
- Hypervisor-level policies: let Xen do its best to maintain the guest
  nodes (using gpfn ranges in guest nodes), which I think is not a
  clean/flexible solution.

But what I could do is leave out the vnode-to-mnode translation for now and
add it along with ballooning support (if/when we decide to add it); I would
just bump the interface version at that time. That might give us time to
mull this over?

>  -- Keir

Xen-devel mailing list