[Xen-ia64-devel] RE: Xen/ia64 - global or per VP VHPT

To: "Magenheimer, Dan \(HP Labs Fort Collins\)" <dan.magenheimer@xxxxxx>, "Yang, Fred" <fred.yang@xxxxxxxxx>, "Dong, Eddie" <eddie.dong@xxxxxxxxx>
Subject: [Xen-ia64-devel] RE: Xen/ia64 - global or per VP VHPT
From: "Munoz, Alberto J" <alberto.j.munoz@xxxxxxxxx>
Date: Mon, 2 May 2005 10:27:00 -0700
Cc: ipf-xen <ipf-xen@xxxxxxxxx>, xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
Hi Dan,

I am going to try to combine replies to your previous two messages here and
then attempt to point out where I think the tradeoffs of the two approaches
are. I will then leave it up to the people involved with the implementation
to work out things going forward (what I mean here may be clearer after
reading the rest of this message), as I don't think I can contribute much
there.

Magenheimer, Dan (HP Labs Fort Collins) <dan.magenheimer@xxxxxx>
wrote on Sunday, May 01, 2005 11:42 AM:

>> Please let's talk about specifics
>> and how they
>> relate to the issues of:
>> 
>> - Scalability (additional contention in a Global VHPT)
> 
> I see your lock contention argument.  Is the contention any
> worse for 10 domains contending for a global VHPT than an existing
> 10-way SMP (e.g. HP-UX, not virtualized) contending for an lVHPT?

When I refer to "contention" here, I am just talking about contention in the
VHPT.

The answer to this question depends on a number of factors. First, I want to
point out that contention related to supporting 10 UP VMs is non-existent
with a per domain VHPT implementation. 

As you point out, with the global VHPT implementation it could be equivalent
to the contention suffered by an OS (HP-UX in your example) running on a
10-way SMP system. I tend to guess (I have no measurements) that the
contention in the case of a VMM supporting 10 UP VMs could potentially be
worse than the contention experienced by a mature OS that supports a 10-way
SMP system mainly because of RID allocation issues (this problem is more
difficult for a VMM than for a single OS). I do agree however that
virtualizing RIDs is the way to manage this problem, and also that when
supporting OSs that use the short-format page table, the RID allocation
problem may also exist in the per-VM VHPT implementation (as Matt Chapman
pointed out).
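
To make the kind of contention I mean concrete, here is a rough C sketch
(purely illustrative, not Xen source; the entry layout, the per-bucket
spinlocks, and the hashing are all my assumptions):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NBUCKETS 4096

typedef struct {
    uint64_t tag;   /* derived from RID + virtual address */
    uint64_t pte;   /* frame number, permissions, ... */
} vhpt_entry_t;

typedef atomic_bool spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_exchange_explicit(l, true, memory_order_acquire))
        ;  /* spin until the holder releases */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(l, false, memory_order_release);
}

/* Global VHPT: one table shared by every domain, so inserts from
 * different VMs can collide on the same bucket lock. */
static vhpt_entry_t global_vhpt[NBUCKETS];
static spinlock_t   bucket_lock[NBUCKETS];

void global_insert(uint64_t hash, uint64_t tag, uint64_t pte)
{
    spinlock_t *l = &bucket_lock[hash % NBUCKETS];

    spin_lock(l);                /* cross-domain contention happens here */
    global_vhpt[hash % NBUCKETS] = (vhpt_entry_t){ tag, pte };
    spin_unlock(l);
}

/* Per-domain VHPT for a UP VM: only that VM's single vCPU ever touches
 * the table, so the insert needs no lock at all. */
void domain_insert(vhpt_entry_t *dom_vhpt, uint64_t hash,
                   uint64_t tag, uint64_t pte)
{
    dom_vhpt[hash % NBUCKETS] = (vhpt_entry_t){ tag, pte };
}

The point is only where the lock lives: in the global case every VM's
inserts funnel through shared locks, while in the 10-UP-VM per-domain case
there is nothing to contend on.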

As an aside, the issues I have with comparing VMM architectures with
traditional operating systems are:

1- A VMM and an OS are very different in terms of resource management (and
allocation granularity). Something that works well for an OS may not
necessarily work well for a VMM. The main reason for my arguing this point
is that processes and VMs interact very differently. This may be
self-evident to some, but I have had long debates with people regarding this
point.

2- I expect that VMMs will have to scale well beyond where OSs scale today,
to help (along with partitioning) address cases in which a single OS cannot
scale to the size of a full machine. By the way, in my opinion large
machines are much more important/significant/relevant to IPF than to x86.

>> I have not seen this. Would you mind sending me a pointer to
>> this. I tend to
>> follow these discussions sporadically, so I missed that one email.
> 
> http://lists.xensource.com/archives/html/xen-ia64-devel/2005-04/msg00012.html
> 
> Please note this is just a couple of weeks' work (based on experience
> from vBlades) so please ask questions rather than shoot
> bullets at it.  It's definitely a work in progress.

OK. I'll try to be gentle :-)

> That's the point I was trying to make.  Wasteful is not
> strong enough though... if you have 64 such domains, all
> of memory is used for VHPTs.  So I think some mechanism
> for growing/shrinking per-domain VHPTs needs to be part of the
> design or a lot of "utility computing" flexibility is lost.

I think the difference between your argument and mine boils down to whether
or not having the functionality to grow/shrink the VHPT is
necessary/beneficial in all cases (including the global VHPT case). The
argument I can offer is that if you want to support dynamic
addition/replacement of memory in systems (I tend to think this
functionality is more critical for larger systems, and I tend to equate
larger systems with IPF), having the ability to grow/shrink the VHPT will be
important no matter what the VHPT implementation is (global or per domain).
By the way, I do agree with Mark Williamson's observation that allocating
all memory for the VHPT at boot time may not be wasteful, if there are no
VMs to use it anyway (although there is always the question of whether or
not that memory could be used for something beyond the VMM... But that is
not very relevant to this discussion).
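
For a rough sense of the numbers involved (the machine size and the
VHPT-to-memory ratio below are my own assumptions, not anything from this
thread):

#include <stdio.h>

int main(void)
{
    const unsigned long machine_gb = 64;  /* assumed machine memory */
    const unsigned long ratio      = 64;  /* assume VHPT sized at mem/64 */
    const unsigned long ndomains   = 64;

    /* Global VHPT: sized once, against the real machine. */
    printf("global VHPT: %lu GB\n", machine_gb / ratio);       /* 1 GB */

    /* Per-domain VHPTs, if each is sized as though its domain could
     * grow to the whole machine: the sum is all of memory, which is
     * exactly the 64-domain concern above. */
    printf("per-domain worst case: %lu GB\n",
           ndomains * (machine_gb / ratio));                   /* 64 GB */
    return 0;
}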

These comments came from a different email message, but I think they are
relevant to the discussion of shrinking/growing VHPTs (D> means Dan, B>
means Bert):

B>> No, the VHPT does not have to be pinned by a TR. People do 
B>> it, but it does
B>> not have to be that way. 

D> It doesn't architecturally, but it does practically, right?

It does for current OS implementations. This does not mean this is the best
thing for a VMM. This is an example of what I mean by my issue number 2
above, regarding comparing OS and VMMs.

D> If the VHPT is greatly fragmented, I'll bet nearly all of
D> the performance advantage is gone due to extra misses and/or
D> loss of usable entries in the DTLB.  

The VHPT does not have to be greatly fragmented. If we agree to preallocate
memory for VHPTs (as we need to do for the global VHPT case), then we should
be able to manage things at a larger granularity than 4K (covering the VHPT
may need more than one TLB entry, but not one entry per 4K chunk in the
VHPT).
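
Some quick arithmetic on that point (the 64MB VHPT and 16MB chunk size are
made-up numbers for illustration):

#include <stdio.h>

int main(void)
{
    const unsigned long vhpt_bytes = 64UL << 20;  /* assume a 64MB VHPT */
    const unsigned long page_4k    = 4UL  << 10;
    const unsigned long chunk_16m  = 16UL << 20;  /* one large mapping each */

    printf("4K mappings needed:   %lu\n", vhpt_bytes / page_4k);    /* 16384 */
    printf("16MB mappings needed: %lu\n", vhpt_bytes / chunk_16m);  /*     4 */
    return 0;
}

Four large mappings instead of sixteen thousand small ones is the difference
between "a few extra TLB entries" and real fragmentation.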

> Not to mention the complexity of the psr.ic-off code to handle this...

I am not sure there is as much complexity as you think.

In any case, I do think it all boils down to whether or not we believe
dynamically sized VHPTs will be necessary in the future.

D> My points are:

D> Growing or shrinking is not necessary for a global VHPT because
D> it is scaled to the actual physical memory in the machine
D> rather than the sum of the virtual physical memory of N
D> domains.
D> Preallocation is much easier for the global VHPT because it need
D> not grow or shrink (ignoring hot-plug machine memory) nor is
D> it proportional to the number of domains.

True. My argument here is that I think there are other reasons for wanting
to have this functionality (shrinking/growing the VHPT), like the ability to
dynamically add/remove memory from a system (I don't think we should ignore
this issue, as you suggest). If we consider this along with the scalability
tradeoffs of the global VHPT, deciding to implement a dynamic VHPT may not
be that hard to swallow.

D> If the number of domains is dynamic (especially wildly so),
D> allocating memory for per-domain VHPTs is going to be painful.
D> And if your solution to this is "if it hurts don't do that"
D> (meaning don't allow the number of domains to be dynamic or
D> the amount of (meta)physical memory to be dynamic),
D> I'd consider that a design problem with per-domain VHPT.

If we agree to preallocate memory for the VHPT (as is required for the
global VHPT case), and replace the requirement to cover the entire VHPT with
a single TR by instead minimizing the number of memory chunks making up the
VHPT (this can be done because the memory is preallocated), then this
problem can be addressed.
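
A minimal sketch of the scheme just described, assuming the VHPT is backed
by a small set of large chunks (the 16MB chunk size and the structure names
are mine, not a proposal for the actual Xen data structures):

#include <stdlib.h>

#define CHUNK_SIZE (16UL << 20)  /* 16MB chunks, assumed */
#define MAX_CHUNKS 64

struct chunked_vhpt {
    void  *chunk[MAX_CHUNKS];  /* each coverable by one large TLB mapping */
    size_t nchunks;
};

/* Grow by one chunk: each new chunk costs one more mapping, not one
 * mapping per 4K page, and no single covering TR is required. */
int vhpt_grow(struct chunked_vhpt *v)
{
    void *c;

    if (v->nchunks == MAX_CHUNKS)
        return -1;
    c = aligned_alloc(CHUNK_SIZE, CHUNK_SIZE);  /* naturally aligned */
    if (c == NULL)
        return -1;
    v->chunk[v->nchunks++] = c;
    return 0;
}

/* Shrink by one chunk (its entries would first be flushed). */
void vhpt_shrink(struct chunked_vhpt *v)
{
    if (v->nchunks > 0)
        free(v->chunk[--v->nchunks]);
}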

The question here is: in your example of a wildly dynamic number of VMs,
which is better?

- Having the ability to allocate the VHPT memory in one chunk, but having to
suffer the overhead of synchronizing, on a single VHPT, all of those VMs
being created and deleted, plus the memory accesses of the running ones (see
the sketch after these two options).

Or

- Not having the VHPT synchronization overhead, but having to support
shrinking/growing VHPTs.
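
To illustrate the synchronization cost in the first option (all types and
the RID-in-tag encoding below are assumptions, and the per-bucket locking
from the earlier sketch is elided):

#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS    4096
#define INVALID_TAG UINT64_MAX

typedef struct { uint64_t tag, pte; } vhpt_entry_t;

/* Global case: evicting a deleted domain's translations means a sweep
 * over every bucket, synchronized against all the running VMs. */
void global_purge_domain(vhpt_entry_t *table,
                         uint64_t rid_base, uint64_t rid_mask)
{
    size_t i;

    for (i = 0; i < NBUCKETS; i++)
        if ((table[i].tag & rid_mask) == rid_base)
            table[i].tag = INVALID_TAG;
}

/* Per-domain case: nobody else shares the table, so teardown is just
 * freeing it -- no cross-VM synchronization at all. */
void domain_vhpt_destroy(vhpt_entry_t *table)
{
    free(table);
}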

>> You keep on making this differentiation between full and
>> paravirtualization
>> (but I don't think that is very relevant to what I am saying), please
>> explain how in a paravirtualized guest the example I
>> presented above of 10
>> UP VMs having to synchronize updates to the VHPT is not an issue.
> 
> You are likely correct.  But it is a small matter of coding
> to add the synchronization.  Then if performance is poor, we
> tell system administrators that the per-domain VHPT
> may be preferable on highly-scalable systems -- at the loss
> of some flexibility in dynamic domain migration/ballooning.
>
> And if it turns out that per-domain VHPT works "better" for
> ALL workloads, then I will admit I was wrong and pull the
> support for global VHPT.  But until then it should be left
> as an option (for non-VT domains).

I agree that the best way to address this type of argument is to measure. The
main question is whether or not supporting both mechanisms can be done
naturally in the same source base without having to trade off other important
things. In any case, I am definitely not in a position to comment on how
viable it is to support both mechanisms in the existing implementation and
what else may be affected by it. If everyone agrees that doing both
implementations in the same source base is feasible and does not adversely
affect other stuff, then I have no objection to what you propose.

> 
> Dan

Bert

