Anthony Liguori <anthony@xxxxxxxxxxxxx>
[Xen-devel] RE: [RFC PATCH 0/4] (Take 2): transcendent memory ("tmem") for Linux
Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Thu, 9 Jul 2009 15:34:39 -0700 (PDT)
npiggin@xxxxxxx, akpm@xxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, tmem-devel@xxxxxxxxxxxxxx, kurt.hackel@xxxxxxxxxx, Rusty Russell <rusty@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, dave.mccracken@xxxxxxxxxx, linux-mm@xxxxxxxxx, sunil.mushran@xxxxxxxxxx, Avi Kivity <avi@xxxxxxxxxx>, jeremy@xxxxxxxx, Schwidefsky <schwidefsky@xxxxxxxxxx>, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>, Marcelo Tosatti <mtosatti@xxxxxxxxxx>, alan@xxxxxxxxxxxxxxxxxxx, chris.mason@xxxxxxxxxx
Fri, 10 Jul 2009 06:12:12 -0700
Xen developer discussion <xen-devel.lists.xensource.com>
> > If it guesses wrong and overcommits too aggressively,
> > the hypervisor must swap some memory to a "hypervisor
> > swap disk" (which btw has some policy challenges).
> > IMHO this is more of a "mainframe" model.
> No, not at all. A guest marks a page as being "volatile",
> which tells
> the hypervisor it never needs to swap that page. It can discard it
> whenever it likes.
> If the guest later tries to access that page, it will get a special
> "discard fault". For a lot of types of memory, the discard fault
> handler can then restore that page transparently to the code that
> generated the discard fault.
But this means that either the content of that page must have been
preserved somewhere or the discard fault handler has sufficient
information to go back and get the content from the source (e.g.
the filesystem). Or am I misunderstanding?
With tmem, the equivalent of a "failure to access a discarded page"
is inline and synchronous: if the tmem access "fails", the normal
(non-tmem) code path executes immediately.
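To make the contrast concrete, here is a minimal sketch of that synchronous
fallback pattern. All names here (tmem_get, read_page_from_disk, the
tmem_valid flag standing in for hypervisor-side discard) are illustrative
assumptions, not the patch's actual interface:

```c
#include <string.h>

#define PAGE_SIZE 64              /* toy page size for illustration */

static char tmem_copy[PAGE_SIZE]; /* hypervisor-side copy of the page */
static int  tmem_valid;           /* hypervisor may clear this at any time */

/* Hypothetical get: returns 0 and fills the page, or -1 if the
   hypervisor already discarded its copy. */
int tmem_get(char *page)
{
    if (!tmem_valid)
        return -1;
    memcpy(page, tmem_copy, PAGE_SIZE);
    return 0;
}

/* Stand-in for the normal block-layer read path. */
void read_page_from_disk(char *page)
{
    memset(page, 'D', PAGE_SIZE);
}

/* The caller never takes a special "discard fault"; a failed get
   simply falls through, synchronously, to the normal disk read. */
void fill_page(char *page)
{
    if (tmem_get(page) != 0)
        read_page_from_disk(page);
}
```

The point of the sketch is that the failure is handled at the call site,
inline, rather than in an asynchronous fault handler.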
> AFAICT, ephemeral tmem has the exact same characteristics as volatile
> CMM2 pages. The difference is that tmem introduces an API to
> manage this memory behind a copy interface whereas CMM2 uses
> hinting and
> a special fault handler to allow any piece of memory to be marked in
> this way.
> I don't really agree with your analysis of CMM2. We can map CMM2
> operations directly to ephemeral tmem interfaces so tmem is a
> subset of CMM2, no?
Not really. I suppose one *could* use tmem that way, immediately
writing every page read from disk into tmem, though that would
probably cause some real coherency challenges. But the patch as
proposed only puts ready-to-be-replaced pages (as determined by
Linux's PFRA) into ephemeral tmem.
The two services provided to Linux (in the proposed patch) by tmem are:
1) "I have a page of memory that I'm about to throw away because
I'm not sure I need it any more and I have a better use for
   that pageframe right now. Mr. Tmem, might you have someplace
you can squirrel it away for me in case I need it again?
Oh, and by the way, if you can't or you lose it, no big deal
as I can go get it from disk if I need to."
2) "I'm out of memory and have to put this page somewhere. Mr.
   Tmem, can you take it? But if you do take it, you have to
promise to give it back when I ask for it! If you can't
promise, never mind, I'll find something else to do with it."
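The two services above can be sketched with a toy in-memory model. The
pool names, function names, and fixed-size arrays are all assumptions
made for illustration; the real patch issues hypercalls to the
hypervisor rather than calling local functions:

```c
#include <string.h>

#define PAGE_SIZE 64   /* toy page size for illustration */
#define EPH_SLOTS 2    /* ephemeral pool: hypervisor may drop pages */
#define PERS_SLOTS 2   /* persistent pool: accepted pages are kept */

struct slot { int used; long key; char data[PAGE_SIZE]; };

static struct slot eph[EPH_SLOTS], pers[PERS_SLOTS];

/* Service 1 (ephemeral): the put always "succeeds", but the hypervisor
   is free to discard the page later, so a later get may fail.
   Discard is modeled here by overwriting the oldest slot. */
void eph_put(long key, const char *page)
{
    static int next;
    eph[next].used = 1;
    eph[next].key = key;
    memcpy(eph[next].data, page, PAGE_SIZE);
    next = (next + 1) % EPH_SLOTS;
}

int eph_get(long key, char *page)
{
    for (int i = 0; i < EPH_SLOTS; i++)
        if (eph[i].used && eph[i].key == key) {
            memcpy(page, eph[i].data, PAGE_SIZE);
            return 0;
        }
    return -1;  /* "no big deal": caller re-reads from disk */
}

/* Service 2 (persistent): the put itself may be refused, but once a
   page is accepted, a later get is guaranteed to return it. */
int pers_put(long key, const char *page)
{
    for (int i = 0; i < PERS_SLOTS; i++)
        if (!pers[i].used) {
            pers[i].used = 1;
            pers[i].key = key;
            memcpy(pers[i].data, page, PAGE_SIZE);
            return 0;
        }
    return -1;  /* "never mind, I'll find something else to do with it" */
}

int pers_get(long key, char *page)
{
    for (int i = 0; i < PERS_SLOTS; i++)
        if (pers[i].used && pers[i].key == key) {
            memcpy(page, pers[i].data, PAGE_SIZE);
            return 0;
        }
    return -1;  /* only reachable if the page was never accepted */
}
```

The asymmetry is the whole design: service 1 may lose data but never
refuses it, while service 2 may refuse data but never loses it.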
> > In other words, CMM2, despite its name, is more of a
> > "subservient" memory management system (Linux is
> > subservient to the hypervisor) and tmem is more
> > collaborative (Linux and the hypervisor share the
> > responsibilities and the benefits/costs).
> What's appealing to me about CMM2 is that it doesn't change the guest
> semantically but rather just gives the VMM more information about how
> the VMM is using its memory. This suggests that it allows greater
> flexibility in the long term to the VMM and more importantly,
> provides an easier implementation across a wide range of guests.
I suppose changing Linux to utilize the two tmem services
as described above is a semantic change. But to me it
seems no more of a semantic change than requiring a new
special page fault handler because a page of memory might
disappear behind the OS's back.
But IMHO this is a corollary of the fundamental difference. CMM2
takes more of the "VMware" approach, which holds that OSes should never
have to be modified to run in a virtual environment. (Oh, but maybe
modified just slightly to make the hypervisor a little less
clueless about the OS's resource utilization.) Tmem asks: if an
OS is often going to run in a virtualized environment, what
can be done to share the responsibility for resource management
so that the OS does what it can with the knowledge that it has
and the hypervisor can most flexibly manage resources across
all the guests? I do agree that adding an additional API
binds the user and provider of the API less flexibly than without
the API, but as long as the API is optional (as it is for both
tmem and CMM2), I don't see why CMM2 provides more flexibility.