This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


To: Chris Mason <chris.mason@xxxxxxxxxx>, Avi Kivity <avi@xxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, Rik van Riel <riel@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, npiggin@xxxxxxx, akpm@xxxxxxxx, jeremy@xxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, tmem-devel@xxxxxxxxxxxxxx, alan@xxxxxxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, kurt.hackel@xxxxxxxxxx, Rusty Russell <rusty@xxxxxxxxxxxxxxx>, dave.mccracken@xxxxxxxxxx, Marcelo Tosatti <mtosatti@xxxxxxxxxx>, sunil.mushran@xxxxxxxxxx, Schwidefsky <schwidefsky@xxxxxxxxxx>, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] Re: [RFC PATCH 0/4] (Take 2): transcendent memory ("tmem") for Linux
From: Anthony Liguori <anthony@xxxxxxxxxxxxx>
Date: Mon, 13 Jul 2009 16:17:05 -0500
Delivery-date: Mon, 13 Jul 2009 14:17:36 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20090713210112.GC3783@think>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <a09e4489-a755-46e7-a569-a0751e0fc39f@default> <4A5A1A51.2080301@xxxxxxxxxx> <4A5A3AC1.5080800@xxxxxxxxxxxxx> <20090713201745.GA3783@think> <4A5B9B55.6000404@xxxxxxxxxxxxx> <20090713210112.GC3783@think>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird (X11/20090320)
Chris Mason wrote:
> On Mon, Jul 13, 2009 at 03:38:45PM -0500, Anthony Liguori wrote:
>>> I'll definitely grant that caching with writethrough adds more caching,
>>> but it does need trim support before it is similar to tmem.

>> I think trim is somewhat orthogonal, but even if you do need it, the
>> nice thing about implementing ATA trim support versus paravirtualization
>> is that it works with a wide variety of guests.

> From the perspective of the VMM, it seems like a good thing. The caching
> is transparent to the guest, but it is also transparent to qemu, and so
> it is harder to manage and size (or even get a stat for how big it
> currently is).

That's certainly a fixable problem though. We could expose statistics to userspace and then further expose those to guests. I think the first question to answer though is what you would use those statistics for.
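As an aside, the raw number is already visible on the host today; a minimal sketch of reading the page-cache size from /proc/meminfo (the helper name is ours, and this is Linux-only -- exposing it per-guest through qemu is the part that doesn't exist yet):

```python
def host_page_cache_kb(path="/proc/meminfo"):
    """Return the host page cache size in kB, per /proc/meminfo."""
    with open(path) as f:
        for line in f:
            if line.startswith("Cached:"):
                return int(line.split()[1])  # second field is the kB value
    raise RuntimeError("no Cached: line in " + path)

if __name__ == "__main__":
    print(host_page_cache_kb())
```

This is of course system-wide, not per-guest; attributing cache pages to a particular qemu instance is exactly the accounting problem being discussed.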

>> The difference with our "tmem" is that instead of providing an interface
>> where the guest explicitly says, "I'm throwing away this memory, I may
>> need it later", and then asks for it again, the guest throws away the
>> page and we can later satisfy the disk I/O request that results from
>> re-requesting the page instantaneously.

>> This transparent approach is far superior, too, because it enables
>> transparent sharing across multiple guests. This works well for CoW
>> images and would work really well if we had a file system capable of
>> block-level deduplication... :-)

> Grin, I'm afraid that even if someone were to jump in and write the
> perfect COW-based filesystem and then find a willing contributor to code
> up a dedup implementation, each COW image would be a different file
> and so it would have its own address space.
>
> Dedup and COW are an easy way to get hints about which pages are
> supposed to have the same contents, but they would have to go with
> some other duplicate-page-sharing scheme.

Yes. We have the information we need to dedup this memory though. We just need a way to track non-dirty pages that result from DMA, map the host page cache directly into the guest, and then CoW when the guest tries to dirty that memory.

We don't quite have the right infrastructure in Linux yet to do this effectively, but this is entirely an issue with the host. The guest doesn't need any changes here.


Anthony Liguori
