This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: [RFC PATCH 0/4] (Take 2): transcendent memory ("tmem") for Linux

To: Anthony Liguori <anthony@xxxxxxxxxxxxx>
Subject: [Xen-devel] Re: [RFC PATCH 0/4] (Take 2): transcendent memory ("tmem") for Linux
From: Chris Mason <chris.mason@xxxxxxxxxx>
Date: Mon, 13 Jul 2009 17:01:12 -0400
Cc: npiggin@xxxxxxx, akpm@xxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, tmem-devel@xxxxxxxxxxxxxx, kurt.hackel@xxxxxxxxxx, Rusty Russell <rusty@xxxxxxxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, sunil.mushran@xxxxxxxxxx, Avi Kivity <avi@xxxxxxxxxx>, jeremy@xxxxxxxx, Schwidefsky <schwidefsky@xxxxxxxxxx>, dave.mccracken@xxxxxxxxxx, Marcelo Tosatti <mtosatti@xxxxxxxxxx>, alan@xxxxxxxxxxxxxxxxxxx, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 14 Jul 2009 05:36:20 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4A5B9B55.6000404@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Mail-followup-to: Chris Mason <chris.mason@xxxxxxxxxx>, Anthony Liguori <anthony@xxxxxxxxxxxxx>, Avi Kivity <avi@xxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, Rik van Riel <riel@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, npiggin@xxxxxxx, akpm@xxxxxxxx, jeremy@xxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, tmem-devel@xxxxxxxxxxxxxx, alan@xxxxxxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, kurt.hackel@xxxxxxxxxx, Rusty Russell <rusty@xxxxxxxxxxxxxxx>, dave.mccracken@xxxxxxxxxx, Marcelo Tosatti <mtosatti@xxxxxxxxxx>, sunil.mushran@xxxxxxxxxx, Schwidefsky <schwidefsky@xxxxxxxxxx>, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
References: <a09e4489-a755-46e7-a569-a0751e0fc39f@default> <4A5A1A51.2080301@xxxxxxxxxx> <4A5A3AC1.5080800@xxxxxxxxxxxxx> <20090713201745.GA3783@think> <4A5B9B55.6000404@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
On Mon, Jul 13, 2009 at 03:38:45PM -0500, Anthony Liguori wrote:
> Chris Mason wrote:
>> This depends on the extent to which tmem is integrated into the VM.  For
>> filesystem usage, the hooks are relatively simple because we already
>> have a lot of code sharing in this area.  Basically tmem is concerned
>> with when we free a clean page and when the contents of a particular
>> offset in the file are no longer valid.
> But filesystem usage is perhaps the least interesting part of tmem.
> The VMM already knows which pages in the guest are the result of disk IO  
> (it's the one that put it there, afterall).  It also knows when those  
> pages have been invalidated (or it can tell based on write-faulting).
> The VMM also knows when the disk IO has been rerequested by tracking  
> previous requests.  It can keep the old IO requests cached in memory and  
> use that to satisfy re-reads as long as the memory isn't needed for  
> something else.  Basically, we have tmem today with kvm and we use it by  
> default by using the host page cache to do I/O caching (via  
> cache=writethrough).

I'll definitely grant that caching with writethrough adds more caching,
but it needs trim support before it is comparable to tmem.  The caching
is transparent to the guest, but it is also transparent to qemu, and so
it is harder to manage and size (or even to get a statistic for how big
it currently is).

> The difference between our "tmem" is that instead of providing an  
> interface where the guest explicitly says, "I'm throwing away this  
> memory, I may need it later", and then asking again for it, the guest  
> throws away the page and then we can later satisfy the disk I/O request  
> that results from re-requesting the page instantaneously.
> This transparent approach is far superior too because it enables  
> transparent sharing across multiple guests.  This works well for CoW  
> images and would work really well if we had a file system capable of  
> block-level deduplication... :-)

Grin, I'm afraid that even if someone were to jump in and write the
perfect CoW-based filesystem and then find a willing contributor to code
up a dedup implementation, each CoW image would be a different file
and so would have its own address space.

Dedup and CoW are an easy way to get hints about which pages are
supposed to have the same contents, but they would have to be combined
with some other duplicate-page-sharing scheme.


Xen-devel mailing list