Re: [Xen-devel] Xen Memory De-duplication

To:	Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, Pasi Kärkkäinen <pasik@xxxxxx>
Subject:	Re: [Xen-devel] Xen Memory De-duplication
From:	Aditya Gadre <adivb2003@xxxxxxxxx>
Date:	Sun, 10 Oct 2010 10:54:58 +0530
Cc:	Xen-devel@xxxxxxxxxxxxxxxxxxx

This kind of implementation requires disk blocks from different DomUs to map to the same physical disk block.
For example,
1) Shared read-only filesystem
2) Union-based filesystem
3) Virtual machine images deployed on a host filesystem which has deduplication enabled

What kind of filesystem arrangement is used in production environments for DomUs on hosts that run a large number of VMs, as in a cloud environment?

On Sun, Oct 10, 2010 at 5:10 AM, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> wrote:

I’m not an expert on it but I believe this sounds very similar to the page sharing implementation that already exists in Xen 4.0. The implementation in Xen only works on HVM guests and only on machines that have EPT though. The patches (which were accepted into Xen) were posted here:

http://lists.xensource.com/archives/html/xen-devel/2009-12/msg00797.html

From: Aditya Gadre [mailto:adivb2003@xxxxxxxxx]
Sent: Saturday, October 09, 2010 11:56 AM
To: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Xen Memory De-duplication

Our aim is to implement Xen memory de-duplication with minimum overhead.

Our approach to de-duplication is as follows:

In most cases, a DomU runs one of a small set of well-known operating systems such as Linux, FreeBSD and Microsoft Windows. In such an environment, many domains share read-only filesystems that contain the operating system and frequently used program files and libraries. Each domain has its own writable filesystems for storing data and temporary files. In this configuration, pages scattered across different domains often hold the contents of the same disk block.

To perform de-duplication, we intend to add a data structure in Dom0 that stores the disk block number and the machine frame number (MFN) whenever a read request for read-only code (or data) is served. When another DomU requests the same block and Dom0 receives the I/O (DMA) request, Dom0 first looks up the block in the data structure. If it finds an entry, it returns the MFN of the page that was already read and maps it to the requesting domain's PFN, so no disk I/O is needed for blocks that have already been read. This de-duplicates the read-only pages accessed by multiple domains without the overhead of hashing page contents.
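The lookup described above can be sketched roughly as follows. This is only a toy model in Python, not Xen code: the class name `BlockPageCache`, the method `handle_read`, the dictionary standing in for a domain's PFN-to-MFN mapping, and the fake MFN allocator are all hypothetical names introduced for illustration. It also assumes the block is read-only, so it ignores invalidation on writes.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BlockPageCache:
    """Hypothetical Dom0-side table: (device, block number) -> MFN."""
    table: Dict[Tuple[str, int], int] = field(default_factory=dict)
    next_free_mfn: int = 0x1000   # fake allocator, assumption for the sketch
    disk_reads: int = 0           # counts simulated disk reads

    def _read_from_disk(self, device: str, block: int) -> int:
        # Stand-in for the real DMA read: just hand out a new fake MFN.
        self.disk_reads += 1
        mfn = self.next_free_mfn
        self.next_free_mfn += 1
        return mfn

    def handle_read(self, p2m: Dict[int, int], pfn: int,
                    device: str, block: int) -> int:
        """Serve a read of `block` for a domain whose PFN->MFN map is `p2m`."""
        key = (device, block)
        mfn = self.table.get(key)
        if mfn is None:
            # First request for this block: do the I/O and remember the MFN.
            mfn = self._read_from_disk(device, block)
            self.table[key] = mfn
        # Map the (possibly shared) machine frame into the requesting domain.
        p2m[pfn] = mfn
        return mfn


cache = BlockPageCache()
dom1_p2m: Dict[int, int] = {}
dom2_p2m: Dict[int, int] = {}
cache.handle_read(dom1_p2m, 0x10, "ro-root", 42)
cache.handle_read(dom2_p2m, 0x77, "ro-root", 42)
print(dom1_p2m[0x10] == dom2_p2m[0x77], cache.disk_reads)
```

In this sketch both domains end up mapping the same frame after a single simulated disk read. A real implementation would additionally have to keep the shared frame read-only and decide when table entries can be dropped.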

Test case scenario:

Consider a Dom0 Linux kernel using a filesystem with deduplication enabled. We install a DomU kernel whose virtual disk is an image file (.img) on that filesystem, then make multiple copies of the image to deploy multiple DomUs running the same kernel. Because deduplication is enabled in the filesystem, initially all the blocks of the images point to the same disk blocks. When the kernels boot, the programs (code segments) they load consume memory only once. As these OSs start to write to their own virtual filesystems, the filesystem copies the affected image blocks on write (COW), so those blocks get different block numbers.
Has such an approach been implemented? We intend to implement this as a project. What challenges should we expect?
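The test scenario can be modelled with a small Python sketch. Everything here is an assumption made for illustration: `DedupStore` and its methods are hypothetical names, not the interface of any real filesystem, and the model is simplified in that every write allocates a fresh physical block, whereas a real filesystem would only need to do so for blocks that are still shared.

```python
from typing import Dict, List


class DedupStore:
    """Toy model of a deduplicating, copy-on-write host filesystem."""

    def __init__(self) -> None:
        self.next_block = 0
        # image name -> physical block number for each logical block
        self.images: Dict[str, List[int]] = {}

    def _alloc(self) -> int:
        blk = self.next_block
        self.next_block += 1
        return blk

    def create_image(self, name: str, n_blocks: int) -> None:
        self.images[name] = [self._alloc() for _ in range(n_blocks)]

    def clone_image(self, src: str, dst: str) -> None:
        # With deduplication, a copy initially shares every physical block.
        self.images[dst] = list(self.images[src])

    def write(self, name: str, logical_block: int) -> None:
        # Copy-on-write: the written block moves to a new physical block.
        self.images[name][logical_block] = self._alloc()

    def physical_block(self, name: str, logical_block: int) -> int:
        return self.images[name][logical_block]


store = DedupStore()
store.create_image("base.img", 4)
store.clone_image("base.img", "domU1.img")
store.clone_image("base.img", "domU2.img")
shared_before = (store.physical_block("domU1.img", 2)
                 == store.physical_block("domU2.img", 2))
store.write("domU1.img", 2)
shared_after = (store.physical_block("domU1.img", 2)
                == store.physical_block("domU2.img", 2))
print(shared_before, shared_after)
```

Blocks that are never written keep the same physical block number in every image, which is what would let a block-number-keyed table in Dom0 recognise them as shareable.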

Regards,
Aditya Gadre
