[Xen-devel] Re: [RFC][PATCH] Basic support for page offline

To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Subject: [Xen-devel] Re: [RFC][PATCH] Basic support for page offline
From: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Date: Tue, 10 Feb 2009 09:15:40 +0000
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <E2263E4A5B2284449EEBD0AAB751098401C781605D@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <E2263E4A5B2284449EEBD0AAB751098401C781605D@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.17 (2007-11-01)
At 03:54 -0500 on 09 Feb (1234151686), Jiang, Yunhong wrote:
> Hi Tim, this patchset tries to support page offline requests. I want to get 
> some initial feedback before more testing.

I haven't had a chance to read the patches in detail yet, but my initial
impression is that:

 - The general approach so far seems good (I suspect that your 2.3 stage
   below could also be done like 2.2 without a full live migration but 
   since that's not implemented yet that's fine).
 - It seems like a lot of code for what it does.  On the Xen side that's
   just a general impression since I'm not familiar with the bits of the 
   heap allocators that you're changing.  In libxc you seem to have 
   duplicated parts of the save/restore code -- better to make those 
   routines externally visible to the rest of libxc and call them 
   from your new function.
 - Like all systems code everywhere, it needs more comments. :)  You've
   introduced some generic-sounding functions (adjust_pte &c) without
   describing what they do.

I'll have more detailed comments later in the week, I hope. 

Cheers,

Tim.

> Page offline can be used in multiple usage models; below are some examples:
> a) If too many correctable errors happen on one page, management tools may try 
> to offline the page to avoid more severe errors in future;
> b) When a page has an ECC error that can't be recovered by hardware, Xen's MCA 
> handler may try to offline the page, so that it will not be accessed anymore.
> c) Offline some DIMMs for power management etc. (of course, this is far more 
> than simple page offline).
> 
> The basic idea to offline a page is:
> 1) If a page is free, it will be removed from the page allocator.
> 2) If the page is in use, the owner will be checked:
>   2.1) If it is owned by Xen/dom0, the offline will fail.
>   2.2) If it is owned by a PV guest with no device assigned, user space tools 
> will try to replace the page with a new one.
>   2.3) If it is owned by an HVM guest with no device assigned, user space 
> tools will try to live migrate it.
>   2.4) If it is owned by a guest with a device assigned, user space tools can 
> do live migration if needed.
> 
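[Illustrative sketch, not taken from the patches.] The decision list above can
be written as a small policy function. The enum and function names here
(page_owner, offline_action, choose_offline_action) are hypothetical and do not
correspond to identifiers in Xen or libxc; the sketch only mirrors cases 1 and
2.1 to 2.4 as described.

```c
#include <stdbool.h>

/* Hypothetical owner classes; illustrative names, not Xen's. */
enum page_owner {
    OWNER_NONE,          /* page is free */
    OWNER_XEN_OR_DOM0,
    OWNER_PV_GUEST,
    OWNER_HVM_GUEST
};

/* One action per case in the list above. */
enum offline_action {
    ACT_REMOVE_FROM_ALLOCATOR,   /* case 1   */
    ACT_FAIL,                    /* case 2.1 */
    ACT_REPLACE_PAGE,            /* case 2.2 */
    ACT_LIVE_MIGRATE,            /* case 2.3 */
    ACT_LIVE_MIGRATE_IF_NEEDED   /* case 2.4 */
};

/* Map (owner, device assignment) to the action described in the list. */
enum offline_action choose_offline_action(enum page_owner owner,
                                          bool device_assigned)
{
    switch (owner) {
    case OWNER_NONE:
        return ACT_REMOVE_FROM_ALLOCATOR;
    case OWNER_XEN_OR_DOM0:
        return ACT_FAIL;
    case OWNER_PV_GUEST:
        return device_assigned ? ACT_LIVE_MIGRATE_IF_NEEDED
                               : ACT_REPLACE_PAGE;
    case OWNER_HVM_GUEST:
        return device_assigned ? ACT_LIVE_MIGRATE_IF_NEEDED
                               : ACT_LIVE_MIGRATE;
    }
    return ACT_FAIL; /* unknown owner: refuse rather than guess */
}
```

For example, choose_offline_action(OWNER_PV_GUEST, false) yields
ACT_REPLACE_PAGE, which is case 2.2.
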
> This patchset includes support for type 2.1/2.2. 
> 
> page_offfline_xen.patch gives basic support. The new hypercall 
> (XEN_SYSCTL_page_offline) will mark a page as offlining if the page is in use; 
> otherwise, it will remove the page from the page allocator. It also changes 
> free_heap_pages(), so that if a page_offlining page is freed, that page 
> will be marked as page_offlined and will not be allocated anymore. One tricky 
> thing is that the offlined page may not be buddy-aligned (i.e., it may be in 
> the middle of a 2^order block of pages), so we have to re-arrange the buddy 
> system (i.e. &heap[][][]) carefully.
> 
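[Illustrative sketch, not taken from the patches.] The buddy re-arrangement
described above can be pictured as repeatedly halving the free block that
contains the bad page and keeping the half that does not contain it. The names
below (struct chunk, split_around_page) are hypothetical, and the real heap
code tracks far more state than this. The sketch assumes base is aligned to
2^order, bad lies inside the block, and out has room for order entries.

```c
#include <stddef.h>

/* A free, buddy-aligned run of 2^order pages starting at page number start. */
struct chunk {
    unsigned long start;
    unsigned int order;
};

/*
 * Split the free block [base, base + 2^order) around page `bad`.
 * Each step halves the block, records the half that does NOT contain
 * `bad` as a free chunk, and continues into the half that does.
 * Writes exactly `order` chunks to `out` and returns that count;
 * the single page `bad` is what remains and is left out of the heap.
 */
size_t split_around_page(unsigned long base, unsigned int order,
                         unsigned long bad, struct chunk *out)
{
    size_t n = 0;

    while (order > 0) {
        unsigned long half;

        order--;
        half = 1UL << order;

        if (bad < base + half) {
            /* bad page is in the lower half: upper half stays free */
            out[n].start = base + half;
            out[n].order = order;
        } else {
            /* bad page is in the upper half: lower half stays free */
            out[n].start = base;
            out[n].order = order;
            base += half;
        }
        n++;
    }
    return n;
}
```

For an order-3 block covering pages 8 to 15 with bad page 13, this yields three
free chunks: pages 8 to 11 (order 2), pages 14 to 15 (order 1), and page 12
(order 0).
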
> page_offline_xen_memory.patch adds support for PV guests: a new hypercall 
> (XENMEM_page_offline) tries to replace the old page with a new one. This will 
> happen only when the guest has been suspended, to avoid complex page-sharing 
> situations. I'm still checking whether more situations need to be considered, 
> like LDT pages and CR3 pages, so any suggestion would be a great help.
> 
> page_offline_tools.patch is an example user space tool based on 
> libxc/xc_domain_save.c. It first tries to mark a page offline and then 
> checks the result. If the page is owned by a PV guest, it will try to replace 
> the page.
> 
> I did some basic testing on free pages and PV guest pages, and it works. Of 
> course, I need to test it more, and more robust error handling is needed.
> 
> Any suggestions are welcome.
> 
> Thanks
> Yunhong Jiang





-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
