At 03:54 -0500 on 09 Feb (1234151686), Jiang, Yunhong wrote:
> Hi, Tim, this patchset tries to support page offline requests. I want to get
> some initial feedback before more testing.
I haven't had a chance to read the patches in detail yet, but my initial
impression is that:
- The general approach so far seems good (I suspect that your 2.3 stage
  below could also be done like 2.2 without a full live migration but
  since that's not implemented yet that's fine).
- It seems like a lot of code for what it does. On the Xen side that's
  just a general impression since I'm not familiar with the bits of the
  heap allocators that you're changing. In libxc you seem to have
  duplicated parts of the save/restore code -- better to make those
  routines externally visible to the rest of libxc and call them
  from your new function.
- Like all systems code everywhere, it needs more comments. :) You've
  introduced some generic-sounding functions (adjust_pte &c) without
  describing what they do.
I'll have more detailed comments later in the week, I hope.
Cheers,
Tim.
> Page offline can be used in multiple usage models; below are some examples:
> a) If too many correctable errors happen on one page, management tools may try
> to offline the page to avoid more severe errors in future;
> b) When a page has an ECC error that can't be recovered by hardware, Xen's MCA
> handler may try to offline the page, so that it will not be accessed anymore.
> c) Offline some DIMMs for power management etc. (Of course, this is far more
> than simple page offline.)
>
> The basic idea to offline a page is:
> 1) If a page is free, it will be removed from the page allocator.
> 2) If the page is in use, the owner will be checked:
> 2.1) If it is owned by Xen/dom0, the offline will fail.
> 2.2) If it is owned by a PV guest with no device assigned, user space tools
> will try to replace the page with a new one.
> 2.3) If it is owned by an HVM guest with no device assigned, user space
> tools will try to live migrate it.
> 2.4) If it is owned by a guest with a device assigned, user space tools can
> do live migration if needed.
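
The cases listed above can be read as a small decision table. What follows
is only a minimal sketch of that table, not code from the patchset: the enum
names and offline_policy() are hypothetical, and the real checks live in the
hypervisor and the user space tools.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner classes; these names are not from the patchset. */
enum page_owner {
    OWNER_NONE,          /* page is free */
    OWNER_XEN_OR_DOM0,
    OWNER_PV_GUEST,
    OWNER_HVM_GUEST
};

/* Hypothetical actions, one per case in the list above. */
enum offline_action {
    ACT_REMOVE_FROM_ALLOCATOR,   /* case 1   */
    ACT_FAIL,                    /* case 2.1 */
    ACT_REPLACE_PAGE,            /* case 2.2 */
    ACT_LIVE_MIGRATE,            /* case 2.3 */
    ACT_LIVE_MIGRATE_IF_NEEDED   /* case 2.4 */
};

/* Map a page's owner (and whether that owner has a device assigned)
 * to the action described for it in the list above. */
static enum offline_action offline_policy(enum page_owner owner,
                                          bool device_assigned)
{
    switch (owner) {
    case OWNER_NONE:
        return ACT_REMOVE_FROM_ALLOCATOR;
    case OWNER_XEN_OR_DOM0:
        return ACT_FAIL;
    case OWNER_PV_GUEST:
    case OWNER_HVM_GUEST:
        if (device_assigned)
            return ACT_LIVE_MIGRATE_IF_NEEDED;
        return owner == OWNER_PV_GUEST ? ACT_REPLACE_PAGE
                                       : ACT_LIVE_MIGRATE;
    }
    assert(!"unknown page owner");
    return ACT_FAIL;
}
```

For example, offline_policy(OWNER_PV_GUEST, false) selects the
page-replacement path (2.2), while any guest with a device assigned falls
through to 2.4.
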
>
> This patchset includes support for type 2.1/2.2.
>
> page_offfline_xen.patch gives basic support. The new hypercall
> (XEN_SYSCTL_page_offline) will mark a page as offlining if the page is in use;
> otherwise, it will remove the page from the page allocator. It also changes
> free_heap_pages(), so that if a page_offlining page is freed, that page
> will be marked as page_offlined and will not be allocated anymore. One tricky
> thing is that the offlined page may not be buddy-aligned (i.e., it may be in
> the middle of a 2^order-page chunk), so we have to re-arrange the buddy system
> (i.e. &heap[][][]) carefully.
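
To illustrate the buddy re-arrangement issue: when the page to offline sits
inside a free 2^order chunk, the chunk can be halved repeatedly, giving back
the half that does not contain the page at each step. The sketch below shows
only that general technique under simplifying assumptions (plain page frame
numbers, no locking, no zones or nodes). struct chunk and split_around() are
hypothetical names, and this is not the code in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* A free chunk of 2^order pages starting at page frame number 'start'. */
struct chunk {
    unsigned long start;
    unsigned int order;
};

/*
 * Split the free chunk [base, base + 2^order) around 'target'.
 * Writes the chunks that remain free into 'out' (which must have room
 * for 'order' entries) and returns how many were written.  'target'
 * itself is left out of every returned chunk.
 */
static size_t split_around(unsigned long base, unsigned int order,
                           unsigned long target, struct chunk *out)
{
    size_t n = 0;

    /* The chunk must be buddy-aligned and must contain the target. */
    assert((base & ((1UL << order) - 1)) == 0);
    assert(target >= base && target - base < (1UL << order));

    while (order > 0) {
        unsigned long half;

        order--;
        half = 1UL << order;
        if (target < base + half) {
            /* Target is in the lower half: the upper half stays free. */
            out[n].start = base + half;
        } else {
            /* Target is in the upper half: the lower half stays free. */
            out[n].start = base;
            base += half;
        }
        out[n].order = order;
        n++;
    }

    /* Only the single target page is left. */
    assert(base == target);
    return n;
}
```

Offlining page 13 out of the order-3 chunk covering pages 8..15 leaves three
free chunks: pages 8..11 (order 2), pages 14..15 (order 1) and page 12
(order 0).
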
>
> page_offline_xen_memory.patch adds support for PV guests; a new hypercall
> (XENMEM_page_offline) tries to replace the old page with a new one. This will
> happen only when the guest has been suspended, to avoid complex page-sharing
> situations. I'm still checking whether more situations need to be considered,
> like LDT pages and CR3 pages, so any suggestion would be a great help.
>
> page_offline_tools.patch is an example user space tool based on
> libxc/xc_domain_save.c; it will first try to mark a page offline and then
> check the result. If a page is owned by a PV guest, it will try to replace
> the page.
>
> I did some basic testing on free pages and PV guest pages, and it works. Of
> course, I need to test it more, and more robust error handling is needed.
>
> Any suggestions are welcome.
>
> Thanks
> Yunhong Jiang
--
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]