Hi Tim, this patchset tries to add support for page offline requests. I'd like
to get some initial feedback before doing more testing.
Page offlining can serve several usage models. Below are some examples:
a) If too many correctable errors occur on one page, management tools may
offline the page to avoid more severe errors in the future;
b) When a page has an ECC error that cannot be recovered by hardware, Xen's MCA
handler may offline the page so that it is never accessed again;
c) Offlining DIMMs for power management, etc. (of course, this involves far more
than simple page offlining).
The basic idea for offlining a page is:
1) If the page is free, it is removed from the page allocator.
2) If the page is in use, its owner is checked:
2.1) If it is owned by Xen/dom0, the offline request fails.
2.2) If it is owned by a PV guest with no device assigned, user-space tools
try to replace the page with a new one.
2.3) If it is owned by an HVM guest with no device assigned, user-space tools
try to live-migrate the guest.
2.4) If it is owned by a guest with a device assigned, user-space tools can
live-migrate the guest if needed.
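The decision steps above can be summarized in a small C sketch. The owner enum, struct page_info_sketch and classify_offline() are hypothetical names that do not reflect Xen's real struct page_info; the sketch only shows how each case maps to an action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical owner classes; Xen's real struct page_info differs. */
enum page_owner { OWNER_NONE, OWNER_XEN, OWNER_DOM0, OWNER_PV, OWNER_HVM };

enum offline_action {
    ACTION_REMOVE_FROM_ALLOCATOR, /* case 1: free page */
    ACTION_FAIL,                  /* case 2.1: Xen/dom0 (or unknown) owner */
    ACTION_REPLACE_PAGE,          /* case 2.2: PV guest, no device assigned */
    ACTION_LIVE_MIGRATE,          /* case 2.3: HVM guest, no device assigned */
    ACTION_MIGRATE_IF_NEEDED      /* case 2.4: guest with a device assigned */
};

/* Hypothetical, minimal view of a page for this sketch only. */
struct page_info_sketch {
    bool in_use;
    enum page_owner owner;
    bool device_assigned;
};

/* Map a page's state to the offline action of cases 1 and 2.x. */
static enum offline_action classify_offline(const struct page_info_sketch *pg)
{
    assert(pg != NULL);

    if (!pg->in_use)
        return ACTION_REMOVE_FROM_ALLOCATOR;

    switch (pg->owner) {
    case OWNER_PV:
        return pg->device_assigned ? ACTION_MIGRATE_IF_NEEDED
                                   : ACTION_REPLACE_PAGE;
    case OWNER_HVM:
        return pg->device_assigned ? ACTION_MIGRATE_IF_NEEDED
                                   : ACTION_LIVE_MIGRATE;
    default:
        /* Xen, dom0, or an owner we cannot classify: refuse. */
        return ACTION_FAIL;
    }
}
```

Treating an unknown owner like Xen/dom0 is a conservative choice made for the sketch: refusing is always safe, while guessing wrong could corrupt a live page.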
This patchset supports cases 2.1 and 2.2.
page_offline_xen.patch provides the basic support. The new hypercall
(XEN_SYSCTL_page_offline) marks a page as page_offlining if the page is in use;
otherwise, it removes the page from the page allocator. It also changes
free_heap_pages(), so that when a page_offlining page is freed, it is marked as
page_offlined and is never allocated again. One tricky point is that the
offlined page may not be buddy-aligned (i.e., it may sit in the middle of a
2^order chunk of pages), so we have to rearrange the buddy system (i.e.
&heap[][][]) carefully.
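One possible way to do that rearrangement, sketched under assumptions: pages are identified by plain indices, and struct free_chunk and split_around_bad_page() are hypothetical names, not code from page_offline_xen.patch. The free 2^order chunk is halved repeatedly; at each step the half that does not contain the bad page stays free as a naturally aligned buddy, which leaves exactly one free chunk of each smaller order plus the single offlined page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a free chunk of 2^order pages starting at page index start. */
struct free_chunk {
    unsigned long start;
    unsigned int order;
};

/*
 * Split the buddy chunk [base, base + 2^order) so that page `bad` is
 * excluded. Every other page is written to `out` as a naturally aligned
 * buddy chunk; `out` needs room for `order` entries. Returns the count.
 */
static size_t split_around_bad_page(unsigned long base, unsigned int order,
                                    unsigned long bad, struct free_chunk *out)
{
    size_t n = 0;

    assert(base % (1UL << order) == 0);                 /* aligned input */
    assert(bad >= base && bad < base + (1UL << order)); /* bad page inside */

    while (order > 0) {
        unsigned long half;

        order--;
        half = 1UL << order;
        if (bad < base + half) {
            /* Bad page is in the lower half: the upper half stays free. */
            out[n].start = base + half;
        } else {
            /* Bad page is in the upper half: the lower half stays free. */
            out[n].start = base;
            base += half;
        }
        out[n].order = order;
        n++;
    }
    /* Here base == bad: that single page is the one to mark offlined. */
    return n;
}
```

For example, splitting the order-3 chunk of pages 0..7 around bad page 5 yields pages 0..3 (order 2), 6..7 (order 1) and 4 (order 0), each of which can go back onto the matching free list.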
page_offline_xen_memory.patch adds support for PV guests: a new hypercall
(XENMEM_page_offline) replaces the old page with a new one. This happens only
when the guest has been suspended, to avoid complex page-sharing situations.
I'm still checking whether more situations need to be considered, such as LDT
pages and CR3 pages, so any suggestions would be a great help.
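A minimal sketch of the replacement idea, assuming a toy model in which a PV guest is just a pfn-to-mfn table and machine memory is an array of pages. struct pv_guest_sketch, replace_guest_page() and SKETCH_PAGE_SIZE are hypothetical. A real PV implementation would also have to update the guest page-table entries (and Xen's machine-to-phys table) that refer to the old frame; the sketch omits that, and it is exactly where pages such as LDT and CR3 pages make things harder.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* hypothetical page size for the toy model */

/* Hypothetical toy PV guest: only a pfn -> mfn table and a suspend flag. */
struct pv_guest_sketch {
    bool suspended;
    unsigned long *p2m;     /* guest pfn -> machine frame number */
    unsigned long nr_pfns;
};

/*
 * Copy the contents of the frame backing `pfn` into `new_mfn` and repoint
 * the p2m entry. Only allowed while the guest is suspended. Returns 0 on
 * success, -1 on any invalid argument or a running guest.
 */
static int replace_guest_page(struct pv_guest_sketch *d, unsigned long pfn,
                              unsigned long new_mfn,
                              unsigned char (*machine)[SKETCH_PAGE_SIZE],
                              unsigned long nr_mfns)
{
    unsigned long old_mfn;

    assert(d != NULL && machine != NULL);

    if (!d->suspended || pfn >= d->nr_pfns || new_mfn >= nr_mfns)
        return -1;
    old_mfn = d->p2m[pfn];
    if (old_mfn >= nr_mfns || old_mfn == new_mfn)
        return -1;

    memcpy(machine[new_mfn], machine[old_mfn], SKETCH_PAGE_SIZE);
    d->p2m[pfn] = new_mfn; /* page tables, m2p, etc. NOT updated here */
    return 0;
}
```

Requiring the suspended state up front mirrors the patch's reasoning: with the guest stopped, no vCPU can write the old frame between the copy and the remap.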
page_offline_tools.patch is an example user-space tool based on
libxc/xc_domain_save.c. It first tries to mark a page offline and then checks
the result. If the page is owned by a PV guest, it tries to replace the page.
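The tool flow above can be sketched as a mapping from the result of the offline request to the follow-up steps. enum offline_status, enum tool_step and plan_followup() are hypothetical, and resuming the guest after the replacement is an assumption rather than something stated in the patch description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status reported back after requesting the page offline. */
enum offline_status {
    STATUS_OFFLINED,    /* page was free and is already offlined */
    STATUS_PENDING_PV,  /* page marked offlining, owned by a PV guest */
    STATUS_PENDING_HVM, /* page marked offlining, owned by an HVM guest */
    STATUS_FAILED       /* owned by Xen/dom0: request refused */
};

/* Hypothetical follow-up steps for the user-space tool. */
enum tool_step { STEP_SUSPEND_GUEST, STEP_REPLACE_PAGE, STEP_RESUME_GUEST };

/*
 * Fill `steps` (room for at least 3 entries) with the follow-up actions for
 * a given status and return how many there are; return -1 when the tool
 * cannot finish the offline (refused, or a case this patchset lacks).
 */
static int plan_followup(enum offline_status st, enum tool_step *steps)
{
    assert(steps != NULL);

    switch (st) {
    case STATUS_OFFLINED:
        return 0;                      /* nothing left to do */
    case STATUS_PENDING_PV:
        steps[0] = STEP_SUSPEND_GUEST; /* replacement needs a suspended guest */
        steps[1] = STEP_REPLACE_PAGE;  /* e.g. via XENMEM_page_offline */
        steps[2] = STEP_RESUME_GUEST;  /* assumption: resume afterwards */
        return 3;
    case STATUS_PENDING_HVM:           /* live migration: not implemented */
    case STATUS_FAILED:
    default:
        return -1;
    }
}
```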
I did some basic testing with free pages and PV guest pages, and it works. Of
course, it needs more testing, and more robust error handling is needed.
Any suggestions are welcome.
Thanks
Yunhong Jiang
Attachments:
page_offline_xen_memory.patch
page_offline_tools.patch
page_offline_xen.patch
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel