SUZUKI Kazuhiro <mailto:kaz@xxxxxxxxxxxxxx> wrote:
> Hi Yunhong and Shane,
>
> Thank you for your comments and reviews.
>
>> For page offlining, it may not be a good way to put the pages into an
>> array; a list would be better.
>
> I modified the code to register offlined pages in a list instead of an array.
>
>> I guess the code targets offlining domain pages only, right?
>> How about free pages and xen pages?
>> If so, there is no need to check in the following when allocating free
>> pages, since the offlined pages will not be freed into heap()()().
>> If not, the following may have a bug.
>
> Yes, I assumed that page offlining is needed only for domain pages. If xen
> pages are impacted, the current implementation simply crashes the
> hypervisor.
We have an internal patch for a similar purpose, covering page offlining
caused by #MC as well as other reasons. We can base our work on your patch if
needed. However, I'm a bit confused by some changes in this patch.
In your previous version, a page marked as PGC_reserved would not be
allocated anymore. In this version, a page marked as PGC_reserved is simply
not freed. Does that mean the page is never removed from its current list?
That seems a bit hacky to me.
What I think page offlining should do is:
a) Mark the page as PGC_reserved (or another name like PGC_broken).
b) If the page is free, go to step c; otherwise, wait until it is freed by
its owner and then go to step c.
c) Remove the page from the buddy system, move it to a special, separate
list (i.e. not in the heap[][][] anymore), and return the rest of the buddy
chunk to the allocator.
One open question is step b: if the page is owned by a guest, we could
replace it with a new page through the p2m table and would not need to wait
until it is freed. We don't do that currently because it is a bit complex to
achieve.
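To make that concrete, here is a minimal sketch of steps a)-c). The names
offline_broken_list, page_offline_remove() and offline_page() are only
illustrative (not code from either patch), the "page is free" test is
simplified, and splitting a larger buddy chunk is omitted:

    /* Illustrative sketch only -- not from the posted patch. */
    #include <xen/list.h>
    #include <xen/mm.h>

    static LIST_HEAD(offline_broken_list);    /* parked broken pages */

    /* c) Take the (now free) page out of the buddy system for good.
     * Assumes pg heads an order-0 chunk; splitting a larger 2^order chunk
     * and returning the remainder to heap[][][] is omitted for brevity. */
    static void page_offline_remove(struct page_info *pg)
    {
        list_del(&pg->list);                       /* off heap[][][] */
        list_add(&pg->list, &offline_broken_list); /* park it */
    }

    /* a) Mark the page; b) defer to the free path if it is still owned. */
    static void offline_page(struct page_info *pg)
    {
        pg->count_info |= PGC_reserved;            /* or a new PGC_broken bit */
        if ( (pg->count_info & PGC_count_mask) == 0 )
            page_offline_remove(pg);               /* already free: step c now */
        /* Otherwise free_heap_pages() would call page_offline_remove()
         * when the owner releases the page. */
    }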
What do you think about this?
Thanks
Yunhong Jiang
>
> I attach an updated patch for the xen part, which also includes some bug fixes.
>
> Thanks,
> KAZ
>
> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
>
>
> From: "Wang, Shane" <shane.wang@xxxxxxxxx>
> Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> Date: Tue, 16 Dec 2008 19:10:00 +0800
>
>> For page offlining, it may not be a good way to put the pages into an
>> array; a list would be better.
>>
>> + pg->count_info |= PGC_reserved;
>> + page_offlining[num_page_offlining++] = pg;
>>
>> I guess the code targets offlining domain pages only, right?
>> How about free pages and xen pages?
>> If so, there is no need to check in the following when allocating free
>> pages, since the offlined pages will not be freed into heap()()().
>> If not, the following may have a bug.
>>
>> +        if ( !list_empty(&heap(node, zone, j)) ) {
>> +            pg = list_entry(heap(node, zone, j).next, struct page_info, list);
>> +            if (!(pg->count_info & PGC_reserved))
>> +                goto found;
>> +            else
>> +                printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
>> +                       pg, page_to_maddr(pg));
>> +        }
>>
>> If one free page (not pg itself) within the range pg to pg+(1U<<j) is
>> offlined, that range still risks being allocated along with the offlined
>> page.
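(Just to illustrate the hole Shane describes: a hypothetical variant of the
check that scans the whole 2^j chunk instead of only its head page. The
helper chunk_has_reserved_page() is made up for illustration, not part of
the patch.)

    /* Hypothetical helper: does any page of the 2^order chunk carry
     * PGC_reserved? */
    static int chunk_has_reserved_page(struct page_info *pg, unsigned int order)
    {
        unsigned long i;

        for ( i = 0; i < (1UL << order); i++ )
            if ( pg[i].count_info & PGC_reserved )
                return 1;
        return 0;
    }

    /* ...and in the allocation path, instead of testing only the head page: */
    if ( !list_empty(&heap(node, zone, j)) )
    {
        pg = list_entry(heap(node, zone, j).next, struct page_info, list);
        if ( !chunk_has_reserved_page(pg, j) )
            goto found;
    }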
>>
>> Shane
>>
>> Jiang, Yunhong wrote:
>>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
>>>> Hi all,
>>>>
>>>> I posted about MCA support for Intel64 before. It only had a
>>>> function to log the MCA error data received from the hypervisor.
>>>>
>>>> http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00876.html
>>>>
>>>> I attach patches that support not only error logging but also a page
>>>> offlining function. The page where an MCA occurs will be offlined and
>>>> not reused. A new flag, 'PGC_reserved', was added to the page
>>>> count_info to mark the impacted page.
>>>>
>>>> I know that it would be better to implement page offlining for general
>>>> purposes, but in this first step I implemented it specifically for MCA.
>>>
>>> Maybe MCA page offlining is a bit different from the normal page offline
>>> requirement, so taking it as a first step may be a good choice :)
>>>
>>> As for your current page offlining, I'm not sure why the PGC_reserved
>>> page should not be freed. Also, with the following code, does that mean
>>> heap(node, zone, j) can't be allocated from anymore? Maybe we can
>>> create a special list to hold all those pages and remove them from the
>>> heap list?
>>>
>>> +        if ( !list_empty(&heap(node, zone, j)) ) {
>>> +            pg = list_entry(heap(node, zone, j).next, struct page_info, list);
>>> +            if (!(pg->count_info & PGC_reserved))
>>> +                goto found;
>>> +            else
>>> +                printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
>>> +                       pg, page_to_maddr(pg));
>>> +        }
>>>
>>>
>>>>
>>>> And I also implemented the Dom0 MCA handler, which supports shutting
>>>> down the remote domain where an MCA occurred. If the MCA occurred on a
>>>> DomU, Dom0 notifies the DomU. If the notification fails, Dom0 calls
>>>> the SCHEDOP_remote_shutdown hypercall.
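(As a side note, a minimal sketch of how a dom0 kernel might issue that
remote shutdown through the generic sched_op interface. The function name
mce_remote_crash() is made up, and exact header locations differ between
XenLinux trees.)

    /* Illustration only: crash a remote domain from dom0. */
    #include <xen/interface/sched.h>   /* sched_remote_shutdown, SHUTDOWN_crash */
    #include <asm/hypervisor.h>        /* HYPERVISOR_sched_op() */

    static int mce_remote_crash(domid_t domid)
    {
        struct sched_remote_shutdown rs = {
            .domain_id = domid,
            .reason    = SHUTDOWN_crash,
        };

        /* Only privileged (dom0) callers may shut down another domain. */
        return HYPERVISOR_sched_op(SCHEDOP_remote_shutdown, &rs);
    }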
>>>>
>>>> [1/2] xen part: mca-support-with-page-offlining-xen.patch
>>>
>>> We are not sure we really need to pass all #MC information to dom0
>>> first and let dom0 notify the domU. Xen should know about everything,
>>> so it has the knowledge to decide whether or not to inject a virtual
>>> #MC into the guest. Of course, this does not impact your patch.
>>>
>>>> [2/2] linux/x86_64 part: mca-support-with-page-offlining-linux.patch
>>>
>>> As for how to inject a virtual #MC into a guest (including dom0), I
>>> think we need to consider the following points:
>>>
>>> a) Benefit from reusing the guest's #MC handler. #MC handlers are well
>>> known to be difficult to test, and the native guest handler may have
>>> been tested more widely. Also, #MC handlers improve over time, and
>>> reusing the guest's MCA handler lets us share those improvements.
>>> b) Maintaining a PV handler for different OS versions may not be so
>>> easy, especially as hardware improves and the kernel gains better
>>> support for error handling/containment.
>>> c) The #MC handler may need some model-specific information to decide
>>> the action, while the guest (not dom0) has virtualized CPUID information.
>>> d) The guest's MCA handler may require physical information about where
>>> the #MC happened, like the CPU number it happened on.
>>> e) For HVM domains, a PV handler will be difficult (consider Windows
>>> guests).
>>>
>>> And we have several options for supporting virtual #MC in guests:
>>>
>>> Option 1 is what is currently implemented. A PV #MC handler is
>>> implemented in the guest. This PV handler gets MCA information from the
>>> Xen HV through a hypercall, including the MCA MSR values and some
>>> additional information, like which physical CPU the MCA happened on.
>>> Option 1 helps us with issue d), but we need to maintain a PV handler
>>> and can't benefit from the native handler. It also does not resolve
>>> issue c) very well.
>>>
>>> Option 2: Xen provides MCA MSR virtualization so that the guest's
>>> native #MC handler can run without changes. It can benefit from the
>>> guest's #MC handler, but it will be difficult to get model-specific
>>> information, and there is no physical information.
>>>
>>> Option 3 uses a PV #MC handler for the guest as in option 1, but the
>>> interface between Xen and the guest is abstract events, like "offline
>>> the offending page", "terminate the current execution context", etc.
>>> This should be straightforward for Linux, but may be difficult for
>>> Windows and other OSes.
>>>
>>> Currently we are considering option 2, providing MCA MSR virtualization
>>> to the guest; dom0 can also benefit from such support (if the guest has
>>> a different CPUID than native, we will either keep the guest running or
>>> kill it based on the error code). Of course, the current mechanism of
>>> passing MCA information from Xen to dom0 will still be useful, but it
>>> will be used for logging purposes or for correctable errors. What do
>>> you think about this?
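(For what option 2 could look like, a very rough sketch of answering guest
reads of the MCA MSRs with per-domain virtual values. vmce_rdmsr(),
is_mca_bank_msr(), vmca_bank_read() and the d->arch.vmca_* fields are made
up for illustration only; Xen-internal context is assumed.)

    /* Illustrative only: return virtual values for the MCA MSRs instead of
     * exposing the physical ones to the guest. */
    static int vmce_rdmsr(struct domain *d, uint32_t msr, uint64_t *val)
    {
        switch ( msr )
        {
        case MSR_IA32_MCG_CAP:
            *val = d->arch.vmca_cap;          /* virtual bank count, flags */
            return 1;
        case MSR_IA32_MCG_STATUS:
            *val = d->arch.vmca_mcg_status;   /* set when a vMCE is pending */
            return 1;
        default:
            if ( is_mca_bank_msr(msr) )       /* MCi_CTL/STATUS/ADDR/MISC */
            {
                *val = vmca_bank_read(d, msr);
                return 1;
            }
            return 0;                         /* fall back to default MSR
                                               * handling */
        }
    }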
>>>
>>> Thanks
>>> Yunhong Jiang
>>>
>>>> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
>>>>
>>>> Thanks,
>>>> KAZ
>>>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel