Hi Wang,
Thank you very much for your comments.
I will post an updated patch later.
Thanks,
KAZ
From: "Wang, Shane" <shane.wang@xxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
Date: Mon, 22 Dec 2008 09:52:28 +0800
> Hi KAZ,
>
> For your latest patch, I have some comments on the following code.
>
> + /* check whether a page number have been already registered or not */
> + list_for_each_entry(page, &page_offlining, list) {
> + if (page == pg)
> + goto out;
> + }
>
> It is an x86_page_offlining_t entry e that gets put onto the list, so
> shouldn't the check be "if (e->page == pg)"?
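>
> For example, something along these lines (just a sketch of the comparison
> I mean, iterating over the x86_page_offlining_t entries on the list):
>
>     x86_page_offlining_t *e;
>
>     /* walk the entries already registered on the offlining list */
>     list_for_each_entry(e, &page_offlining, list) {
>         if (e->page == pg)
>             goto out;
>     }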
>
> +
> + /* check whether already having attribute 'reserved' */
> + if (pg->count_info & PGC_reserved) {
> + printk(XENLOG_DEBUG "Page offlining: ( %lx ) failure.\n",
> + maddr);
> + return 1;
> + }
> +
> + /* add attribute 'reserved' and register the page */
> + get_page(pg, d);
> + pg->count_info |= PGC_reserved;
> +
> + e = xmalloc(x86_page_offlining_t);
> + BUG_ON(e == NULL);
> + e->page = pg;
> + list_add(&e->list, &page_offlining);
>
> Since the page struct is itself a list entry (it contains "struct
> list_head list"), you can take the page off the domain's page list and
> put it onto your page_offlining list.
> Certainly, the domain may still be using it; you can mark it with your
> PGC_reserved flag first and then do the move in the free function. I
> think this is what Jiang Yunhong suggested :)
> What do you think of these suggestions? PS: in that case,
> alloc_bitmap[]/heap[][][]/avail[] should be considered.
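>
> A rough sketch of what I mean (hypothetical placement in the free path,
> not tested against your patch):
>
>     /* in the page-free path, before handing the page back to the
>      * buddy allocator */
>     if ( pg->count_info & PGC_reserved )
>     {
>         /* broken page: park it on the offlining list instead of
>          * returning it to heap[][][] */
>         list_del(&pg->list);              /* drop from its current list */
>         list_add(&pg->list, &page_offlining);
>         return;
>     }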
>
> For the code piece
>
> +            if ( !list_empty(&heap(node, zone, j)) ) {
> +                pg = list_entry(heap(node, zone, j).next,
> +                                struct page_info, list);
> +                if (!(pg->count_info & PGC_reserved))
> +                    goto found;
> +                else
> +                    printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
> +                           pg, page_to_maddr(pg));
> +
>
> It seems this mechanism is not efficient enough: if the first page in
> heap(node, zone, j) is marked PGC_reserved, the remaining pages on that
> list cannot be allocated any more even though they are not offlined, and
> it still does not solve the potential bug I mentioned.
> Moreover, the original code (i.e. without your patch) only needs to find
> a block of sufficient order, so taking the first entry is enough there.
> However, I am wondering why you don't enumerate heap(node, zone, j) and
> try the second entry and so on, instead of returning "not to be
> allocated" as soon as the first list entry fails the check?
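>
> In other words, something along these lines (sketch only):
>
>     /* walk the whole free list instead of only looking at its head */
>     list_for_each_entry(pg, &heap(node, zone, j), list)
>     {
>         if ( !(pg->count_info & PGC_reserved) )
>             goto found;
>         printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
>                pg, page_to_maddr(pg));
>     }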
>
> Thanks.
> Shane
>
> -----Original Message-----
> From: SUZUKI Kazuhiro [mailto:kaz@xxxxxxxxxxxxxx]
> Sent: December 22, 2008 8:42
> To: Jiang, Yunhong
> Cc: Wang, Shane; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH 0/2] MCA support with page offlining
>
> Hi,
>
> > But I'm a bit confused by some of the changes in this patch.
>
> Sorry, I attached the wrong patch; this is the correct one.
>
> Thanks,
> KAZ
>
> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
>
>
> From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
> Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> Date: Fri, 19 Dec 2008 18:48:11 +0800
>
> > SUZUKI Kazuhiro <mailto:kaz@xxxxxxxxxxxxxx> wrote:
> > > Hi Yunhong and Shane,
> > >
> > > Thank you for your comments and reviews.
> > >
> > >> For page offlining, it may not be a good idea to put the pages into
> > >> an array; a list would be better.
> > >
> > > I modified to register offlined pages to a list instead of an array.
> > >
> > >> I guess the code targets offlining domain pages only, right? What
> > >> about free pages and Xen pages?
> > >> If so, there is no need for the following check when allocating free
> > >> pages, since the offlined pages will never be freed back into
> > >> heap()()(). If not, the following may have a bug.
> > >
> > > Yes, I assumed that page offlining was needed only for domain pages.
> > > If Xen pages are impacted, the current implementation simply crashes
> > > the hypervisor.
> >
> > We have an internal patch for a similar purpose, covering page
> > offlining caused by #MC as well as other reasons. We can base our work
> > on your patch if needed.
> >
> > But I'm a bit confused by some of the changes in this patch.
> > In your previous version, if a page was marked as PGC_reserved, it would
> > not be allocated any more. However, in this version, when a page is
> > marked as PGC_reserved, it is just not freed. Does that mean the page is
> > never removed from its current list? That seems a bit hacky to me.
> >
> > What I think we should do for page offlining is:
> > a) Mark the page as PGC_reserved (or another name such as PGC_broken).
> > b) If it is free, go to step c; otherwise, wait until it is freed by the
> > owner and then go to step c.
> > c) Remove the page from the buddy system, move it to a special, separate
> > list (i.e. not in the heap[][][] anymore), and return the other pages to
> > the buddy allocator.
> >
> > There is some debate about step b: if the page is owned by a guest, we
> > could replace it with a new page through the p2m table and would not
> > need to wait until it is freed. We don't do that currently because it is
> > a bit complex to achieve.
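> >
> > A minimal sketch of steps a) - c) (the names page_is_free(),
> > offline_broken_page() and broken_page_list are only illustrative, not
> > real code):
> >
> >     /* c) take the page out of heap[][][] and park it on a separate
> >      *    list so it can never be allocated again */
> >     static void offline_broken_page(struct page_info *pg)
> >     {
> >         list_del(&pg->list);
> >         list_add(&pg->list, &broken_page_list);
> >     }
> >
> >     /* a) mark the page as broken */
> >     pg->count_info |= PGC_broken;
> >
> >     /* b) if it is already free, offline it now; otherwise the free
> >      *    path checks PGC_broken later and calls the same helper */
> >     if ( page_is_free(pg) )
> >         offline_broken_page(pg);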
> >
> > What do you think about this?
> >
> > Thanks
> > Yunhong Jiang
> >
> > >
> > > I attach an updated patch for xen part which also includes some bug fixes.
> > >
> > > Thanks,
> > > KAZ
> > >
> > > Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
> > >
> > >
> > > From: "Wang, Shane" <shane.wang@xxxxxxxxx>
> > > Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> > > Date: Tue, 16 Dec 2008 19:10:00 +0800
> > >
> > >> For page offlining, it may not be a good idea to put the pages into
> > >> an array; a list would be better.
> > >>
> > >> + pg->count_info |= PGC_reserved;
> > >> + page_offlining[num_page_offlining++] = pg;
> > >>
> > >> I guess the code targets offlining domain pages only, right? What
> > >> about free pages and Xen pages?
> > >> If so, there is no need for the following check when allocating free
> > >> pages, since the offlined pages will never be freed back into
> > >> heap()()(). If not, the following may have a bug.
> > >>
> > >> +            if ( !list_empty(&heap(node, zone, j)) ) {
> > >> +                pg = list_entry(heap(node, zone, j).next,
> > >> +                                struct page_info, list);
> > >> +                if (!(pg->count_info & PGC_reserved))
> > >> +                    goto found;
> > >> +                else
> > >> +                    printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
> > >> +                           pg, page_to_maddr(pg));
> > >> +
> > >>              }
> > >>
> > >> If a free page (other than pg) within the range pg to pg+(1U<<j) is
> > >> offlined, the whole range pg~pg+(1U<<j) risks being allocated along
> > >> with that offlined page.
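> > >>
> > >> To avoid that, every page in the candidate order-j block would have
> > >> to be checked, e.g. something like this (sketch only):
> > >>
> > >>     unsigned long i;
> > >>
> > >>     for ( i = 0; i < (1UL << j); i++ )
> > >>         if ( (pg + i)->count_info & PGC_reserved )
> > >>             break;
> > >>     if ( i == (1UL << j) )
> > >>         goto found;   /* no offlined page inside pg .. pg+(1<<j)-1 */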
> > >>
> > >> Shane
> > >>
> > >> Jiang, Yunhong wrote:
> > >>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> I had posted about MCA support for Intel64 before. It had only a
> > >>>> function to log the MCA error data received from the hypervisor.
> > >>>>
> > >>>> http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00876.html
> > >>>>
> > >>>> I attach patches that support not only error logging but also a
> > >>>> page offlining function. The page where an MCA occurs will be
> > >>>> offlined and not reused. A new flag 'PGC_reserved' was added to the
> > >>>> page's count_info to mark the impacted page.
> > >>>>
> > >>>> I know it would be better to implement page offlining as a
> > >>>> general-purpose mechanism, but in this first step I implemented it
> > >>>> specifically for MCA.
> > >>>
> > >>> Maybe MCA page offlining has slightly different requirements from
> > >>> normal page offlining, so taking it as a first step may be a good
> > >>> choice :)
> > >>>
> > >>> As for your current page_offlining, I'm not sure why a PGC_reserved
> > >>> page should not be freed. Also, with the following code, won't
> > >>> heap(node, zone, j) become unallocatable? Maybe we can create a
> > >>> special list to hold all those pages and remove them from the heap
> > >>> list?
> > >>>
> > >>> +            if ( !list_empty(&heap(node, zone, j)) ) {
> > >>> +                pg = list_entry(heap(node, zone, j).next,
> > >>> +                                struct page_info, list);
> > >>> +                if (!(pg->count_info & PGC_reserved))
> > >>> +                    goto found;
> > >>> +                else
> > >>> +                    printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
> > >>> +                           pg, page_to_maddr(pg));
> > >>> +
> > >>>
> > >>>
> > >>>>
> > >>>> I also implemented the Dom0 MCA handler, which supports shutting
> > >>>> down the remote domain where an MCA occurred. If the MCA occurred on
> > >>>> a DomU, Dom0 notifies the DomU; if the notification fails, Dom0
> > >>>> calls the SCHEDOP_remote_shutdown hypercall.
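> > >>>>
> > >>>> For reference, the fallback call on the Linux side looks roughly
> > >>>> like this (a sketch only; the actual reason code used in the patch
> > >>>> may differ):
> > >>>>
> > >>>>     struct sched_remote_shutdown r = {
> > >>>>         .domain_id = domid,
> > >>>>         .reason    = SHUTDOWN_crash,
> > >>>>     };
> > >>>>
> > >>>>     if ( HYPERVISOR_sched_op(SCHEDOP_remote_shutdown, &r) )
> > >>>>         printk(KERN_ERR "failed to shut down domain %d\n", domid);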
> > >>>>
> > >>>> [1/2] xen part: mca-support-with-page-offlining-xen.patch
> > >>>
> > >>> We are not sure we really need to pass all #MC information to dom0
> > >>> first and let dom0 notify the domU. Xen already knows everything, so
> > >>> it should have enough knowledge to decide whether or not to inject a
> > >>> virtual #MC into the guest. Of course, this does not impact your
> > >>> patch.
> > >>>
> > >>>> [2/2] linux/x86_64 part: mca-support-with-page-offlining-linux.patch
> > >>>
> > >>> As for how to inject a virtual #MC into a guest (including dom0), I
> > >>> think we need to consider the following points:
> > >>>
> > >>> a) Benefit from reusing the guest's #MC handler. #MC handlers are
> > >>> notoriously difficult to test, and the native guest handler may have
> > >>> been tested more widely. Also, #MC handlers improve over time, and
> > >>> reusing the guest's MCA handler lets us share in those improvements.
> > >>> b) Maintaining a PV handler across different OS versions may not be
> > >>> easy, especially as hardware improves and kernels gain better support
> > >>> for error handling/containment.
> > >>> c) The #MC handler may need model-specific information to decide on
> > >>> an action, while a guest (other than dom0) only sees virtualized
> > >>> CPUID information.
> > >>> d) The guest's MCA handler may require physical information about the
> > >>> #MC, such as the number of the CPU on which it happened.
> > >>> e) For HVM domains, a PV handler will be difficult (consider Windows
> > >>> guests).
> > >>>
> > >>> We have several options for presenting a virtual #MC to the guest:
> > >>>
> > >>> Option 1 is what is currently implemented: a PV #MC handler in the
> > >>> guest. This PV handler gets MCA information from the Xen hypervisor
> > >>> through a hypercall, including the MCA MSR values plus some
> > >>> additional information, such as which physical CPU the MCA happened
> > >>> on. Option 1 helps with issue d), but we have to maintain a PV
> > >>> handler and cannot benefit from the native handler. It also does not
> > >>> resolve issue c) very well.
> > >>>
> > >>> Option 2: Xen provides MCA MSR virtualization so that the guest's
> > >>> native #MC handler can run without changes. This benefits from the
> > >>> guest's #MC handler, but it will be difficult to provide
> > >>> model-specific information, and there is no physical information.
> > >>>
> > >>> Option 3 uses a PV #MC handler in the guest as in option 1, but the
> > >>> interface between Xen and the guest is an abstract event, such as
> > >>> "offline the offending page" or "terminate the current execution
> > >>> context". This should be straightforward for Linux, but may be
> > >>> difficult for Windows and other OSes.
> > >>>
> > >>> Currently we are considering option 2, providing MCA MSR
> > >>> virtualization to the guest; dom0 can also benefit from such support
> > >>> (if the guest's CPUID differs from the native one, we will either
> > >>> keep the guest running or kill it, based on the error code). Of
> > >>> course, the current mechanism of passing MCA information from Xen to
> > >>> dom0 will still be useful, but it will be used for logging purposes
> > >>> or for correctable errors. What do you think about this?
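> > >>>
> > >>> (Just to illustrate the idea behind option 2 - not real code: a
> > >>> guest rdmsr of an MCi_STATUS MSR would be satisfied from per-domain
> > >>> state that Xen fills in when it decides to inject a virtual #MC.
> > >>> vmce_bank and vmce_rdmsr_status below are purely hypothetical.)
> > >>>
> > >>>     /* MCi_STATUS MSRs are architecturally at 0x401 + 4*i */
> > >>>     #define MSR_MC0_STATUS 0x401
> > >>>
> > >>>     struct vmce_bank { uint64_t status; };  /* filled in on #MC */
> > >>>
> > >>>     static int vmce_rdmsr_status(struct vmce_bank *banks,
> > >>>                                  unsigned int nr_banks,
> > >>>                                  uint32_t msr, uint64_t *val)
> > >>>     {
> > >>>         unsigned int bank = (msr - MSR_MC0_STATUS) / 4;
> > >>>
> > >>>         if ( msr < MSR_MC0_STATUS || bank >= nr_banks ||
> > >>>              (msr - MSR_MC0_STATUS) % 4 )
> > >>>             return 0;                  /* not an MCi_STATUS MSR */
> > >>>
> > >>>         *val = banks[bank].status;     /* virtualized MCi_STATUS */
> > >>>         return 1;
> > >>>     }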
> > >>>
> > >>> Thanks
> > >>> Yunhong Jiang
> > >>>
> > >>>> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
> > >>>>
> > >>>> Thanks,
> > >>>> KAZ
> > >>>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel