xen-devel

Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN

On Monday 02 March 2009 06:51:22 Jiang, Yunhong wrote:
> Frank/Christopher, can you please give more comments on it, or are you OK

Sorry for the delay. I'm also busy with other tasks.

> with this? For the action reporting mechanism, we will send out a proposal
> for review soon.

I would like to see the interface definition first, covering all the aspects
we discussed.



>
> Thanks
> Yunhong Jiang
>
> Jiang, Yunhong <> wrote:
> > Christopher/Frank, thanks for reply very much, see comments below.
> >
> >> -----Original Message-----
> >> From: Frank.Vanderlinden@xxxxxxx [mailto:Frank.Vanderlinden@xxxxxxx]
> >> Sent: 26 February 2009 1:33
> >> To: Christoph Egger
> >> Cc: Jiang, Yunhong; Kleen, Andi;
> >> xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ke, Liping; Gavin Maltby
> >> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
> >>
> >> Christoph Egger wrote:
> >>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
> >>>> So, Frank/Egger, can I assume the following is the current consensus?
> >>>>
> >>>> 1) MCE is handled entirely by the Xen HV, while a guest's vMCE handler
> >>>>    only works for the guest itself.
> >>>> 2) Xen presents a virtual #MC to the guest through MSR access
> >>>>    emulation (Xen will do the translation if needed).
> >>>> 3) The guest's unmodified MCE handler will handle the injected vMCE.
> >>>> 4) Dom0 will get all logs/telemetry through a hypercall.
> >>>> 5) The action taken by Xen will be passed to dom0 through the
> >>>>    telemetry mechanism.
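
Point 2) above can be illustrated with a minimal Python sketch of MSR access emulation over per-guest virtual MCA banks. The MSR numbers and status bits are the architectural x86 ones; the `VirtualMca` class, its methods, and the default bank count are hypothetical names chosen for illustration and are not Xen's actual data structures or interfaces.

```python
# Architectural x86 machine-check MSR numbers.
MSR_IA32_MCG_CAP = 0x179
MSR_IA32_MCG_STATUS = 0x17A
MSR_IA32_MC0_CTL = 0x400   # bank i: CTL, STATUS, ADDR, MISC at 0x400 + 4*i + 0..3

MCG_STATUS_MCIP = 1 << 2   # machine check in progress
MCI_STATUS_VAL = 1 << 63   # bank holds a valid error


class VirtualMca:
    """Hypothetical per-guest virtual MCA state (assumption, not Xen code)."""

    def __init__(self, nr_banks=2):
        self.nr_banks = nr_banks
        self.mcg_status = 0
        # One [CTL, STATUS, ADDR, MISC] register set per virtual bank.
        self.banks = [[0, 0, 0, 0] for _ in range(nr_banks)]

    def inject(self, bank, status, addr):
        """Record a virtual error; a hypervisor would then raise #MC in the guest."""
        self.banks[bank][1] = status | MCI_STATUS_VAL
        self.banks[bank][2] = addr
        self.mcg_status |= MCG_STATUS_MCIP

    def _locate(self, msr):
        # Map an MSR number to (bank index, register index within the bank).
        bank, reg = divmod(msr - MSR_IA32_MC0_CTL, 4)
        if msr < MSR_IA32_MC0_CTL or bank >= self.nr_banks:
            raise KeyError(hex(msr))
        return bank, reg

    def rdmsr(self, msr):
        """Emulate a guest read of a machine-check MSR."""
        if msr == MSR_IA32_MCG_CAP:
            return self.nr_banks          # bank count lives in bits 7:0
        if msr == MSR_IA32_MCG_STATUS:
            return self.mcg_status
        bank, reg = self._locate(msr)
        return self.banks[bank][reg]

    def wrmsr(self, msr, value):
        """Emulate a guest write; the guest handler clears state this way."""
        if msr == MSR_IA32_MCG_STATUS:
            self.mcg_status = value
            return
        bank, reg = self._locate(msr)
        self.banks[bank][reg] = value
```

In this sketch the guest's unmodified handler only ever sees the virtual banks, so the physical MSRs remain under the hypervisor's control.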
> >>>
> >>> Mostly. Regarding 2), I would first like to discuss how to handle errors
> >>> impacting multiple contiguous physical pages which are non-contiguous
> >>> in guest physical space.
> >>>
> >>>
> >>>
> >>> I also want to discuss how to do recovery actions requiring
> >>> PCI access. One example of this is
> >>> Shanghai's "L3 Cache Index Disable" feature.
> >>> Xen delegates PCI config space to Dom0 and,
> >>> via PCI passthrough, partly to DomU.
> >>> That means that if registers in PCI config space are independently
> >>> accessible by Xen, Dom0 and/or DomU, they can interfere with each
> >>> other. Therefore, we need to
> >>> a) clearly define who handles what,
> >>> b) define some rules based on a), and
> >>> c) discuss how to handle Dom0/DomU going wild
> >>>     and breaking the rules defined in b).
> >>
> >> I also agree on the approach in principle, but would like to see these
> >> points addressed. For non-contiguous pages, I suppose Xen could deliver
> >> multiple #vMCEs to the guest, split into contiguous parts. The vmce code
> >> seems to be set up to be able to do this.
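
The splitting suggested above can be sketched in Python as follows. This is only an illustration under assumptions: `m2p` stands for a hypothetical machine-to-guest frame lookup passed in as a plain dict, and the function names are invented here; none of this is taken from the vmce code.

```python
def contiguous_runs(gfns):
    """Group guest frame numbers into (first_gfn, page_count) runs."""
    runs = []
    for gfn in sorted(set(gfns)):
        if runs and gfn == runs[-1][0] + runs[-1][1]:
            # gfn directly follows the last run, so extend that run.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((gfn, 1))
    return runs


def vmce_events(bad_mfns, m2p):
    """Translate failing machine frames to guest frames and return one
    event per guest-contiguous run (m2p is a hypothetical lookup table)."""
    gfns = [m2p[mfn] for mfn in bad_mfns if mfn in m2p]
    return contiguous_runs(gfns)
```

For example, four machine-contiguous frames that map to two separate guest ranges would yield two runs, and so two injected vMCEs.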
> >
> > For the contiguous pages, I agree with Gavin that such a
> > contiguous-page error should be triggered as multiple #MCs, so it is OK.
> >
> > For the PCI config space issue, Christoph, can you please share
> > more information on it (or provide some documentation, as Frank
> > suggested)? For example: is it for CE (correctable errors) or
> > UC (uncorrectable errors), is it in the PCI range or the PCI-E range
> > (i.e. accessed through 0xCF8/0xCFC or through MMCONFIG), how is the
> > device's BDF calculated, etc. The following is my understanding.
> >
> > Firstly, if it is a CE, Xen will do nothing and dom0 will take the
> > recovery action. If it is a UC, Xen will take action when all
> > CPUs are in softirq context, and dom0 will not take action, so
> > it should be OK.
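
The CE/UC split described in the paragraph above can be written down as a small Python sketch. The VAL and UC bit positions are the architectural MCi_STATUS ones; the function name and the returned strings are assumptions made for illustration, and the policy is only the one proposed in this mail, not settled Xen behaviour.

```python
MCI_STATUS_VAL = 1 << 63  # register holds a valid error
MCI_STATUS_UC = 1 << 61   # error was not corrected by hardware


def recovery_owner(status):
    """Return who takes the recovery action for an MCi_STATUS value,
    following the split proposed in this thread (hypothetical helper)."""
    if not status & MCI_STATUS_VAL:
        return None                    # nothing logged in this bank
    if status & MCI_STATUS_UC:
        return "xen"                   # uncorrected: Xen acts, dom0 does not
    return "dom0"                      # corrected: Xen does nothing
```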
> >
> > Secondly, in a Xen environment, as I understand it, the CPU is
> > owned by the Xen HV, so I'm not sure whether Xen should be aware
> > when dom0 disables the L3 cache (if it is a CE). That is, should
> > dom0 disable the cache directly, or should it use a hypercall
> > to ask Xen to do that? Keir can give us more suggestions.
> >
> > For item c), currently Xen and dom0 can both access configuration
> > space, while domU does so through PCI_frontend/backend.
> > Because the PCI backend only covers devices assigned to domU, we
> > don't need to worry about domU, and dom0 should be trusted.
> > However, one thing left is that if this range is beyond 0x100
> > (i.e. in the PCI-E range), we need to add MMCONFIG support to Xen,
> > although it can be added simply.
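
To make the 0x100 boundary concrete, here is a minimal Python sketch of how a bus/device/function (BDF) and register offset are encoded for the two standard access methods. The bit layouts are the standard PCI and PCI Express ones; the function names and the example device numbers are hypothetical, and vendor-specific extensions to the 0xCF8 mechanism are not modelled.

```python
def _check_bdf(bus, dev, func):
    # Standard limits: 256 buses, 32 devices per bus, 8 functions per device.
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("BDF out of range")


def cf8_address(bus, dev, func, reg):
    """Value written to I/O port 0xCF8 before accessing data at 0xCFC."""
    _check_bdf(bus, dev, func)
    if not 0 <= reg < 0x100:
        raise ValueError("registers at 0x100 and above need MMCONFIG")
    # Bit 31 is the enable bit; the register offset is dword-aligned.
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def mmconfig_offset(bus, dev, func, reg):
    """Byte offset from the MMCONFIG base; each function gets 4 KiB."""
    _check_bdf(bus, dev, func)
    if not 0 <= reg < 0x1000:
        raise ValueError("register offset out of range")
    return (bus << 20) | (dev << 15) | (func << 12) | reg
```

With these helpers, a register at offset 0x140 of a hypothetical device 00:18.3 is rejected by `cf8_address` but resolves to offset 0xC3140 through `mmconfig_offset`.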
> >
> > Thanks
> > -- Yunhong Jiang
> >
> >> As for the Shanghai feature: Christoph, are there any documents
> >> available on that feature?

Yes, our BKDG.

> >> What kind of errors are delivered (corrected/correctable)?

The error can be of either type, depending on whether correction
via ECC was successful or not.


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

