
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN

On Thursday 26 February 2009 03:16:29 Jiang, Yunhong wrote:
> Christopher/Egger, thanks very much for your replies; see my comments below.
> >-----Original Message-----
> >From: Frank.Vanderlinden@xxxxxxx [mailto:Frank.Vanderlinden@xxxxxxx]
> >Sent: February 26, 2009 1:33
> >To: Christoph Egger
> >Cc: Jiang, Yunhong; Kleen, Andi;
> >xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ke, Liping; Gavin Maltby
> >Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
> >
> >Christoph Egger wrote:
> >> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
> >>> So, Frank/Egger, can I assume the following is the current consensus?
> >>>
> >>> 1) MCE is handled entirely by the Xen HV, while the guest's vMCE handler
> >>>    works only for the guest itself.
> >>> 2) Xen presents a virtual #MC to the guest through MSR access emulation
> >>>    (Xen will do the translation if needed).
> >>> 3) The guest's unmodified MCE handler will handle the injected vMCE.
> >>> 4) Dom0 will get all logs/telemetry through a hypercall.
> >>> 5) The action taken by Xen will be passed to Dom0 through the telemetry
> >>>    mechanism.
> >>
> >> Mostly. Regarding 2), I would first like to discuss how to handle errors
> >> impacting multiple physical pages that are contiguous in host physical
> >> space but non-contiguous in guest physical space.
> >>
> >> I also want to discuss how to perform recovery actions that require
> >> PCI access. One example is Shanghai's "L3 Cache Index Disable" feature.
> >> Xen delegates PCI config space to Dom0 and, via PCI passthrough, partly
> >> to DomU. That means that if registers in PCI config space are
> >> independently accessible by Xen, Dom0 and/or DomU, they can interfere
> >> with each other.
> >> Therefore, we need to
> >> a) clearly define who handles what,
> >> b) define some rules based on a), and
> >> c) discuss how to handle Dom0/DomU going wild
> >>     and breaking the rules defined in b).
> >
> >I also agree with the approach in principle, but would like to see these
> >points addressed. For non-contiguous pages, I suppose Xen could deliver
> >multiple #vMCEs to the guest, split into contiguous parts. The vmce code
> >seems to be set up to be able to do this.

For virtual MCEs, that is OK. But note that, for unmodified guests, the MC
handler is written with the assumption that the CPU enters the shutdown state
if another #MC happens before the handler has cleared the MCIP bit in the
MCG_STATUS MSR.
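A minimal sketch of the two points above. It assumes a hypothetical array `gfn_of[]` that maps each affected machine frame to the guest frame backing it. The names `split_into_guest_runs`, `struct vmce_run` and `struct vcpu_model` are illustrative only and do not come from Xen's vmce code. The error range is split into guest-contiguous runs, with one vMCE per run. The next vMCE is injected only after the guest has cleared MCG_STATUS.MCIP (bit 2), so an unmodified handler never sees a nested #MC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCG_STATUS_MCIP (1ULL << 2)  /* machine check in progress */

/* One run of guest-physically contiguous frames; one vMCE per run. */
struct vmce_run {
    uint64_t gfn;  /* first guest frame number of the run */
    size_t nr;     /* number of frames in the run */
};

/* Hypothetical model of the virtual MCE state of one guest vCPU. */
struct vcpu_model {
    uint64_t mcg_status;  /* virtual MCG_STATUS MSR */
    size_t pending;       /* vMCEs still waiting to be injected */
};

/*
 * Split n machine-contiguous frames into guest-contiguous runs.
 * gfn_of[i] is the guest frame backing machine frame (start + i).
 * Stops early if max_runs is reached; returns the number of runs filled.
 */
static size_t split_into_guest_runs(const uint64_t *gfn_of, size_t n,
                                    struct vmce_run *runs, size_t max_runs)
{
    size_t nruns = 0;

    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 && runs[nruns - 1].gfn + runs[nruns - 1].nr == gfn_of[i]) {
            runs[nruns - 1].nr++;  /* frame extends the current run */
            continue;
        }
        if (nruns == max_runs)
            break;                 /* no room for another run */
        runs[nruns].gfn = gfn_of[i];
        runs[nruns].nr = 1;
        nruns++;
    }
    return nruns;
}

/*
 * Inject the next queued vMCE only if the guest has cleared MCIP:
 * an unmodified handler assumes a nested #MC means shutdown.
 * Returns true if a vMCE was injected.
 */
static bool try_inject_next(struct vcpu_model *v)
{
    if (v->pending == 0 || (v->mcg_status & MCG_STATUS_MCIP))
        return false;
    v->mcg_status |= MCG_STATUS_MCIP;  /* the guest handler will clear it */
    v->pending--;
    return true;
}
```

For example, machine frames backed by guest frames 100, 101, 102, 500, 501 and 7 yield three runs. Three vMCEs would then be queued and delivered one at a time.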

> For the contiguous pages, I agree with Gavin that such errors should be
> delivered as multiple #MCs, so that case is OK.
> For the PCI config space issue, Christoph, can you please share more
> information on it (or provide some documentation, as Frank suggested)? For
> example: is it for CE (correctable errors) or UC (uncorrectable errors)? Is
> it in the PCI range or the PCI-E range (i.e. accessed through 0xCF8/0xCFC or
> through MMCONFIG)? How is the device's BDF calculated? Below is some of my
> understanding.

I would like to see a generic solution that works with any feature requiring
access to the PCI space, rather than a per-feature solution.
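A generic accessor would have to choose an access path for each register. The sketch below assumes two mechanisms. The first is the standard PCI type 1 mechanism (ports 0xCF8/0xCFC), which reaches only the first 256 bytes of config space. The second is PCI Express enhanced configuration (MMCONFIG), which provides 4 KiB per function. The helper names and the `have_mmcfg` flag are hypothetical. A register at or above 0x100 can only be reached through MMCONFIG, so Xen would need MMCONFIG support for such features.

```c
#include <stdint.h>

/* BDF encoding: bus[15:8], device[7:3], function[2:0]. */
static uint16_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Type 1 CONFIG_ADDRESS written to port 0xCF8; data then moves via 0xCFC.
 * Only dword-aligned offsets below 0x100 are reachable this way. */
static uint32_t cf8_address(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)(dev & 0x1f) << 11) |
           ((uint32_t)(fn & 0x7) << 8) | (uint32_t)(reg & 0xfc);
}

/* MMCONFIG (ECAM) address: base + bus<<20 + dev<<15 + fn<<12 + offset. */
static uint64_t mmcfg_address(uint64_t base, uint8_t bus, uint8_t dev,
                              uint8_t fn, uint16_t reg)
{
    return base + (((uint64_t)bus << 20) | ((uint64_t)(dev & 0x1f) << 15) |
                   ((uint64_t)(fn & 0x7) << 12) | (uint64_t)(reg & 0xfff));
}

enum cfg_path { CFG_VIA_CF8, CFG_VIA_MMCONFIG, CFG_INVALID };

/* Hypothetical policy: legacy ports for the first 256 bytes,
 * MMCONFIG for the extended range if the hypervisor supports it. */
static enum cfg_path pick_cfg_path(uint16_t reg, int have_mmcfg)
{
    if (reg < 0x100)
        return CFG_VIA_CF8;
    if (reg < 0x1000 && have_mmcfg)
        return CFG_VIA_MMCONFIG;
    return CFG_INVALID;
}
```

Bus 0, device 0x18, function 3 and offset 0x1C4 make convenient test values, but they are only examples. Preferring the 0xCF8/0xCFC ports below 0x100 is a design choice. With MMCONFIG available, every offset could go through it.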

> Firstly, if it is a CE, Xen will do nothing and Dom0 will take the recovery
> action. If it is a UC, Xen will take action once all CPUs are in softirq
> context, and Dom0 will not take action, so it should be OK.
> Secondly, in the Xen environment, as I understand it, the CPU is owned by
> the Xen HV, so I'm not sure whether Xen should be aware when Dom0 disables
> the L3 cache (in the CE case). That is, should Dom0 disable the cache
> directly, or should it use a hypercall to ask Xen to do that? Keir can give
> us more suggestions.
> For item c), currently both Xen and Dom0 can access configuration space,
> while DomU does so through the PCI frontend/backend. Because the PCI backend
> only covers devices assigned to the DomU, we don't need to worry about DomU,
> and Dom0 should be trusted. However, one thing is left: if the register is
> at or above offset 0x100 (i.e. in the PCI-E extended range), we need to add
> MMCONFIG support to Xen, although that can be added simply.
> Thanks
> -- Yunhong Jiang
> >As for the Shanghai feature: Christoph, are there any documents
> >available on that feature? What kind of errors are delivered
> >(corrected/correctable)?
> >
> >- Frank
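The CE/UC split Yunhong describes above can be sketched as a routing decision on the architectural MCi_STATUS bits: VAL is bit 63, UC is bit 61 and PCC is bit 57. The enum and function names are hypothetical. Treating PCC (processor context corrupt) as fatal is an added assumption, not something agreed in this thread.

```c
#include <stdint.h>

#define MCi_STATUS_VAL (1ULL << 63)  /* bank holds a valid error */
#define MCi_STATUS_UC  (1ULL << 61)  /* error was not corrected */
#define MCi_STATUS_PCC (1ULL << 57)  /* processor context corrupt */

enum mce_route {
    MCE_IGNORE,         /* no valid error logged in this bank */
    MCE_DOM0_RECOVERS,  /* CE: Xen only forwards telemetry to Dom0 */
    MCE_XEN_SOFTIRQ,    /* UC: Xen acts once all CPUs reach softirq context */
    MCE_XEN_FATAL       /* assumption: a corrupt context is not recoverable */
};

/* Decide who handles an error, based only on the bank's MCi_STATUS value. */
static enum mce_route route_mce(uint64_t status)
{
    if (!(status & MCi_STATUS_VAL))
        return MCE_IGNORE;
    if (!(status & MCi_STATUS_UC))
        return MCE_DOM0_RECOVERS;
    if (status & MCi_STATUS_PCC)
        return MCE_XEN_FATAL;
    return MCE_XEN_SOFTIRQ;
}
```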

---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

Xen-devel mailing list


