RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN

Frank/Christoph, could you please comment further on this, or are you OK with 
For the action reporting mechanism, we will send out a proposal for review soon.

Yunhong Jiang

Jiang, Yunhong wrote:
> Christoph/Frank, thanks very much for the reply; see my comments below.
>> -----Original Message-----
>> From: Frank.Vanderlinden@xxxxxxx [mailto:Frank.Vanderlinden@xxxxxxx]
>> Sent: February 26, 2009 1:33
>> To: Christoph Egger
>> Cc: Jiang, Yunhong; Kleen, Andi;
>> xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ke, Liping; Gavin Maltby
>> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
>> Christoph Egger wrote:
>>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
>>>> So, Frank/Egger, can I assume the following is the current consensus?
>>>> 1) MCE is handled entirely by the Xen HV, while the guest's vMCE handler
>>>> works only for the guest itself.
>>>> 2) Xen presents a virtual #MC to the guest through MSR access
>>>> emulation (Xen will do the translation if needed).
>>>> 3) The guest's unmodified MCE handler will handle the injected vMCE.
>>>> 4) Dom0 will get all log/telemetry through a hypercall.
>>>> 5) The action taken by Xen will be passed to dom0 through the telemetry
>>>> mechanism.
>>> Mostly. Regarding 2), I would first like to discuss how to handle errors
>>> impacting multiple contiguous physical pages which are non-contiguous
>>> in guest physical space.
>>> I also want to discuss how to do recovery actions requiring
>>> PCI access. One example of this is
>>> Shanghai's "L3 Cache Index Disable" feature.
>>> Xen delegates PCI config space to Dom0 and,
>>> via PCI passthrough, partly to DomU.
>>> That means that if registers in PCI config space are independently
>>> accessible by Xen, Dom0 and/or DomU, they can interfere with each other.
>>> Therefore, we need to a) clearly define who handles what,
>>> b) define some rules based on a), and
>>> c) discuss how to handle Dom0/DomU going wild
>>>     and breaking the rules defined in b).
>> I also agree on the approach in principle, but would like to see these
>> points addressed. For non-contiguous pages, I suppose Xen could deliver
>> multiple #vMCEs to the guest, split into contiguous parts. The vmce code
>> seems to be set up to be able to do this.
> For the contiguous pages, I agree with Gavin that such a
> contiguous-page error should be raised as multiple #MCs, so that is OK.
> For the PCI config space issue, Christoph, can you please share
> more information on it (or provide some document, as Frank
> suggested)? For example: is it for a CE (correctable error) or a
> UC (uncorrectable error), is it in the PCI range or the PCI-E range
> (i.e. accessed through 0xCF8/CFC or through MMCONFIG), how is the
> device's BDF calculated, etc. The following is my understanding.
> Firstly, if it is a CE, Xen will do nothing and dom0 will take the
> recovery action. If it is a UC, Xen will take action when all
> CPUs are in softirq context, and dom0 will not take action, so
> it should be OK.
> Secondly, in a Xen environment, per my understanding, the CPU is
> owned by the Xen HV, so I'm not sure whether Xen should be aware
> when dom0 disables the L3 cache (if it is a CE). That is, should
> dom0 disable the cache directly, or should it use a hypercall
> to ask Xen to do that? Keir may be able to give us more suggestions.
> For item c), currently Xen and dom0 can both access configuration
> space, while domU does so through the PCI frontend/backend.
> Because the PCI backend covers only devices assigned to the domU, we
> don't need to worry about domU, and dom0 should be trusted.
> However, one thing left is that if this range is beyond 0x100
> (i.e. in the PCI-E range), we need to add MMCONFIG support in Xen,
> although that can be added easily.
> Thanks
> -- Yunhong Jiang
>> As for the Shanghai feature: Christoph, are there any documents
>> available on that feature? What kind of errors are delivered
>> (corrected/correctable)? 
>> - Frank
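
As a side note on the 0xCF8/CFC versus MMCONFIG question raised above, here is a minimal sketch of why a register offset of 0x100 or more cannot be reached through the legacy port mechanism. It is not Xen code: the function names pci_conf1_address() and pci_mmcfg_offset() are hypothetical, and the bit layouts are the standard PCI configuration mechanism #1 and PCI Express ECAM encodings (it does not model any vendor-specific extension of the 0xCF8 format).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the value written to port 0xCF8 before the
 * data access on port 0xCFC. The encoding has only 8 bits of register
 * offset, so offsets >= 0x100 cannot be expressed and -1 is returned.
 */
static int pci_conf1_address(unsigned bus, unsigned dev, unsigned fn,
                             unsigned reg, uint32_t *out)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    if (reg >= 0x100)
        return -1;                      /* extended config space */
    *out = 0x80000000u                  /* enable bit */
         | (bus << 16) | (dev << 11) | (fn << 8)
         | (reg & 0xFCu);               /* dword-aligned offset */
    return 0;
}

/*
 * Hypothetical helper: byte offset of a register inside the MMCONFIG
 * (ECAM) window. Each function gets 4 KiB, so offsets up to 0xFFF fit.
 * The caller would add the window's base address, which is platform
 * specific and not modelled here.
 */
static uint64_t pci_mmcfg_offset(unsigned bus, unsigned dev, unsigned fn,
                                 unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 0x1000);
    return ((uint64_t)bus << 20) | ((uint64_t)dev << 15)
         | ((uint64_t)fn << 12) | reg;
}
```

With illustrative values (bus 0, device 0x18, function 3), offset 0x40 can be encoded for the port mechanism, while offset 0x1BC is rejected by pci_conf1_address() and is only reachable through the MMCONFIG offset.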