This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN

To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>, "Frank.Vanderlinden@xxxxxxx" <Frank.Vanderlinden@xxxxxxx>, Christoph Egger <Christoph.Egger@xxxxxxx>
Subject: RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Mon, 2 Mar 2009 13:51:22 +0800
Cc: "Kleen, Andi" <andi.kleen@xxxxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Ke, Liping" <liping.ke@xxxxxxxxx>
Frank/Christoph, can you please give more comments on it, or are you OK with 
For the action reporting mechanism, we will send out a proposal for review soon.

Yunhong Jiang

Jiang, Yunhong wrote:
> Christoph/Frank, thanks very much for the reply; see comments below.
>> -----Original Message-----
>> From: Frank.Vanderlinden@xxxxxxx [mailto:Frank.Vanderlinden@xxxxxxx]
>> Sent: February 26, 2009 1:33
>> To: Christoph Egger
>> Cc: Jiang, Yunhong; Kleen, Andi; xen-devel@xxxxxxxxxxxxxxxxxxx;
>> Keir Fraser; Ke, Liping; Gavin Maltby
>> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>> Christoph Egger wrote:
>>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
>>>> So, Frank/Egger, can I assume the following is the current consensus?
>>>> 1) MCE is handled entirely by the Xen HV, while the guest's vMCE
>>>> handler only works for the guest itself.
>>>> 2) Xen presents a virtual #MC to the guest through MSR access
>>>> emulation (Xen will do the translation if needed).
>>>> 3) The guest's unmodified MCE handler will handle the injected vMCE.
>>>> 4) Dom0 will get all log/telemetry through hypercall.
>>>> 5) The action taken by Xen will be passed to dom0 through the telemetry
>>>> mechanism.
>>> Mostly. Regarding 2), I would first like to discuss how to handle errors
>>> impacting multiple contiguous physical pages which are non-contiguous
>>> in guest physical space.
>>> I also want to discuss how to do recovery actions requiring
>>> PCI access. One example of this is
>>> Shanghai's "L3 Cache Index Disable" feature.
>>> Xen delegates PCI config space to Dom0 and,
>>> via PCI passthrough, partly to DomU.
>>> That means, if registers in PCI config space are independently
>>> accessible by Xen, Dom0 and/or DomU, they can interfere with each other.
>>> Therefore, we need to
>>> a) clearly define who handles what,
>>> b) define some rules based on a), and
>>> c) discuss how to handle Dom0/DomU going wild
>>>     and breaking the rules defined in b).
>> I also agree on the approach in principle, but would like to see these
>> points addressed. For non-contiguous pages, I suppose Xen could deliver
>> multiple #vMCEs to the guest, split into contiguous parts. The vmce code
>> seems to be set up to be able to do this.
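The splitting idea above can be illustrated with a minimal sketch. It assumes a hypothetical lookup table from machine frame numbers to guest frame numbers (mfn_to_gfn) and a hypothetical helper name (split_into_guest_runs); neither is taken from the Xen vmce code. Each resulting run would correspond to one virtual #MC delivered to the guest.

```python
def split_into_guest_runs(mfns, mfn_to_gfn):
    """Group physically contiguous frames into guest-contiguous runs.

    mfns       -- list of affected machine frame numbers, in ascending order
    mfn_to_gfn -- hypothetical mapping from machine frame to guest frame
    Returns a list of (first_gfn, page_count) tuples, one per run.
    """
    runs = []
    for mfn in mfns:
        gfn = mfn_to_gfn[mfn]
        if runs and runs[-1][0] + runs[-1][1] == gfn:
            # This guest frame directly follows the current run: extend it.
            first_gfn, count = runs[-1]
            runs[-1] = (first_gfn, count + 1)
        else:
            # Guest address space jumps here: start a new run.
            runs.append((gfn, 1))
    return runs


if __name__ == "__main__":
    # Four contiguous machine frames that map to two separate guest ranges.
    example_map = {100: 7, 101: 8, 102: 20, 103: 21}
    for first_gfn, count in split_into_guest_runs([100, 101, 102, 103], example_map):
        print(f"one vMCE covering {count} page(s) starting at gfn {first_gfn}")
```

In this example the four machine frames yield two runs, so the guest would see two separate errors rather than one error spanning addresses it considers unrelated.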
> For the contiguous pages, I agree with Gavin that such a
> contiguous page error should be triggered as multiple #MCs, so it is OK.
> For the PCI config space issue, Christoph, can you please share
> more information on it (or provide some document, as Frank
> suggested)? For example: is it for CE (correctable error) or
> UC (uncorrectable error), is it in the PCI range or the PCI-E range
> (i.e. accessed through 0xCF8/0xCFC or through MMCONFIG), how is the
> device's BDF calculated, etc. The following is my understanding.
> Firstly, if it is a CE, Xen will do nothing and dom0 will take the
> recovery action. If it is a UC, Xen will take action when all
> CPUs are in SoftIRQ context, and dom0 will not take action, so
> it should be OK.
> Secondly, in a Xen environment, per my understanding, the CPU is
> owned by the Xen HV, so I'm not sure whether Xen should be aware
> when dom0 disables the L3 cache (if it is a CE). That is, should
> dom0 disable the cache directly, or should it use a hypercall
> to ask Xen to do that? Keir can give us more suggestions.
> For item c), currently Xen and dom0 can both access configuration
> space, while domU does so through PCI_frontend/backend.
> Because the PCI backend only covers devices assigned to domU, we
> don't need to worry about domU, and dom0 should be trusted.
> However, one thing left is that if this range is beyond 0x100
> (i.e. in the PCI-E range), we need to add MMCONFIG support in Xen,
> although it can be added simply.
> Thanks
> -- Yunhong Jiang
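To make the 0xCF8/0xCFC versus MMCONFIG distinction concrete, here is a minimal sketch of how a bus/device/function (BDF) and register offset are encoded under each mechanism. The function names are hypothetical and the BDF and register values are arbitrary examples, not the registers of the Shanghai feature. The legacy mechanism can only address offsets below 0x100, which is why registers beyond that boundary need MMCONFIG.

```python
def legacy_config_address(bus, dev, func, reg):
    """Value written to I/O port 0xCF8 before reading/writing data at 0xCFC."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("invalid bus/device/function")
    if not 0 <= reg < 0x100:
        raise ValueError("offsets at or beyond 0x100 need MMCONFIG")
    # Bit 31 is the enable bit; the register offset is dword-aligned.
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def mmconfig_offset(bus, dev, func, reg):
    """Byte offset from the MMCONFIG base address (4 KiB per function)."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("invalid bus/device/function")
    if not 0 <= reg < 0x1000:
        raise ValueError("offset outside the 4 KiB extended config space")
    return (bus << 20) | (dev << 15) | (func << 12) | reg


if __name__ == "__main__":
    # Arbitrary example values, chosen only for illustration.
    print(hex(legacy_config_address(0, 0x18, 3, 0x1C)))
    print(hex(mmconfig_offset(0, 0x18, 3, 0x1BC)))
```

The MMCONFIG base address itself comes from the platform (the ACPI MCFG table), so the offset above would be added to that base before the access.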
>> As for the Shanghai feature: Christoph, are there any documents
>> available on that feature? What kind of errors are delivered
>> (corrected/correctable)? 
>> - Frank