This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN

To: Christoph Egger <Christoph.Egger@xxxxxxx>
Subject: RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Tue, 3 Mar 2009 00:15:04 +0800
Accept-language: en-US
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>, "Ke, Liping" <liping.ke@xxxxxxxxx>, "Frank.Vanderlinden@xxxxxxx" <Frank.Vanderlinden@xxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Kleen, Andi" <andi.kleen@xxxxxxxxx>
Delivery-date: Mon, 02 Mar 2009 08:15:34 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <200903021558.37334.Christoph.Egger@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C5BF30B3.2C2B%keir.fraser@xxxxxxxxxxxxx> <49A580C0.7050501@xxxxxxx> <E2263E4A5B2284449EEBD0AAB751098401C7B6F2C0@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <200903021558.37334.Christoph.Egger@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcmbR4UZcWoRSEtSQGeuNuuufmKobgACbk7w
Thread-topic: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> For virtual MCEs that is ok. But note, for unmodified guests,
> the MC handler
> is written with the assumption that the CPU powers off when an #MCE
> happens before the handler cleared the MCIP bit in the MCG_STATUS MSR.

That should depend on the implementation. For example, we can inject the vMCEs
one by one, i.e. only inject the next one after the previous one has been handled.
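
A minimal sketch of that one-by-one injection, under stated assumptions: the names `vmce_vcpu`, `vmce_queue` and `vmce_wrmsr_mcg_status` are hypothetical and are not existing Xen interfaces, and the `injected` counter stands in for the real injection path. Only the MCIP bit position (bit 2 of MCG_STATUS) is architectural. Pending events are queued per vCPU, and the next one is delivered only once the guest handler has cleared MCIP for the previous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCG_STATUS_MCIP (1ULL << 2)   /* architectural: machine check in progress */
#define VMCE_QUEUE_LEN  8             /* hypothetical queue depth */

struct vmce_event { uint64_t status, addr; };

/* Hypothetical per-vCPU state; not an existing Xen structure. */
struct vmce_vcpu {
    uint64_t mcg_status;              /* virtual MCG_STATUS as seen by the guest */
    struct vmce_event queue[VMCE_QUEUE_LEN];
    size_t head, count;               /* count includes the in-flight event */
    bool in_flight;                   /* a vMCE was injected and is not yet handled */
    unsigned injected;                /* stand-in for the real injection path */
};

/* Inject the event at the head of the queue if nothing is in flight. */
static void vmce_try_inject(struct vmce_vcpu *v)
{
    if (v->in_flight || v->count == 0)
        return;
    v->in_flight = true;
    v->mcg_status |= MCG_STATUS_MCIP;
    v->injected++;
}

/* Queue a new virtual MCE; returns false if the queue is full. */
bool vmce_queue(struct vmce_vcpu *v, struct vmce_event ev)
{
    if (v->count == VMCE_QUEUE_LEN)
        return false;
    v->queue[(v->head + v->count) % VMCE_QUEUE_LEN] = ev;
    v->count++;
    vmce_try_inject(v);
    return true;
}

/* Called when the guest writes its virtual MCG_STATUS MSR. */
void vmce_wrmsr_mcg_status(struct vmce_vcpu *v, uint64_t val)
{
    v->mcg_status = val;
    if (v->in_flight && !(val & MCG_STATUS_MCIP)) {
        /* The guest handler finished: retire the event, deliver the next. */
        assert(v->count > 0);
        v->head = (v->head + 1) % VMCE_QUEUE_LEN;
        v->count--;
        v->in_flight = false;
        vmce_try_inject(v);
    }
}
```

With this scheme an unmodified guest never sees a second virtual #MC while MCIP is still set, so the shutdown case does not arise for virtual MCEs.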

>> For the contiguous pages, I agree with Gavin that such a contiguous page error
>> should be triggered as multiple #MCs, and so it is ok.
>> For the PCI config space issue, Christoph, can you please share more
>> information on it (or provide some document as Frank suggested), like is it
>> for CE (Correctable Error) or UC (UnCorrectable Error), is it in the PCI range or
>> PCI-E range (i.e. through 0xCF8/CFC or through MMCONFIG), how the device's
>> BDF is calculated, etc. Following is some of my understanding.
> I would like to see a generic solution that works with any feature
> requiring access to the pci space rather than a per-feature solution.

I think one solution is that Xen takes care of MCEs while dom0 takes care of CEs.
Another solution is that all PCI access for CPU RAS is done by Xen, since Xen owns
the CPU. Some information on how the PCI config space is arranged would be
helpful, I think.
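
For reference, a rough sketch of how a device's bus/device/function (BDF) maps to an address in the two access methods mentioned in the quoted text. The bit layouts are the standard ones (configuration mechanism #1 through ports 0xCF8/0xCFC, and the PCI-E MMCONFIG window); the helper names `pci_cf8_address` and `pci_mmcfg_offset` are made up for illustration and are not Xen functions. It also shows why registers at offset 0x100 and above need MMCONFIG.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_LEGACY_CFG_SIZE 0x100u    /* registers reachable through 0xCF8/0xCFC */
#define PCIE_CFG_SIZE       0x1000u   /* registers reachable through MMCONFIG */

struct pci_bdf { uint8_t bus, dev, fn; };

static bool bdf_valid(struct pci_bdf b)
{
    return b.dev < 32 && b.fn < 8;
}

/*
 * Value to write to port 0xCF8 before reading or writing port 0xCFC.
 * Fails for registers at 0x100 and above, which this mechanism cannot reach.
 */
bool pci_cf8_address(struct pci_bdf b, uint16_t reg, uint32_t *out)
{
    if (!bdf_valid(b) || reg >= PCI_LEGACY_CFG_SIZE)
        return false;
    *out = 0x80000000u                /* enable bit */
         | ((uint32_t)b.bus << 16)
         | ((uint32_t)b.dev << 11)
         | ((uint32_t)b.fn  << 8)
         | (reg & 0xFCu);             /* dword-aligned register number */
    return true;
}

/*
 * Byte offset into the MMCONFIG window; the caller adds the window base,
 * which is reported by the ACPI MCFG table.
 */
bool pci_mmcfg_offset(struct pci_bdf b, uint16_t reg, uint64_t *out)
{
    if (!bdf_valid(b) || reg >= PCIE_CFG_SIZE)
        return false;
    *out = ((uint64_t)b.bus << 20)
         | ((uint64_t)b.dev << 15)
         | ((uint64_t)b.fn  << 12)
         | reg;
    return true;
}
```

A generic accessor could try the 0xCF8 path first and fall back to MMCONFIG for extended registers, which is where mmconfig support in Xen would come in.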

Yunhong Jiang

>> Firstly, if it is CE, Xen will do nothing and dom0 will take recovery
>> action. If it is UC, Xen will take action when all CPUs are in SoftIRQ
>> context, and dom0 will not take action, so it should be ok.
>> Secondly, in the Xen environment, per my understanding, the CPU is owned by
>> the Xen HV, so I'm not sure whether Xen should be aware when dom0 disables
>> the L3 cache (if it is CE). That is, should dom0 disable the cache directly,
>> or should it use a hypercall to ask Xen to do that? Keir can give us more
>> suggestions.
>> For item C, currently Xen and dom0 can both access configuration space, while
>> domU will do that through PCI_frontend/backend. Because the PCI backend only
>> covers devices assigned to domU, we don't need to worry about domU, and dom0
>> should be trusted. However, one thing left is that if this range is beyond
>> 0x100 (i.e. in the PCI-E range), we need to add mmconfig support in Xen,
>> although it can be added simply.
>> Thanks
>> -- Yunhong Jiang
>>> As for the Shanghai feature: Christoph, are there any documents
>>> available on that feature? What kind of errors are delivered
>>> (corrected/correctable)? 
>>> - Frank
> --
> ---to satisfy European Law for business letters:
> Advanced Micro Devices GmbH
> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
> Registergericht Muenchen, HRB Nr. 43632