Frank/Christoph, could you please comment further on this, or are you OK with
it?
For the action reporting mechanism, we will send out a proposal for review soon.
Thanks
Yunhong Jiang
Jiang, Yunhong wrote:
> Christoph/Frank, thanks very much for the reply; see comments below.
>
>> -----Original Message-----
>> From: Frank.Vanderlinden@xxxxxxx [mailto:Frank.Vanderlinden@xxxxxxx]
>> Sent: February 26, 2009 1:33
>> To: Christoph Egger
>> Cc: Jiang, Yunhong; Kleen, Andi;
>> xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ke, Liping; Gavin Maltby
>> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>>
>> Christoph Egger wrote:
>>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
>>>
>>>> So, Frank/Egger, can I assume the following is the current consensus?
>>>>
>>>> 1) MCE is handled entirely by the Xen HV, while a guest's vMCE handler
>>>>    works only for the guest itself.
>>>> 2) Xen presents a virtual #MC to the guest through MSR access
>>>>    emulation (Xen will do the translation if needed).
>>>> 3) The guest's unmodified MCE handler will handle the injected vMCE.
>>>> 4) Dom0 will get all log/telemetry through a hypercall.
>>>> 5) The action taken by Xen will be passed to dom0 through the telemetry
>>>>    mechanism.
>>>
>>> Mostly. Regarding 2), I would first like to discuss how to handle errors
>>> impacting multiple contiguous physical pages which are non-contiguous
>>> in guest physical space.
>>>
>>> I also want to discuss how to do recovery actions requiring
>>> PCI access. One example of this is
>>> Shanghai's "L3 Cache Index Disable" feature.
>>> Xen delegates PCI config space to Dom0 and,
>>> via PCI passthrough, partly to DomU.
>>> That means that if registers in PCI config space are independently
>>> accessible by Xen, Dom0 and/or DomU, they can interfere with each other.
>>> Therefore, we need to
>>> a) clearly define who handles what,
>>> b) define some rules based on a), and
>>> c) discuss how to handle Dom0/DomU going wild
>>>    and breaking the rules defined in b).
>>
>> I also agree on the approach in principle, but would like to see these
>> points addressed. For non-contiguous pages, I suppose Xen could deliver
>> multiple #vMCEs to the guest, split into contiguous parts. The vmce code
>> seems to be set up to be able to do this.
>
> For the contiguous pages, I agree with Gavin that such a
> contiguous-page error should be triggered as multiple #MCs, so this is OK.
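
To make the splitting idea above concrete, here is a minimal sketch of grouping the affected guest page frame numbers into contiguous runs, so that one virtual #MC could be raised per run. Python is used only for brevity; the function name and the (start, count) representation are assumptions for illustration and are not taken from the Xen vmce code.

```python
def split_into_contiguous_runs(gpfns):
    """Group guest page frame numbers into contiguous (start, count) runs.

    Hypothetical helper: each returned run would correspond to one
    virtual #MC delivered to the guest.
    """
    runs = []
    for gpfn in sorted(set(gpfns)):
        if runs and gpfn == runs[-1][0] + runs[-1][1]:
            # This frame directly follows the current run, so extend it.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            # There is a gap in guest physical space, so start a new run.
            runs.append((gpfn, 1))
    return runs


# Three machine-contiguous pages might land at scattered guest frames.
affected = [0x100, 0x101, 0x205, 0x102, 0x206]
runs = split_into_contiguous_runs(affected)
```

With the example input, the five frames collapse into two runs, so the guest would see two events rather than five.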
>
> For the PCI config space issue, Christoph, can you please share
> more information on it (or provide some document, as Frank
> suggested)? For example: is it for CE (correctable error) or
> UC (uncorrectable error), is it in the PCI range or the PCI-E range
> (i.e. through 0xCF8/CFC or through MMCONFIG), how is the device's
> BDF calculated, etc. The following is my understanding.
>
> Firstly, if it is a CE, Xen will do nothing and dom0 will take the
> recovery action. If it is a UC, Xen will take action when all
> CPUs are in SoftIRQ context, and dom0 will not take action, so
> it should be OK.
>
> Secondly, in a Xen environment, per my understanding, the CPU is
> owned by the Xen HV, so I'm not sure whether Xen should be aware when
> dom0 disables the L3 cache (if it is a CE). That is, should
> dom0 disable the cache directly, or should it use a hypercall
> to ask Xen to do that? Keir can give us more suggestions.
>
> For item c), currently Xen and dom0 can both access configuration
> space, while domU does so through PCI_frontend/backend.
> Because the PCI backend only covers devices assigned to domU, we
> don't need to worry about domU, and dom0 should be trusted.
> However, one thing left is that if this range is beyond 0x100
> (i.e. in the PCI-E range), we need to add MMCONFIG support in Xen,
> although it can be added simply.
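
As a rough illustration of why the 0x100 boundary matters, the sketch below computes the address used by the two standard config access mechanisms: the legacy 0xCF8/0xCFC port pair, which in its basic form can only reach registers below 0x100, and MMCONFIG, which reaches the full 4 KB per function. The function names and the bus/device/function/register values in the example are hypothetical; this is not Xen code, and Python is used only for brevity.

```python
CF8_ENABLE = 0x80000000    # enable bit of the 0xCF8 config address port
LEGACY_CFG_LIMIT = 0x100   # basic 0xCF8/0xCFC access covers 256 bytes per function
PCIE_CFG_LIMIT = 0x1000    # MMCONFIG covers 4 KB per function


def _check_bdf(bus, dev, func):
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/device/function out of range")


def cf8_address(bus, dev, func, reg):
    """Value to write to port 0xCF8 before reading or writing port 0xCFC."""
    _check_bdf(bus, dev, func)
    if not 0 <= reg < LEGACY_CFG_LIMIT:
        raise ValueError("register is not reachable through 0xCF8/0xCFC")
    # The low two bits are dropped: the port selects an aligned dword.
    return CF8_ENABLE | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def mmconfig_offset(bus, dev, func, reg):
    """Byte offset of a register from the MMCONFIG base address."""
    _check_bdf(bus, dev, func)
    if not 0 <= reg < PCIE_CFG_LIMIT:
        raise ValueError("register is outside the 4 KB config space")
    return (bus << 20) | (dev << 15) | (func << 12) | reg


def needs_mmconfig(reg):
    """True if the register lies in the extended (PCI-E only) range."""
    return reg >= LEGACY_CFG_LIMIT
```

For example, with a hypothetical function at bus 0, device 0x18, function 3, register 0x40 is reachable either way, while a register at 0x1BC makes `cf8_address` raise and has to go through `mmconfig_offset`.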
>
> Thanks
> -- Yunhong Jiang
>
>>
>> As for the Shanghai feature: Christoph, are there any documents
>> available on that feature? What kind of errors are delivered
>> (corrected/correctable)?
>>
>> - Frank
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel