xen-devel


Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN

from Frank van der Linden


To:	"Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Subject:	Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
From:	Frank van der Linden <Frank.Vanderlinden@xxxxxxx>
Date:	Fri, 20 Feb 2009 14:01:14 -0700
Cc:	Gavin Maltby <Gavin.Maltby@xxxxxxx>, Christoph Egger <Christoph.Egger@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Ke, Liping" <liping.ke@xxxxxxxxx>
In-reply-to:	<E2263E4A5B2284449EEBD0AAB751098401C7AACC2B@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References:	<C5BF30B3.2C2B%keir.fraser@xxxxxxxxxxxxx> <200902181905.55015.Christoph.Egger@xxxxxxx> <E2263E4A5B2284449EEBD0AAB751098401C7AAC7A0@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <200902191725.32556.Christoph.Egger@xxxxxxx> <E2263E4A5B2284449EEBD0AAB751098401C7AACC2B@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent:	Thunderbird 2.0.0.17 (X11/20081023)

I had some time to look over the patches in more detail, as well as the previous discussions that were referenced.


From your patches, what you write, and your slides, I gather the following:

* Corrected errors (found through polling and CMCI):
  1) Collect error data (telemetry)
  2) Inform dom0 through the VIRQ.

* Uncorrected errors:
  1) See if any immediate action can be taken (CPU offline,
     page retire)
  2) Collect telemetry
  3) Deliver vMCE to dom0 (and possibly domU)
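The flow summarized above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the patches: `mce_dispatch`, the `STEP_*` flags and `enum mce_action` are hypothetical names, and the function only records which steps would run for a given error rather than touching any hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical immediate actions the hypervisor may take itself. */
enum mce_action { MCE_ACT_NONE, MCE_ACT_CPU_OFFLINE, MCE_ACT_PAGE_RETIRE };

/* Hypothetical flags recording which steps of the flow run. */
#define STEP_IMMEDIATE_ACTION 0x01u /* CPU offline or page retire */
#define STEP_COLLECT          0x02u /* telemetry collected from the MCA banks */
#define STEP_VIRQ_DOM0        0x04u /* dom0 informed through the VIRQ */
#define STEP_VMCE_DOM0        0x08u /* vMCE delivered to dom0 */
#define STEP_VMCE_DOMU        0x10u /* vMCE delivered to an affected domU */

/*
 * 'corrected' is true for errors found through polling or CMCI.
 * 'action' is what the hypervisor decided to do immediately; in this
 * sketch that only happens for uncorrected errors.
 * 'domu_affected' says whether a guest owns the failing resource.
 */
unsigned mce_dispatch(bool corrected, enum mce_action action,
                      bool domu_affected)
{
    unsigned steps = 0;

    if (corrected) {
        assert(action == MCE_ACT_NONE);
        steps |= STEP_COLLECT;    /* 1) collect telemetry */
        steps |= STEP_VIRQ_DOM0;  /* 2) inform dom0 */
        return steps;
    }

    if (action != MCE_ACT_NONE)
        steps |= STEP_IMMEDIATE_ACTION; /* 1) act while the data is fresh */
    steps |= STEP_COLLECT;              /* 2) collect telemetry */
    steps |= STEP_VMCE_DOM0;            /* 3) vMCE to dom0 ... */
    if (domu_affected)
        steps |= STEP_VMCE_DOMU;        /*    ... and possibly a domU */
    return steps;
}
```

For example, a corrected error yields only `STEP_COLLECT | STEP_VIRQ_DOM0`, while an uncorrected error that led to a page retirement in a guest's memory yields all of the uncorrected-path steps.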

I think it's fine for the hypervisor to take some immediate action in some cases. It is good to do this as quickly as possible, and only the hypervisor has all the information immediately available.

What the Solaris framework would need, however, is information on what action was taken, along with the telemetry. As Christoph noted, the Solaris FMA code checks at bootup whether any components previously had errors, and if so, it disables them again to prevent further errors. To do this, it needs full information not just on the error data, but also on any action taken by the hypervisor, so that it can repeat that action. The FMA code may need some modifications to handle the case where an action has already been taken (to avoid taking a conflicting action), but I don't think that should be a big problem, although I don't know that part of our code very well.
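A minimal sketch, assuming a hypothetical report format, of what "telemetry plus action taken" might look like and how boot-time code could use it. `struct mc_report`, `struct fault_entry`, `fault_from_report` and `replay_faults` are illustrative names, not part of the Xen patches or the Solaris FMA code. The report carries the action alongside the raw bank data, and the replay step skips any saved entry matching an action the hypervisor reports as already taken, so dom0 does not issue a conflicting one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action codes reported alongside the telemetry. */
enum hv_action { HV_ACT_NONE, HV_ACT_CPU_OFFLINE, HV_ACT_PAGE_RETIRE };

/* Hypothetical report: raw bank data plus what the hypervisor did. */
struct mc_report {
    uint64_t mci_status;    /* MCi_STATUS as read by the hypervisor */
    uint64_t mci_addr;      /* MCi_ADDR, meaningful only if ADDRV is set */
    uint32_t phys_cpu;      /* physical CPU that saw the error */
    enum hv_action action;  /* action the hypervisor already took */
    uint64_t action_target; /* CPU number or page frame acted on */
};

/* Hypothetical persisted entry: a component known to be faulty. */
struct fault_entry {
    enum hv_action action;
    uint64_t target;
    int needs_action;       /* set by replay_faults() */
};

/* Turn a report into an entry that can be saved for the next boot. */
struct fault_entry fault_from_report(const struct mc_report *r)
{
    assert(r != NULL);
    struct fault_entry e = { r->action, r->action_target, 0 };
    return e;
}

/*
 * At boot, mark each saved entry that still needs to be repeated.
 * Entries matching an action the hypervisor reports as already taken
 * are skipped. Returns the number of entries dom0 must still act on.
 */
size_t replay_faults(struct fault_entry *list, size_t n,
                     const struct mc_report *taken, size_t ntaken)
{
    size_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        int done = 0;
        for (size_t j = 0; j < ntaken && !done; j++)
            done = taken[j].action == list[i].action &&
                   taken[j].action_target == list[i].target;
        list[i].needs_action = !done;
        pending += (size_t)!done;
    }
    return pending;
}
```

The key design point is only that the action travels with the telemetry; how the list is persisted across boots is left open here.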

The part I still have doubts about is the vMCE code. As far as I can tell, it takes the information out of the MCA banks and stores it, per event, in a linked list. For each vMCE, the head of the list is taken and used as the MSR context. The rdmsr instruction is trapped and redirected to that information. The wrmsr instruction seems to be accepted but has no effect (except that if the guest's handler writes a value and then immediately reads it back, it will get the same value).
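The vMCE behaviour described above can be modelled roughly as follows. This is a sketch of my reading of the mechanism, not the patch code: the structures and function names are hypothetical, only one bank per event is recorded, MCi_CTL is not modelled, and the event records are owned by the caller. The MSR numbering (IA32_MCi_CTL at 0x400 + 4*i, followed by STATUS, ADDR and MISC) is the architectural one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_MC0_CTL 0x400u /* MCi_CTL = 0x400 + 4*i; STATUS +1, ADDR +2, MISC +3 */

/* Hypothetical per-event record saved from the real MCA banks. */
struct vmce_event {
    unsigned bank;
    uint64_t status, addr, misc;
    struct vmce_event *next;
};

/* Hypothetical per-domain vMCE state. */
struct vmce_domain {
    struct vmce_event *head, *tail; /* pending events, oldest first */
    struct vmce_event *current;     /* MSR context for the current vMCE */
};

/* Append an event; the caller keeps ownership of the record. */
void vmce_queue(struct vmce_domain *d, struct vmce_event *ev)
{
    assert(d != NULL && ev != NULL);
    ev->next = NULL;
    if (d->tail)
        d->tail->next = ev;
    else
        d->head = ev;
    d->tail = ev;
}

/* On vMCE delivery, the head of the list becomes the MSR context. */
int vmce_inject(struct vmce_domain *d)
{
    if (!d->head)
        return 0;
    d->current = d->head;
    d->head = d->head->next;
    if (!d->head)
        d->tail = NULL;
    return 1;
}

/* Map a bank MSR to a field of the current context, or NULL. */
static uint64_t *vmce_field(struct vmce_domain *d, uint32_t msr)
{
    if (!d->current || msr < MSR_IA32_MC0_CTL)
        return NULL;
    uint32_t off = msr - MSR_IA32_MC0_CTL;
    if (off / 4 != d->current->bank)
        return NULL;
    switch (off % 4) {
    case 1: return &d->current->status;
    case 2: return &d->current->addr;
    case 3: return &d->current->misc;
    default: return NULL; /* MCi_CTL not modelled */
    }
}

/* Trapped rdmsr: answered from the saved context, not the hardware. */
uint64_t vmce_rdmsr(struct vmce_domain *d, uint32_t msr)
{
    uint64_t *f = vmce_field(d, msr);
    return f ? *f : 0;
}

/* Trapped wrmsr: accepted, but only the saved copy changes. */
void vmce_wrmsr(struct vmce_domain *d, uint32_t msr, uint64_t val)
{
    uint64_t *f = vmce_field(d, msr);
    if (f)
        *f = val;
}
```

In this model a guest handler that clears MCi_STATUS and reads it back sees zero, yet nothing reaches the physical bank, which matches the observation that wrmsr has no real effect.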

The main argument for the vMCE code seems to be that it allows existing MCA handlers to be reused. However, I don't see the advantage in this. Basically, it allows the handler to retrieve the MCA banks through plain rdmsr instructions, which is fine, but that's as far as it goes. wrmsr instructions have no effect, and without any additional information, being able to read the banks does not seem useful.

To take further action, the MCA code in dom0 (or a domU) needs to know that it is running under Xen, and it needs detailed physical information about the system. In other words, the only existing code that can be reused is the code that gathers information. So the only thing vMCE is good for is running unmodified error logging code; you can't interpret any of the error information further without knowing more. Especially for a domU, which might not know anything about the physical system, this doesn't seem useful. What would the user of a domU do with that information?

To recap, I think the part where Xen itself takes action is fine, with some modifications. But I don't see any advantage in vMCE delivery, unless I'm missing something, of course.


- Frank

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
