WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] RFC: MCA/MCE concept

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] RFC: MCA/MCE concept
From: "Christoph Egger" <Christoph.Egger@xxxxxxx>
Date: Wed, 30 May 2007 11:10:49 +0200
Cc: Gavin Maltby <Gavin.Maltby@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxxxx>
Delivery-date: Wed, 30 May 2007 02:12:10 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <465D56C4.76E4.0078.0@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: AMD / OSRC
References: <200705291732.46709.Christoph.Egger@xxxxxxx> <200705300945.51163.Christoph.Egger@xxxxxxx> <465D56C4.76E4.0078.0@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.9.6
On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote:
> >>> "Christoph Egger" <Christoph.Egger@xxxxxxx> 30.05.07 09:45 >>>
> >
> >On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
> >> >case I) - Xen receives an MCE from the CPU
> >> >
> >> >1) Xen MCE handler figures out whether the error is a correctable
> >> >    error (CE) or an uncorrectable error (UE)
> >> >2a) error == CE:
> >> >     Xen notifies Dom0 if Dom0 installed an MCA event handler
> >> >     for statistical purposes
> >> >2b) error == UE and UE impacts Xen or Dom0:
> >>
> >> A very important aspect here is how you want to classify what impact an
> >> uncorrectable error has - generally, I can see very few situations where you
> >> could confine the impact to a sub-portion of the system (i.e. a single
> >> domU, dom0, or Xen). The general rule in my opinion must be to halt the
> >> system, the question just is how likely it is that you can get a
> >> meaningful message out (to screen, serial, or logs) that can help
> >> analyze the problem afterwards. If it is somewhat likely, then dom0
> >> should be involved, otherwise Xen should just shut down the system.
> >
> >Here, hardware features for error handling help most.
> >AMD CPUs have featured online-spare RAM and Chipkill since K8 RevF.
> >
> >CPUs such as the SPARC feature data poisoning. That would be the
> >handiest technique to use here.
>
> But that assumes the error is recoverable (i.e. no other data got
> corrupted). You still didn't clarify how you intend to determine the
> impact an uncorrectable error had.

I know. I have no inspiration for that yet.
That's why I am discussing this here before writing code that goes nowhere.
Anyone here with a flash of genius? :-)
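
For reference, steps 1) to 2b) combined with Jan's halt-by-default rule might be sketched roughly as below. This is only an illustration, not proposed Xen code: all names are hypothetical, and `confined_to_domu` is passed in as an input precisely because how to determine it is the open question. The handling of a CE when Dom0 has no handler is an assumption; the proposal does not spell it out.

```python
from enum import Enum, auto


class Severity(Enum):
    CORRECTABLE = auto()    # CE
    UNCORRECTABLE = auto()  # UE


class Action(Enum):
    NOTIFY_DOM0 = auto()                # 2a) Dom0 collects statistics
    NO_ACTION = auto()                  # assumption: CE without a Dom0 handler
    GO_TO_DOMU_POLICY = auto()          # UE confined to one DomU: see 3a)/3b)
    REPORT_VIA_DOM0_THEN_HALT = auto()  # UE, a useful message can likely get out
    HALT = auto()                       # UE, default: Xen shuts the system down


def triage(severity, dom0_has_handler, confined_to_domu, report_likely):
    """Pick an action for one machine check (hypothetical policy sketch)."""
    if severity is Severity.CORRECTABLE:
        return Action.NOTIFY_DOM0 if dom0_has_handler else Action.NO_ACTION
    # Uncorrectable from here on. Whether the damage really is confined
    # to a single DomU is the unresolved part; it is just an input here.
    if confined_to_domu:
        return Action.GO_TO_DOMU_POLICY
    # Impact on Xen, Dom0, or unknown: halt, involving Dom0 only if a
    # meaningful message is likely to reach screen, serial, or logs.
    return Action.REPORT_VIA_DOM0_THEN_HALT if report_likely else Action.HALT
```

With this shape, anything not positively known to be confined to a DomU ends in a halt, which matches the general rule suggested above.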


> >> >3a) DomU is a PV guest:
> >> >       if DomU installed MCA event handler, it gets notified to perform
> >> >          self-healing
> >> >       if DomU did not install MCA event handler, notify Dom0 to do
> >> >          some operations on DomU (case II)
> >> >       if neither DomU nor Dom0 installed an MCA event handler,
> >> >          then Xen kills DomU
> >> >3b) DomU is a HVM guest:
> >> >       if DomU features a PV driver then behave as in 3a)
> >>
> >> What significance do pv drivers have here? Or do you mean a pv MCA
> >> driver?
> >
> >Yes, I mean the pv MCA driver.
> >
> >> >       if DomU enabled MCA/MCE via MSR, inject MCE into guest
> >> >       if DomU did not enable MCA/MCE via MSR, notify Dom0
> >> >            to do some operations on DomU (case II)
> >> >       if DomU did not enable MCA/MCE and Dom0 did not install
> >> >            an MCA event handler, Xen kills DomU
> >>
> >> Injecting an MCE into an HVM guest seems at least questionable. It can't
> >> really do anything about it (it doesn't even know the real topology of
> >> the system it's running on, so addresses stored in MSRs are meaningless
> >> - either you allow them to be read untranslated [in which case the guest
> >> cannot make sense of them] or you do translation for the guest [in which
> >> case it might make assumptions about co-locality of other nearby pages
> >> which will be wrong]).
> >
> >Yes, Xen should do the translation for the guest. The assumptions must
> >be fixed then. I know that's easier said than done.
>
> Exactly - you are proposing to fix all possible OSes, including
> sufficiently old ones. That's impossible. And I can't even see why an OS
> intended to run on native hardware would care to try to deal with
> virtualization aspects like this.
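
To make the translation concern concrete, here is a toy sketch. The dictionary below is a made-up stand-in for a machine-to-guest frame mapping, not Xen's actual data structure, and the frame numbers and function names are invented for illustration.

```python
# Toy mapping: machine frame number -> (owning domain id, guest frame number).
# All values are invented for illustration.
TOY_M2P = {
    0x1000: (1, 0x20),
    0x1001: (2, 0x07),
    0x5000: (1, 0x21),
}


def machine_to_guest_frame(m2p, domain_id, mfn):
    """Translate a faulting machine frame for one guest.

    Returns None if the frame is unknown or belongs to another domain,
    in which case there is nothing meaningful to report to this guest.
    """
    entry = m2p.get(mfn)
    if entry is None:
        return None
    owner, gfn = entry
    return gfn if owner == domain_id else None


def guest_to_machine_frame(m2p, domain_id, gfn):
    """Reverse lookup, used here only to show where a guest frame lives."""
    for mfn, (owner, g) in m2p.items():
        if owner == domain_id and g == gfn:
            return mfn
    return None
```

In this toy layout, a fault in machine frame 0x1000 would be reported to domain 1 as guest frame 0x20. A guest that then suspects its neighbouring frame 0x21 would be looking at machine frame 0x5000, while the real neighbour 0x1001 belongs to a different domain. That is the co-locality assumption going wrong.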

I think it was not obvious that Xen should not inject failures
into DomUs that lack fault management.
In that case, either Dom0 tells Xen what
to do with the DomU or Xen just kills the DomU.

<snippet from above>
> >> >3a) DomU is a PV guest:
                    ....
> >> >       if DomU did not install MCA event handler, notify Dom0 to do
> >> >          some operations on DomU (case II)
> >> >       if neither DomU nor Dom0 installed an MCA event handler,
> >> >          then Xen kills DomU

> >> >3b) DomU is a HVM guest:
                    ....
> >> >       if DomU did not enable MCA/MCE via MSR, notify Dom0
> >> >            to do some operations on DomU (case II)
> >> >       if DomU did not enable MCA/MCE and Dom0 did not install
> >> >            an MCA event handler, Xen kills DomU
</snippet>
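
The fallback order in 3a) and 3b) could be written down roughly as follows. This is a minimal sketch of the policy only, with hypothetical names; it assumes that "behave as in 3a)" for an HVM guest with a PV MCA driver means the driver is treated like an installed MCA event handler.

```python
from enum import Enum, auto


class DomUAction(Enum):
    NOTIFY_DOMU = auto()  # DomU does its own self-healing
    INJECT_MCE = auto()   # HVM guest that enabled MCA/MCE via MSR
    NOTIFY_DOM0 = auto()  # Dom0 operates on the DomU (case II)
    KILL_DOMU = auto()    # nobody can handle the error


def domu_policy(is_hvm, domu_has_pv_handler, domu_mce_enabled_msr,
                dom0_has_handler):
    """Decide what Xen does with a DomU hit by an uncorrectable error."""
    # 3a) PV guest, or 3b) HVM guest with a PV MCA driver.
    if domu_has_pv_handler:
        return DomUAction.NOTIFY_DOMU
    # 3b) only: an HVM guest that enabled MCA/MCE itself gets the MCE.
    if is_hvm and domu_mce_enabled_msr:
        return DomUAction.INJECT_MCE
    # The DomU has no fault management: never inject, ask Dom0 instead.
    if dom0_has_handler:
        return DomUAction.NOTIFY_DOM0
    return DomUAction.KILL_DOMU
```

Note that the MSR flag is ignored for PV guests in this sketch, since 3a) does not mention it.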


Christoph

-- 
AMD Saxony, Dresden Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent:
   AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy


