WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

RE: [Xen-devel] RFC: MCA/MCE concept

To: "Egger, Christoph" <Christoph.Egger@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] RFC: MCA/MCE concept
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Fri, 1 Jun 2007 10:55:28 +0200
Cc: Gavin Maltby <Gavin.Maltby@xxxxxxx>
Delivery-date: Fri, 01 Jun 2007 01:53:57 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <200706011011.35336.Christoph.Egger@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcekJJ822FUAj9U0QQ2SBvOOlzlL/AABBI4g
Thread-topic: [Xen-devel] RFC: MCA/MCE concept
 

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Christoph Egger
> Sent: 01 June 2007 09:12
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Cc: Gavin Maltby
> Subject: Re: [Xen-devel] RFC: MCA/MCE concept
> 
> On Wednesday 30 May 2007 17:03:55 Petersson, Mats wrote:
> > [snip]
> >
> > > My feeling is that the hypervisor and dom0 own the hardware and as
> > > such all hardware fault management should reside there.  So we
> > > should never deliver any form of #MC to a domU, nor should a poll
> > > of MCA state from a domU ever observe valid state (e.g., make the
> > > RDMSR return 0).  So all handling, logging and diagnosis, as well
> > > as hardware response actions (such as deploying an online spare
> > > chip-select), are controlled in the hypervisor/dom0 combination.
> > > That seems a consistent model - e.g., if a domU is migrated to
> > > another system it should not carry the diagnosis state of the
> > > original system across etc., since that belongs with the one
> > > domain that cannot migrate.
> >
> > I agree entirely with this.
> >
> > > But that is not to say that (I think at a future phase) domU
> > > should not participate in a higher-level fault management
> > > function, at the direction of the hypervisor/dom0 combo.  For
> > > example, if/when we can isolate an uncorrectable error to a
> > > single domU, we could forward such an event to the affected domU
> > > if it has registered its ability/interest in such events.  These
> > > won't be in the form of a faked #MC or anything; instead they'd
> > > be some form of synchronous trap experienced when next the
> > > affected domU context resumes on a CPU.  The intelligent domU
> > > handler can then decide whether the domU must panic, whether it
> > > could simply kill the affected process, etc.  Those details are
> > > clearly sketchy, but the idea is to up-level the communication to
> > > a domU to be more like "you're broken" rather than "here's a
> > > machine-level hardware error for you to interpret and decide what
> > > to do with".
> >
> > Yes, this makes much more sense than forwarding #MC, as the guest
> > would have a hard time actually doing anything useful with it.  As
> > far as I know, most uncorrectable errors are near enough entirely
> > fatal in most commercial non-Enterprise OSes anyway - e.g. in
> > Windows XP or Server 2K3, it always ends in a blue screen - which
> > is hardly any better than the guest being "humanely euthanized" by
> > Dom0.
> >
> > I take it this would be some sort of hypercall (available through
> > the regular PV-driver interface for HVM guests) to say "Let me know
> > if I'm broken - trap on vector X".
> 
> In short, guests with a PV MCA driver will see a certain event
> (assuming the event mechanism will be used for the notification)
> and guests w/o a PV MCA driver will see a "General Protection Fault".
> Is that right?

Not sure a GP fault is the right thing for non-"MCA PV driver" domains. I 
think "just killing" the domain is the right thing to do. 

We can't guarantee that a GP fault is actually going to "kill" the guest. Let's 
assume the code that ran in the guest was something along the lines of:


int some_function(...)
{
    ...

    __try {
        ...
        /* Code that does quite a lot of "random" processing that may
           cause, for example, a GP fault */
        ...
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        ...
        /* Handles the GP fault within the kernel code and carries on */
        ...
    }
}


Note that Windows kernel drivers are allowed to use the kernel exception 
handling, and ARE allowed to "allow" GP faults if they wish to do so. [Don't 
ask me why MS allows this, but that's the case, so we have to live with it].

I'm not sure whether Linux, Solaris, *BSD, OS/2 or other OSes will allow 
"catching" a kernel GP fault in a non-precise fashion (I know Linux has 
exception handling for EXACT positions in the code). But since at least one 
kernel DOES allow this, we can't be sure that a GPF will destroy the guest. 

A second point to note is, of course, that if the guest is in user mode when 
the GPF happens, then almost all OSes will just kill the application - and 
there's absolutely no reason to believe that the running application is 
where the actual memory problem lies; it may have been found by memory 
scrubbing, for example. 

Whatever we do to the guest, it should be "certain death", unless the kernel 
has told us "I can handle MCEs". 

--
Mats

> 
> > --
> > Mats
> >
> > > Gavin
> > >
> 
> -- 
> AMD Saxony, Dresden, Germany
> Operating System Research Center
> 
> Legal Information:
> AMD Saxony Limited Liability Company & Co. KG
> Sitz (Geschäftsanschrift):
>    Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
> Registergericht Dresden: HRA 4896
> vertretungsberechtigter Komplementär:
>    AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
> Geschäftsführer der AMD Saxony LLC:
>    Dr. Hans-R. Deppe, Thomas McCoy
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
> 
> 
> 


