WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] RFC: MCA/MCE concept

To: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Subject: Re: [Xen-devel] RFC: MCA/MCE concept
From: Gavin Maltby <Gavin.Maltby@xxxxxxx>
Date: Fri, 01 Jun 2007 11:57:23 +0100
Cc: "Egger, Christoph" <Christoph.Egger@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>
In-reply-to: <907625E08839C4409CE5768403633E0B02561D9C@xxxxxxxxxxxxxxxxx>
References: <907625E08839C4409CE5768403633E0B02561D9C@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.0 (X11/20070508)

Hi

On 06/01/07 10:48, Petersson, Mats wrote:
[cut]

Note that Windows kernel drivers are allowed to use the kernel exception
handling, and ARE allowed to "allow" GP faults if they wish to do so.
[Don't ask me why MS allows this, but that's the case, so we have to live
with it].
In that case, it will die sooner or later *after* consuming the data in error.
That means, the guest continues to live for an unknown time...

> Yes. What I'm worried about is that if you have a "transient" or "few-bit"
> error in a rarely used location, the guest may well live a LONG time with
> incorrect data and potentially not get it detected for quite some time again
> (say two bits have stuck at 0, and the data is then written back with the
> zeros there - next time we read it, no error, since the data has zeros in
> that location).
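
The masking effect described above can be illustrated with a small model. This is only a sketch under loud assumptions: `struct cell`, `cell_write` and `cell_read` are hypothetical names, and the `intended` field is an idealised stand-in for ECC check bits (real ECC detects only a bounded number of bad bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one faulty memory byte. */
struct cell {
    uint8_t stored;      /* bits the faulty cell actually holds */
    uint8_t intended;    /* idealised stand-in for the ECC check bits */
    uint8_t stuck_zero;  /* mask of bits that are stuck at 0 */
};

/* Write a value: check information reflects the data as written,
 * but the stuck bits cannot hold a 1. */
void cell_write(struct cell *c, uint8_t data)
{
    assert(c != NULL);
    c->intended = data;
    c->stored = data & (uint8_t)~c->stuck_zero;
}

/* Read a value: returns true if the check detects a mismatch. */
bool cell_read(const struct cell *c, uint8_t *out)
{
    assert(c != NULL && out != NULL);
    *out = c->stored;
    return c->stored != c->intended;
}
```

With `stuck_zero = 0x03`, writing `0xFF` and reading it back reports an error and returns `0xFC`. If the consumer ignores the error and writes `0xFC` back, the next read reports no error at all, although the data no longer matches what was originally stored.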

I don't believe GP faults and uncorrectable errors really overlap that much.
In a GP fault the extent of the damage is known - you tried to read from
an address not in your address space, you lacked permissions for an operation
etc.  In an uncorrected error situation it is difficult to understand the
bounds of the problem in that way - unless the hardware assists with
data poisoning etc such errors may well be unconstrained and affect
a wider area than just the bracket of code that caught a GP fault.

You can often ring-fence critical code sequences by inserting error
barrier instructions before and after it.  Those operations are
usually very expensive (drain the pipeline or similar) and are
suitable only in special places.

When running natively it is usually the "owner" of affected data that
sees it bad in memory, eg from a read it made.  In those cases we
have the owner on cpu and can kill/signal it synchronously.
There are times when the kernel may be shifting some data
on behalf of the application owner (eg, copyin/copyout, shift
network data etc) in which case we still have a handle on the
real owner.  If the access is from a scrub then we should not
panic - just wait and see if the owner does indeed use the bad data
at which time we take appropriate action.
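
As a rough sketch of the native cases just listed (not taken from any real kernel; every type and function name here is hypothetical), the decision might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Who touched the bad data when the uncorrectable error was seen. */
enum access_kind {
    ACCESS_BY_OWNER,     /* the owner itself read the bad data */
    ACCESS_ON_BEHALF,    /* kernel moved it for an owner, e.g. copyin/copyout */
    ACCESS_BY_SCRUBBER   /* a scrub found it; nobody has consumed it yet */
};

enum action { ACT_KILL_CURRENT, ACT_KILL_REAL_OWNER, ACT_DEFER };

struct ue_event {
    enum access_kind kind;
    int current_pid;     /* process on cpu at the time of the trap */
    int owner_pid;       /* real owner of the data, where known */
};

struct decision {
    enum action act;
    int target_pid;      /* -1 when nobody is to be signalled yet */
};

/* Pick an action for an uncorrectable error when running natively. */
struct decision decide_native(const struct ue_event *ev)
{
    struct decision d = { ACT_DEFER, -1 };

    assert(ev != NULL);
    switch (ev->kind) {
    case ACCESS_BY_OWNER:
        d.act = ACT_KILL_CURRENT;
        d.target_pid = ev->current_pid;
        break;
    case ACCESS_ON_BEHALF:
        d.act = ACT_KILL_REAL_OWNER;
        d.target_pid = ev->owner_pid;
        break;
    case ACCESS_BY_SCRUBBER:
        /* Do not panic: wait to see whether the owner uses the data. */
        break;
    }
    return d;
}
```

The point of the separate `owner_pid` is that the process on cpu and the owner of the data need not be the same, which is exactly the case the virtualisation layer adds more of.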

With the virtualisation layer there is the additional case of the HV or
dom0 performing operations on behalf of a guest, ie the HV may make the
access that traps but its own state is not affected.

CPU errors get trickier still.  For example, what do we do when we're told that
while running guest A we displaced modified data from the L2 cache that had
uncorrectable ECC?  We have a physical address only, and no idea of who the
data belongs to (guest A, a recently scheduled guest, or the HV?).  Where
cachelines are tagged with some form of context or guest ID you have
a chance, provided that is reported in the error state.
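
To make the attribution problem concrete, here is a minimal sketch. The `struct cache_err` layout and the `has_owner_tag`/`owner_tag` fields are assumptions for illustration only, not the error state of any particular CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OWNER_UNKNOWN (-1)

/* Hypothetical error report for a displaced cacheline. */
struct cache_err {
    uint64_t paddr;          /* physical address of the bad line */
    bool     has_owner_tag;  /* did the hardware report a context/guest ID? */
    int      owner_tag;      /* valid only if has_owner_tag is true */
    int      running_guest;  /* guest on cpu when the error was reported */
};

/* Return the guest the bad data belongs to, or OWNER_UNKNOWN.
 * The running guest is deliberately NOT used as a fallback: the line
 * may belong to a previously scheduled guest or to the HV itself. */
int attribute_cache_error(const struct cache_err *e)
{
    assert(e != NULL);
    if (e->has_owner_tag)
        return e->owner_tag;
    return OWNER_UNKNOWN;
}
```

Without a tag the only honest answer is "unknown", at which point the physical address would have to be mapped back to an owner by some other means.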

> Also consider the case where one cell (or small block of cells) has gone bad,
> but it's only used by one single piece of code that is using this try/catch code?
> I know, this is probably relatively rare, but I'm still worried that it will
> "break" things...

I'm not sure if Linux, Solaris, *BSD, OS/2 or other OS's will allow
"catching" a Kernel GP fault in a non-precise fashion (I know Linux has
exception handling for EXACT positions in the code). But since at least one
kernel DOES allow this, we can't be sure that a GPF will destroy the guest.

When Linux and *BSD see a GPF while they are in userspace, then they kill the process with a SIGSEGV. If they are in kernelspace, then they panic.

Solaris has some wrappers that can be applied, maybe at some expense to
performance, to make protected accesses that will catch and
survive various types of error including hardware errors,
wild pointers etc.

Second point to note is of course that if the guest is in user-mode when
the GPF happens, then almost all OS's will just kill the application - and
there's absolutely no reason to believe that the application running is
necessarily where the actual memory problem is - it may be caused by memory
scrubbing for example.

Yes, these are the myriad permutations I was alluding to above.

Whatever we do to the guest, it should be a "certain death", unless the
kernel has told us "I can handle MCE's".

Yes, certain and instant death unless it is a PV guest that has registered
the ability to deal with these more elegantly.

It is obvious that there is no absolutely generic way to handle all sorts of buggy guests. I vote for:

If DomU has a PV MCA driver, use it; otherwise inject a GPF.
Multiplexing all the MSRs needed to emulate MCA/MCE for the guests is much
more complex than just injecting a GPF - and slower.

Do we need to send the non-PV guest a signal of any kind to kill it?
After all, we can stop it running any further instructions (and perhaps
avoid the use of bad data) by deciding within the HV or dom0 simply
to abort that guest.  There is no loss to diagnosability since the
HV/dom0 combination is doing that, anyway.

> Emulating MCE to the guest wasn't my intended alternative suggestion. Instead,
> my idea was that if the guest hasn't registered a "PV MCE handler", we just
> immediately kill the domain as such - e.g. similar to "domain_crash_synchronous()".
> Don't let the guest have any chance to "do something wrong" in the process - it's
> already broken, and letting it run any further will almost certainly not help matters.
> This may not be the prettiest solution, but then on the other hand, a "Windows blue-screen"
> or Linux "oops" saying GP fault happened at some random place in the guest isn't exactly
> helping the SysAdmin understand the problem either.

Agreed - don't let the affected guest run one more instruction if we can.
Sysadmins will learn to consult dom0 diagnostics to see if they explain any
sudden guest deaths - no need, as you say, to splurge any raw error data to them.
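
The policy we seem to be converging on could be sketched roughly as follows. This is not Xen code: `struct guest`, `pv_mce_registered` and `handle_guest_mce` are hypothetical names used only to show the shape of the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-guest state kept by the HV. */
struct guest {
    int  id;
    bool pv_mce_registered;  /* guest told us "I can handle MCE's" */
    bool running;            /* false once the guest has been stopped */
    int  pending_mce;        /* events queued for the PV handler */
};

enum mce_outcome { MCE_DELIVERED_TO_PV_HANDLER, MCE_GUEST_KILLED };

/* Handle an uncorrectable error attributed to guest g. */
enum mce_outcome handle_guest_mce(struct guest *g)
{
    assert(g != NULL);

    if (g->pv_mce_registered) {
        /* The guest asked to deal with these itself: queue the event. */
        g->pending_mce++;
        return MCE_DELIVERED_TO_PV_HANDLER;
    }

    /* No handler registered: stop the guest before it can run another
     * instruction. Nothing is injected; diagnosis stays in HV/dom0. */
    g->running = false;
    return MCE_GUEST_KILLED;
}
```

Note there is no "inject a GPF" branch here: a guest without a registered handler is simply stopped, and the explanation for its death lives in the dom0 diagnostics.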

Gavin


--
Mats
Keir, what are your opinions on this thread?


Christoph

--
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Sitz (Geschäftsanschrift):
   Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registergericht Dresden: HRA 4896
vertretungsberechtigter Komplementär:
   AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
Geschäftsführer der AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel





--
Gavin Maltby, Solaris Kernel Development.
