RE: [Xen-devel] RFC: MCA/MCE concept

To: Gavin.Maltby@xxxxxxx
Subject: RE: [Xen-devel] RFC: MCA/MCE concept
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Fri, 1 Jun 2007 13:38:09 +0200
Cc: "Egger, Christoph" <Christoph.Egger@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>
Delivery-date: Fri, 01 Jun 2007 04:36:38 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <465FFB93.5090300@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcekPAN0emiMLQ61Qo2oTAN663h/XgAALeHg
Thread-topic: [Xen-devel] RFC: MCA/MCE concept
 

> -----Original Message-----
> From: Gavin.Maltby@xxxxxxx [mailto:Gavin.Maltby@xxxxxxx] 
> Sent: 01 June 2007 11:57
> To: Petersson, Mats
> Cc: Egger, Christoph; xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser
> Subject: Re: [Xen-devel] RFC: MCA/MCE concept
> 
> Hi
> 
> On 06/01/07 10:48, Petersson, Mats wrote:
> [cut]
> 
> >>> Note that Windows kernel drivers are allowed to use the kernel
> >>> exception handling, and ARE allowed to "allow" GP faults if they
> >>> wish to do so. [Don't ask me why MS allows this, but that's the
> >>> case, so we have to live with it].
> >> In that case, it will die sooner or later *after* consuming 
> >> the data in error.
> >> That means, the guest continues to live for an unknown time...
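
(For reference, the kind of exception handling referred to above is
Windows structured exception handling. A minimal sketch of what a
driver can do - hypothetical code, not from any real driver, and note
that __try/__except only catches the exception raised for the bad
access, it does not make the underlying data good:)

#include <ntddk.h>

/* Read a value from a possibly-bad buffer, swallowing any fault. */
ULONG read_guarded(const ULONG *ptr)
{
    ULONG value = 0;

    __try {
        ProbeForRead((PVOID)ptr, sizeof(ULONG), sizeof(ULONG));
        value = *(const volatile ULONG *)ptr;  /* may raise an exception */
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        value = 0;                             /* "allow" the fault and carry on */
    }
    return value;
}

Which is exactly the point made just above: the driver survives the
access, but whatever it computes from that value afterwards is still
wrong.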
> > 
> > Yes. What I'm worried about is that if you have a "transient" or
> > "few-bit" error in a rarely used location, the guest may well live
> > a LONG time with incorrect data and potentially not get it detected
> > for quite some time (say two of its bits have stuck at 0, and the
> > data is then written back with the zeros there - next time we read
> > it, no error, since the data has zeros in that location).
> 
> I don't believe GP faults and uncorrectable errors really 
> overlap that much.
> In a GP fault the extent of the damage is known - you tried 
> to read from
> an address not in your address space, you lacked permissions 
> for an operation
> etc.  In an uncorrected error situation it is difficult to 
> understand the
> bounds of the problem in that way - unless the hardware assists with
> data poisoning etc such errors may well be unconstrained and affect
> a wider area than just the bracket of code that caught a GP fault.
> 
> You can often ring-fence critical code sequences by inserting error
> barrier instructions before and after it.  Those operations are
> usually very expensive (drain the pipeline or similar) and are
> suitable only in special places.
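
(In shape, that ring-fencing would look roughly like the sketch below.
error_barrier() is a made-up placeholder for whatever synchronising
instruction the CPU actually provides - the real thing is architecture
specific and, as you say, expensive.)

#include <stdint.h>

/* Placeholder only: a real error barrier would drain the pipeline and
 * force any pending error to be signalled before continuing. */
#define error_barrier()  __asm__ __volatile__("" ::: "memory")

static int critical_update(volatile uint64_t *slot, uint64_t val)
{
    error_barrier();   /* errors raised here belong to the code before us */
    *slot = val;       /* the access we want cleanly bracketed */
    error_barrier();   /* an error raised by now is known to belong to *slot */
    return 0;
}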
> 
> When running natively it is usually the "owner" of affected data that
> sees it bad in memory, eg from a read it made.  In those cases we
> have the owner on cpu and can kill/signal it synchronously.
> There are times when the kernel may be shifting some data
> on behalf of the application owner (eg, copyin/copyout, shift
> network data etc) in which case we still have a handle on the
> real owner.  If the access is from a scrub then we should not
> panic - just wait and see if the owner does indeed use the bad data
> at which time we take appropriate action.
> 
> With the virtualisation layer there is the additional case of the HV
> or dom0 performing operations on behalf of a guest, ie the HV may
> make the access that traps but its own state is not affected.
> 
> CPU errors get still trickier.  For example what do we do 
> when we're told that
> while running guest A we displaced modified data from l2cache that had
> uncorrectable ECC?  We have a physical address only, and no 
> idea of who the
> data belongs to (guest A, a recently scheduled guest, or the 
> HV?).  Where
> cachelines are tagged with some form of context or guest ID you have
> a chance, provided that is reported in the error state.
> 
> > Also consider the case where one cell (or small block of cells)
> > has gone bad, but it is only used by a single piece of code that
> > uses this try/catch construct. I know this is probably relatively
> > rare, but I'm still worried that it will "break" things...
> 
> >>> I'm not sure if Linux, Solaris, *BSD, OS/2 or other OS's will
> >>> allow "catching" a Kernel GP fault in a non-precise fashion (I
> >>> know Linux has exception handling for EXACT positions in the
> >>> code). But since at least one kernel DOES allow this, we can't be
> >>> sure that a GPF will destroy the guest.
> >>
> >> When Linux and *BSD see a GPF while they are in userspace, 
> >> then they kill
> >> the process with a SIGSEGV. If they are in kernelspace, then 
> >> they panic.
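
(The "EXACT positions" mechanism I meant is the Linux exception-table
fixup: exactly one instruction is allowed to fault, and the fault
handler resumes at a known fixup label instead of oopsing. Roughly,
modelled loosely on the x86-64 kernel idiom - section names and entry
format simplified here:)

/* Load *src into *dst; return 0, or -14 (-EFAULT) if the load faults. */
static inline long guarded_load(unsigned long *dst, const unsigned long *src)
{
    long err = 0;
    unsigned long val = 0;

    asm volatile("1: movq %2, %1\n"        /* the only instruction allowed to fault */
                 "2:\n"
                 ".section .fixup,\"ax\"\n"
                 "3: movq $-14, %0\n"      /* report -EFAULT to the caller */
                 "   xorq %1, %1\n"        /* zero the destination on a fault */
                 "   jmp 2b\n"             /* resume just after the access */
                 ".previous\n"
                 ".section __ex_table,\"a\"\n"
                 "   .quad 1b, 3b\n"       /* if 1b faults, continue at 3b */
                 ".previous"
                 : "=r" (err), "=r" (val)
                 : "m" (*src), "0" (err));
    *dst = val;
    return err;
}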
> 
> Solaris has some wrappers that can be applied, maybe at some 
> expense to
> performance, to make protected accesses that will catch and
> survive various types of error including hardware errors,
> wild pointers etc.
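
(For anyone following along, the Solaris idiom Gavin describes is
along these lines - treat the header and exact signatures as my
assumption rather than gospel:)

#include <sys/types.h>
#include <sys/systm.h>   /* on_fault(), no_fault(), label_t - assumed location */

/* Copy one word, surviving a fault instead of panicking the kernel. */
static int
protected_copy(uint64_t *dst, volatile uint64_t *src)
{
    label_t ljb;

    if (on_fault(&ljb)) {   /* control comes back here if the access traps */
        no_fault();
        return (-1);        /* tell the caller it failed; no panic */
    }
    *dst = *src;            /* the potentially faulting access */
    no_fault();
    return (0);
}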
> 
> >>> Second point to note is of course that if the guest is in
> >>> user-mode when the GPF happens, then almost all OS's will just
> >>> kill the application - and there's absolutely no reason to
> >>> believe that the application running is necessarily where the
> >>> actual memory problem is - it may be caused by memory scrubbing
> >>> for example.
> 
> Yes, these are the myriad permutations I was alluding to above.
> 
> >>> Whatever we do to the guest, it should be a "certain
> >>> death", unless the
> 
> Yes, certain and instant death unless it is a PV guest that 
> has registered
> the ability to deal with these more elegantly.
> 
> >>> kernel has told us "I can handle MCE's".
> >> It is obvious that there is no absolute generic way to handle
> >> all sorts of buggy guests. I vote for:
> >>
> >> If DomU has a PV MCA driver, use it; otherwise inject a GPF.
> >> Multiplexing all the MSRs needed to emulate MCA/MCE for the
> >> guests is much more complex than just injecting a GPF - and
> >> slower.
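
(So the proposed flow, in Xen-flavoured pseudocode - every helper name
here is made up except domain_crash_synchronous(), which is mentioned
further down; none of this exists yet:)

/* Called when the HV takes a machine check attributable to domain d. */
void handle_guest_mce(struct domain *d, struct vcpu *v)
{
    if ( guest_has_pv_mca_driver(d) )
    {
        /* The PV guest registered an MCA handler: hand it the
         * virtualised machine-check event and let it recover. */
        inject_virtual_mce(v);
        return;
    }

    /* No PV driver: either inject a plain #GP ... */
    inject_gp_fault(v);

    /* ... or, as discussed below, kill the domain outright:
     * domain_crash_synchronous(); */
}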
> 
> Do we need to send the non-PV guest a signal of any kind to kill it?
> After all, we can stop it running any further instructions 
> (and perhaps
> avoid the use of bad data) by deciding within the HV or dom0 simply
> to abort that guest.  There is no loss to diagnosability since the
> HV/dom0 combination is doing that, anyway.

No, the HV can "kill" a guest without notifying the guest - in the worst case, it 
may need to pause the physical CPU that may still be running the guest (e.g. we 
have multiple CPUs, one of them got "bad data", but the other CPUs are still 
processing stuff). But pausing the CPU is an "in hypervisor" operation, so there 
is still no need to tell the guest anything - just a "Bang, you're dead" type thing.
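
Roughly, that "Bang, you're dead" path would be (for_each_vcpu,
vcpu_pause_nosync() and domain_crash() are existing Xen-style
primitives, but take the exact calls and this helper as a sketch, not
the implementation):

static void kill_guest_after_mce(struct domain *d)
{
    struct vcpu *v;

    /* Stop any of the domain's VCPUs that may still be running on
     * other physical CPUs, so no further instructions execute. */
    for_each_vcpu ( d, v )
        vcpu_pause_nosync(v);

    /* No event, no virtual #MC, no #GP - the guest is simply gone. */
    domain_crash(d);
}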

> 
> > Emulating MCE to the guest wasn't my intended alternative
> > suggestion. Instead, my idea was that if the guest hasn't
> > registered a "PV MCE handler", we just immediately kill the
> > domain as such - e.g. similar to "domain_crash_synchronous()".
> > Don't let the guest have any chance to "do something wrong" in
> > the process - it's already broken, and letting it run any further
> > will almost certainly not help matters. This may not be the
> > prettiest solution, but then on the other hand, a "Windows
> > blue-screen" or Linux "oops" saying a GP fault happened at some
> > random place in the guest isn't exactly helping the SysAdmin
> > understand the problem either.
> 
> Agreed - don't let the affected guest run one more 
> instruction if we can.  Sysadmins
> will learn to consult dom0 diagnostics to see if they explain 
> any sudden guest deaths -
> no need, as you say, to splurge any raw error data to them.

Exactly, particularly when it's bogus raw error data that isn't actually caused 
by the original problem.

--
Mats
> 
> Gavin
> 
> > 
> > --
> > Mats
> >> Keir, what are your opinions on this thread?
> >>
> >>
> >> Christoph
> >>
> >> -- 
> >> AMD Saxony, Dresden, Germany
> >> Operating System Research Center
> >>
> >> Legal Information:
> >> AMD Saxony Limited Liability Company & Co. KG
> >> Sitz (Geschäftsanschrift):
> >>    Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
> >> Registergericht Dresden: HRA 4896
> >> vertretungsberechtigter Komplementär:
> >>    AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
> >> Geschäftsführer der AMD Saxony LLC:
> >>    Dr. Hans-R. Deppe, Thomas McCoy
> >>
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> >>
> >>
> >>
> > 
> 
> -- 
> Gavin Maltby, Solaris Kernel Development.
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel