xen-devel
RE: [Xen-devel] RFC: MCA/MCE concept
 
[snip]
> My feeling is that the hypervisor and dom0 own the hardware, and as such
> all hardware fault management should reside there. So we should never
> deliver any form of #MC to a domU, nor should a poll of MCA state from
> a domU ever observe valid state (e.g., make the RDMSR return 0).
> So all handling, logging and diagnosis, as well as hardware response
> actions (such as deploying an online spare chip-select), are controlled
> in the hypervisor/dom0 combination. That seems a consistent model:
> e.g., if a domU is migrated to another system it should not carry the
> diagnosis state of the original system across, etc., since that belongs
> with the one domain that cannot migrate.
I agree entirely with this. 
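To make the "RDMSR returns 0" part concrete, here is a toy model in Python (not Xen code; the function names, the dom0 pass-through and the 32-bank limit are assumptions for illustration). It uses the architectural MCA MSR numbers (MCG_CAP/MCG_STATUS/MCG_CTL at 0x179-0x17B, bank registers starting at 0x400) and shows dom0 seeing the real value while any domU reads 0:

```python
# Toy model (not Xen code): the hypervisor intercepts guest RDMSR of the
# machine-check MSRs. Only dom0 sees real MCA state; any domU reads 0.

# Architectural MCA MSR addresses (Intel SDM numbering).
MSR_IA32_MCG_CAP = 0x179
MSR_IA32_MCG_STATUS = 0x17A
MSR_IA32_MCG_CTL = 0x17B
MSR_IA32_MC0_CTL = 0x400  # bank i: CTL/STATUS/ADDR/MISC at 0x400 + 4*i ..

# Assumption for this sketch: at most 32 banks, i.e. MSRs 0x400..0x47F.
MAX_BANKS = 32

DOM0 = 0


def is_mca_msr(msr: int) -> bool:
    """True if msr is a global MCG register or lies in the bank range."""
    if msr in (MSR_IA32_MCG_CAP, MSR_IA32_MCG_STATUS, MSR_IA32_MCG_CTL):
        return True
    return MSR_IA32_MC0_CTL <= msr < MSR_IA32_MC0_CTL + 4 * MAX_BANKS


def guest_rdmsr(domid: int, msr: int, hw_msrs: dict[int, int]) -> int:
    """Value a guest sees for RDMSR; hw_msrs stands in for the hardware."""
    if is_mca_msr(msr) and domid != DOM0:
        # A domU never observes valid MCA state.
        return 0
    # In this toy model, dom0 and all non-MCA MSRs simply pass through.
    return hw_msrs.get(msr, 0)
```

For example, with `hw = {0x401: 0x8000000000000175}` (a made-up MC0_STATUS with bit 63, VAL, set), `guest_rdmsr(0, 0x401, hw)` returns the stored value while `guest_rdmsr(3, 0x401, hw)` returns 0. Whether dom0 should read the MSRs directly, as this sketch assumes, or get the telemetry from the hypervisor by some other channel is left open.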
> 
> But that is not to say that (I think at a future phase) domU should not
> participate in a higher-level fault management function, at the
> direction of the hypervisor/dom0 combo. For example, if/when we can
> isolate an uncorrectable error to a single domU, we could forward such
> an event to the affected domU if it has registered its ability/interest
> in such events. These won't be in the form of a faked #MC or anything;
> instead they'd be some form of synchronous trap experienced when the
> affected domU context next resumes on a CPU. The intelligent domU
> handler can then decide whether the domU must panic, whether it could
> simply kill the affected process, etc. Those details are clearly
> sketchy, but the idea is to up-level the communication to a domU to be
> more like "you're broken" rather than "here's a machine-level hardware
> error for you to interpret and decide what to do with".
Yes, this makes much more sense than forwarding #MC, as the guest would
have a hard time actually doing anything really useful with it. As far
as I know, most uncorrectable errors are near enough entirely fatal in
most commercial non-enterprise OSes anyway; e.g., in Windows XP or
Server 2K3 it always ends in a blue screen, which is hardly any better
than the guest being "humanely euthanized" by Dom0.
I take it this would be some sort of hypercall (available through the
regular PV-driver interface for HVM guests) to say "Let me know if I'm
broken - trap on vector X". 
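A rough sketch of that flow as a toy Python model. The hypercall name, the vector value and the "destroy the guest if it did not register" policy are all hypothetical, not an existing Xen interface. What it tries to show is that the event is queued and only injected as a synchronous trap when the domU next resumes, and that the payload is a high-level "you're broken" hint rather than raw MCA bank contents:

```python
# Toy model (hypothetical, not an existing Xen interface): a domU registers
# a vector for "you're broken" notifications; when an uncorrectable error
# is isolated to that domU, a notification is queued and delivered as a
# synchronous trap the next time the domU is resumed on a CPU.

from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Domain:
    domid: int
    fault_vector: Optional[int] = None  # None: guest has not opted in
    pending: deque = field(default_factory=deque)


class Hypervisor:
    def __init__(self) -> None:
        self.domains: dict[int, Domain] = {}
        self.destroyed: list[int] = []

    def create(self, domid: int) -> Domain:
        d = Domain(domid)
        self.domains[domid] = d
        return d

    def hypercall_register_fault_vector(self, domid: int, vector: int) -> None:
        """Hypothetical hypercall: "let me know if I'm broken, trap on vector"."""
        self.domains[domid].fault_vector = vector

    def report_uncorrectable(self, domid: int, hint: str) -> None:
        """Called by hypervisor/dom0 once an error is isolated to one domU.

        hint is a high-level description, not raw MCA bank contents.
        """
        d = self.domains[domid]
        if d.fault_vector is None:
            # Assumed policy for this sketch: a guest that did not opt in
            # is destroyed by dom0.
            self.destroyed.append(domid)
            del self.domains[domid]
        else:
            d.pending.append(hint)

    def resume(self, domid: int) -> list[tuple[int, str]]:
        """Return the traps injected as the domU goes back onto a CPU."""
        d = self.domains.get(domid)
        if d is None:
            return []
        traps = []
        while d.pending:
            traps.append((d.fault_vector, d.pending.popleft()))
        return traps
```

For example, after `hv.create(1)`, `hv.hypercall_register_fault_vector(1, 0xF0)` and `hv.report_uncorrectable(1, "page lost")`, the next `hv.resume(1)` yields `[(240, 'page lost')]`, and a second resume yields nothing. The guest's handler would then decide between panicking and killing the affected process.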
--
Mats
> 
> Gavin
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel