This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] [Patch 0/3]RAS(Part II)--Intel MCA enabling in XEN

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: [Xen-devel] [Patch 0/3]RAS(Part II)--Intel MCA enabling in XEN
From: "Ke, Liping" <liping.ke@xxxxxxxxx>
Date: Fri, 20 Mar 2009 13:02:24 +0800
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Hi, Keir

These patches enable MCA (Machine Check Architecture) support for Intel CPUs in
Xen. They are based on the MCA work already done by AMD and Sun.
Since the last submission we have discussed the design with AMD and Sun and
refined the patches, and we have rebased them on top of Sun's latest
improvements. This series is a basic framework for Intel MCA; patches
implementing recovery actions will follow.
Some implementation notes:
1) When an error happens and it is either fatal (PCC = 1) or unrecoverable
    (PCC = 0, but no suitable recovery method exists), we reset the machine
    immediately to avoid losing the logs destined for DOM0. Most MCA MSRs are
    sticky, so their contents survive the reset; after reboot, the MCA polling
    mechanism sends a vIRQ to DOM0 for logging.
2) When an MCE# occurs, all CPUs enter the MCA context. The first CPU that reads
    and clears an error MSR bank becomes the owner of this MCE#. Locks and
    synchronization are used to determine the owner and to select the most
    severe error.
3) For convenience, we select the most offending CPU to do most of the
    processing and recovery work.
4) When an MCE# happens, we do three jobs:
    a. Send a vIRQ to DOM0 for logging
    b. Inject a vMCE# into the impacted guest (currently it is only injected
       into DOM0, when DOM0 is impacted)
    c. Virtualize the guest's vMCE MSRs
5) Further improvements and additions for newer CPUs may be done later:
    a) Connection with recovery actions (CPU/memory online/offline)
    b) More identification of software-recoverable errors in severity_scan
    c) Further refinement and testing for HVM guests, as needed
For details of the discussion with AMD and Sun, please refer to the mail thread:

Patch Description:
1. intel_mce_base: basic MCA enabling support for Intel.
2. vmsr_virtualization: guest MCE# MSR read/write virtualization support in Xen.
3. interface: Xen/DOM0 interface that lets DOM0 know the recovery details in Xen.
    For details of the interface discussion, please refer to the mail thread:
About Testing:
We ran some internal tests, and the results were fine.

If there are any problems, just let me know.
Thanks a lot for your help!

Xen-devel mailing list
