Hi, Frank
I am now running some tests for this patch on the latest Intel platform, since
CMCI needs owner checking and only the owning CPU should report the error.
Without the patch, when a CMCI happens, only CPU0 reports the error during the
check, because CPU0 is the owner of bank8.
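To illustrate the ownership idea (just a minimal, self-contained sketch; the mask
name cmci_owned_banks and the stub read_bank_status are made up and are not the
actual Xen code), each CPU only scans the MC banks it owns, so a corrected error
in bank8 is reported once, by the owner:

/*
 * Sketch of per-CPU bank ownership: only the owner scans/reports a bank.
 * All names here are hypothetical, for illustration only.
 */
#include <stdint.h>
#include <stdio.h>

#define NR_MCA_BANKS   32
#define MCi_STATUS_VAL (1ULL << 63)        /* "valid" bit in MCi_STATUS */

/* Hypothetical per-CPU owner mask; in this sketch this CPU owns bank 8 only. */
static uint64_t cmci_owned_banks = 1ULL << 8;

/* Stand-in for rdmsr(MCx_STATUS(bank)); pretends bank 8 holds a corrected error. */
static uint64_t read_bank_status(int bank)
{
    return bank == 8 ? (MCi_STATUS_VAL | 0x1009f) : 0;
}

static void scan_owned_banks(void)
{
    for (int bank = 0; bank < NR_MCA_BANKS; bank++) {
        if (!(cmci_owned_banks & (1ULL << bank)))
            continue;                       /* another CPU owns this bank */

        uint64_t status = read_bank_status(bank);
        if (status & MCi_STATUS_VAL)
            printf("bank%d status[%016llx] reported by owner\n",
                   bank, (unsigned long long)status);
    }
}

int main(void)
{
    scan_owned_banks();
    return 0;
}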
Below is the correct log:
(XEN) CMCI: cmci_intr happen on CPU3
[root@lke-ep inject]# (XEN) CMCI: cmci_intr happen on CPU2
(XEN) CMCI: cmci_intr happen on CPU0
(XEN) CMCI: cmci_intr happen on CPU1
(XEN) mcheck_poll: bank8 CPU0 status[cc0000800001009f]
(XEN) mcheck_poll: CPU0, SOCKET0, CORE0, APICID[0], thread[0]
(XEN) MCE: The hardware reports a non fatal, correctable incident occured on CPU 0.
After applying your patch, I found that all CPUs report the error.
Below is the log:
(XEN) MCE: The hardware reports a non fatal, correctable incident occured on CPU 0.
(XEN) MCE: The hardware reports a non fatal, correctable incident occured on CPU 2.
(XEN) MCE: The hardware reports a non fatal, correctable incident occured on CPU 3.
(XEN) MCE: The hardware reports a non fatal, correctable incident occured on CPU 1.
(XEN) Bank 8: cc0000c00001009f<1>Bank 8: 8c0000400001009f<1>Bank 8: cc0001c00001009f<1>MCE: The hardware reports a non fatal, correctable incident occured on CPU 0.
I noticed that your patch does pass in the cmci_owner mask, so I can't yet see the
reason for this behaviour; since this is really a big patch, I need some time to
figure it out.
We also found that the polling mechanism has some changes. My feeling is that this
patch is really too big, and we can't easily figure out its impact on our
checked-in code right now.
I just wonder whether you could split this big patch into two parts :-)
part1: the mce log telem mechanism and the required mce_intel interface changes,
so that we can easily verify whether the new interfaces work fine for our CMCI as
well as for non-fatal polling. I guess this should not be a lot of work; you could
just adapt the new telem interfaces in machine_check_poll?
part2: the common handler part (including both the CMCI and non-fatal polling
parts).
What do you think about it? :-)
Thanks a lot for your help!
Criping
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Frank van der Linden
Sent: March 17, 2009 7:28
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [PATCH] re-work MCA telemetry internals; use common code
for Intel/AMD MCA
The following patch reworks the MCA error telemetry handling inside Xen,
and shares code between the Intel and AMD implementations as much as
possible.
I've had this patch sitting around for a while, but it wasn't ported to
-unstable yet. I finished porting and testing it, and am submitting it
now, because the Intel folks want to go ahead and submit their new
changes, so we agreed that I should push our changes first.
Brief explanation of the telemetry part: previously, the telemetry was
accessed in a global array, with index variables used to access it.
There were some issues with that: race conditions with regard to new
machine checks (or CMCIs) coming in while handling the telemetry, and
interaction with domains having been notified or not, which was a bit
hairy. Our changes (I should say: Gavin Maltby's changes, as he did the
bulk of this work for our 3.1 based tree, I merely ported/extended it to
3.3 and beyond) make telemetry access transactional (think of a
database). Also, the internal database updates are atomic, since the
final commit is done by a pointer swap. There is a brief explanation of
the mechanism in mctelem.h. This patch also removes dom0->domU
notification, which is ok, since Intel's upcoming changes will replace
domU notification with a vMCE mechanism anyway.
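As a rough illustration of the "commit by pointer swap" idea (hypothetical names,
not the real mctelem.c interfaces; mctelem_entry, committed_head and
mctelem_commit are made up here): a producer fills a private entry and then
publishes it with one atomic exchange, so consumers see either the old list or
the new one, never a half-written entry.

/* Minimal sketch of atomic commit by pointer swap; names are illustrative only. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct mctelem_entry {
    struct mctelem_entry *next;
    uint64_t bank_status;                  /* example payload */
};

/* Committed list head; readers only ever see fully written entries. */
static _Atomic(struct mctelem_entry *) committed_head;

/* Commit: one atomic swap publishes the privately filled entry. */
static void mctelem_commit(struct mctelem_entry *e)
{
    struct mctelem_entry *old = atomic_load(&committed_head);
    do {
        e->next = old;
    } while (!atomic_compare_exchange_weak(&committed_head, &old, e));
}

int main(void)
{
    static struct mctelem_entry e;          /* "reserved" private entry   */
    e.bank_status = 0xcc0000800001009fULL;  /* filled while still private */
    mctelem_commit(&e);                     /* now visible to consumers   */
    printf("committed status %016llx\n",
           (unsigned long long)atomic_load(&committed_head)->bank_status);
    return 0;
}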
The common code part is pretty much what it says. It defines a common
MCE handler, with a few hooks for the special needs of the specific CPUs.
I've been told that Intel's upcoming patch will need to make some parts
of the common code specific to the Intel CPU again, but we'll work
together to use as much common code as possible.
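To give a feel for what "common handler with CPU-specific hooks" can look like
(again only a sketch with made-up names, not the code in the patch): the shared
walk over the banks defers vendor-specific decisions to callbacks installed at
init time.

/* Sketch of a shared MCE walk with vendor hooks; all names are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define MCi_STATUS_VAL (1ULL << 63)

/* Vendor hooks; the shared handler calls them where behaviour differs. */
struct mca_ops {
    void (*vendor_recover)(int bank, uint64_t status);
};

static void intel_recover(int bank, uint64_t status)
{
    printf("Intel-specific handling for bank %d (%016llx)\n",
           bank, (unsigned long long)status);
}

static const struct mca_ops intel_ops = { .vendor_recover = intel_recover };
static const struct mca_ops *mca_ops = &intel_ops;   /* chosen at boot */

/* Stand-in for reading MCi_STATUS; pretends bank 8 has a valid error. */
static uint64_t mca_read_status(int bank)
{
    return bank == 8 ? (MCi_STATUS_VAL | 0x1009f) : 0;
}

/* Shared walk over the banks; telemetry logging would also live here. */
static void common_mce_handler(int nr_banks)
{
    for (int bank = 0; bank < nr_banks; bank++) {
        uint64_t status = mca_read_status(bank);
        if (!(status & MCi_STATUS_VAL))
            continue;                        /* nothing valid in this bank */
        if (mca_ops && mca_ops->vendor_recover)
            mca_ops->vendor_recover(bank, status);
    }
}

int main(void)
{
    common_mce_handler(16);
    return 0;
}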
- Frank
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel