xen-bugs
[Xen-bugs] [Bug 1562] New: On NHM-EX ER, two SRAO errors will cause CPU
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1562
Summary: On NHM-EX ER, two SRAO errors will cause CPU shutdown.
Product: Xen
Version: unstable
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P1
Component: Hypervisor
AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
ReportedBy: jiajun.xu@xxxxxxxxx
xen-changeset: 20122:8faef78ea759
pvops git:
commit 16529fc075a95a84901842f7353ac906cd912bba
Merge: 5d78a20... 3186c67...
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
ioemu git:
commit a83d119cfcc20bc7edb427992d6e31b3e99430be
Author: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Date: Mon Aug 10 18:02:56 2009 +0100
---
The cause is that the MCE handler does not clear the MCIP bit in MCG_STATUS.
When I inject an SRAO error, I get the following log:
(XEN) MCE: clear_bank map 100
(XEN) CPU42 enter softirq
(XEN) CPU10 enter softirq
(XEN) CPU54 enter softirq
(XEN) CPU22 enter softirq
(XEN) CPU6 enter softirq
(XEN) CPU38 enter softirq
(XEN) CPU46 enter softirq
(XEN) CPU14 enter softirq
(XEN) CPU30 enter softirq
(XEN) CPU62 enter softirq
(XEN) CPU58 enter softirq
(XEN) CPU26 enter softirq
(XEN) CPU34 enter softirq
(XEN) CPU2 enter softirq
(XEN) CPU18 enter softirq
(XEN) CPU50 enter softirq
(XEN) CPU41 enter softirq
(XEN) CPU9 enter softirq
(XEN) CPU33 enter softirq
(XEN) CPU1 enter softirq
(XEN) CPU49 enter softirq
(XEN) CPU17 enter softirq
(XEN) CPU61 enter softirq
(XEN) CPU29 enter softirq
(XEN) CPU45 enter softirq
(XEN) CPU13 enter softirq
(XEN) CPU5 enter softirq
(XEN) CPU37 enter softirq
(XEN) CPU57 enter softirq
(XEN) CPU25 enter softirq
(XEN) CPU21 enter softirq
(XEN) CPU53 enter softirq
(XEN) CPU15 enter softirq
(XEN) CPU47 enter softirq
(XEN) CPU23 enter softirq
(XEN) CPU55 enter softirq
(XEN) CPU7 enter softirq
(XEN) CPU39 enter softirq
(XEN) CPU31 enter softirq
(XEN) CPU63 enter softirq
(XEN) CPU27 enter softirq
(XEN) CPU59 enter softirq
(XEN) CPU19 enter softirq
(XEN) CPU51 enter softirq
(XEN) CPU43 enter softirq
(XEN) CPU11 enter softirq
(XEN) CPU0: Machine Check Exception: 5
(XEN) CPU3 enter softirq
(XEN) Bank 8: bd000000004000cf[ 89] at 85cb4f040
(XEN) CPU32 enter softirq
(XEN) CPU0 enter softirq
(XEN) CPU4 enter softirq
(XEN) CPU36 enter softirq
(XEN) CPU16 enter softirq
(XEN) CPU48 enter softirq
(XEN) CPU28 enter softirq
(XEN) CPU60 enter softirq
(XEN) CPU12 enter softirq
(XEN) CPU44 enter softirq
(XEN) CPU56 enter softirq
(XEN) CPU24 enter softirq
(XEN) CPU40 enter softirq
(XEN) CPU8 enter softirq
(XEN) CPU52 enter softirq
(XEN) CPU20 enter softirq
(XEN) CPU35 enter softirq
(XEN) CPU26 handling errors
(XEN) MCE: send MCE# to DOM0 through virq
(XEN) mce.c:694:d0 MCE: rdmsr MCG_CAP 0x1000816
Then I inject a CMCI error and get the following log:
(XEN) CMCI: send CMCI to DOM0 through virq
(XEN) mce.c:694:d0 MCE: rdmsr MCG_CAP 0x100081
[root@lkp-nex03 einj]# mcelog
mcelog: warning: record length longer than expected. Consider update.
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 8 TSC 8d40ecdc30
STATUS d00000800800009f MCGSTATUS 5
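MCGSTATUS=5 has bits 0 (RIPV) and 2 (MCIP) set, so MCIP is still set and was
never cleared. With MCIP still set, a second machine check shuts the CPU down,
which is the failure described in the summary.
From the code logic, it seems mce_action is not called, i.e. the UCR handler
code is not executed.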