This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: breakage with c/s 19950

To: "Jan Beulich" <JBeulich@xxxxxxxxxx>
Subject: [Xen-devel] Re: breakage with c/s 19950
From: Christoph Egger <Christoph.Egger@xxxxxxx>
Date: Tue, 21 Jul 2009 17:35:11 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 21 Jul 2009 08:36:18 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4A65FA3C020000780000B96E@xxxxxxxxxxxxxxxxxx>
References: <200907211649.16613.Christoph.Egger@xxxxxxx> <4A65FA3C020000780000B96E@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.9.7
On Tuesday 21 July 2009 17:26:20 Jan Beulich wrote:
> >>> Christoph Egger <Christoph.Egger@xxxxxxx> 21.07.09 16:49 >>>
> >
> >In c/s 19950, you disable the GART TLB walk error of the northbridge
> >as a workaround for AMD K8 CPUs. The comment says it happens with IOMMU &
> >3ware & Cerberus.
> >
> >There's a Linux/Dom0 boot problem with this:
> >
> >The Linux/Dom0 kernel also applies this workaround, which leads to a #GP in
> >the guest because Xen's mce.c:mce_wrmsr() returns -1.
> >
> >The question is: is the workaround really necessary?
> >The IOMMU is not yet on the market, and whether there will ever be AMD K8
> >machines with an IOMMU is very questionable.
> The code has been in Linux for quite some time, and given it relates to K8
> *and* disables some GART functionality I'd suppose the IOMMU talked about
> here is the old GART one, not the not-yet-on-the-market-one.

The comment should be fixed then. It is also unclear whether this happens
generally with 3ware and Cerberus products or only with certain combinations
of them.

> >If the workaround is not needed, please remove it from both Linux and Xen.
> >If the workaround is valid, then mce.c:mce_wrmsr() needs a special check
> >for this workaround to NOT return -1.
> I'd rather say this handling is supposed to happen only in the hypervisor,
> i.e. Dom0 should not even try to do it (which would, without mce_wrmsr(),
> have no effect anyway due to the white-listing of MSRs being writable by
> domains).

So the bug is in Linux then. Below is a snippet from the boot dmesg on a

(XEN) mce.c:693:d0 MCE: rdmsr MCG_CAP lo 105 hi 0
(XEN) mce.c:688:d0 MCE: rd MCG_STATUS lo 0 hi 0
(XEN) mce.c:733:d0 MCE: rd MC0_STATUS
(XEN) mce.c:733:d0 MCE: rd MC1_STATUS
(XEN) mce.c:733:d0 MCE: rd MC2_STATUS
(XEN) mce.c:733:d0 MCE: rd MC3_STATUS
(XEN) mce.c:733:d0 MCE: rd MC4_STATUS
(XEN) mce.c:810:d0 MCE: wrmsr MCG_STATUS 0
(XEN) mce.c:876:d0 MCE: wr MC0_STATUS 0
(XEN) mce.c:876:d0 MCE: wr MC1_STATUS 0
(XEN) mce.c:876:d0 MCE: wr MC2_STATUS 0
(XEN) mce.c:876:d0 MCE: wr MC3_STATUS 0
(XEN) mce.c:852:d0 MCE: value written to MC4_CTLshould be all 0s or 1s (is 
general protection fault: 0000 [1] SMP 
CPU 0 
Modules linked in:
Pid: 0, comm: swapper Not tainted #1
RIP: e030:[<ffffffff8106f37f>]  [<ffffffff8106f37f>] mce_init+0xb1/0xc9
RSP: e02b:ffffffff81511f58  EFLAGS: 00010217
RAX: fffffffffffffbff RBX: 0000000000000105 RCX: 0000000000000410
RDX: 00000000ffffffff RSI: 0000000000000410 RDI: 0000000000000004
RBP: ffffffff8150d100 R08: 0000000000000005 R09: 0000000000000000
R10: ffffffff81511ea8 R11: 0000000000000048 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff814fa000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff81510000, task ffffffff8143e4a0)
Stack:  0000000000020800 ffffffff8106cd20 0000000000000000 0000000000000000
0000000000000000 ffffffff810abd29 0000000000020800 0000000000000000
0000000000000000 ffffffff8151d7c3 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8106cd20>] identify_cpu+0x41c/0x439
[<ffffffff810abd29>] __delayacct_tsk_init+0x19/0x3a
[<ffffffff8151d7c3>] start_kernel+0x25a/0x26e
[<ffffffff8151d215>] _sinittext+0x215/0x21b

Code: 0f 30 31 c0 8d 4e 01 31 d2 0f 30 48 ff c7 83 c6 04 41 39 f8 
RIP  [<ffffffff8106f37f>] mce_init+0xb1/0xc9
RSP <ffffffff81511f58>
<0>Kernel panic - not syncing: Attempted to kill the idle task!
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
