This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
Home Products Support Community News


Re: [Xen-devel] [PATCH 3 of 3] kexec: disable iommu jumping into the kdu

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Wed, 18 May 2011 22:24:32 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 18 May 2011 14:25:28 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110518205719.GA5979@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <patchbomb.1305742093@andrewcoop> <aaf44d1a903dcb97b3ac.1305742096@andrewcoop> <20110518184946.GB14013@xxxxxxxxxxxx> <4DD430A4.5000305@xxxxxxxxxx> <20110518205719.GA5979@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20110424 Lightning/1.0b2 Thunderbird/3.1.10

On 18/05/11 21:57, Konrad Rzeszutek Wilk wrote:
On Wed, May 18, 2011 at 09:48:36PM +0100, Andrew Cooper wrote:

On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
On Wed, May 18, 2011 at 07:08:16PM +0100, Andrew Cooper wrote:
kdump kernels are unable to boot with IOMMU enabled,
this patch disabled IOMMU mode and removes some of the generic
code from the shutdown path which doesnt work after other
CPUs have been shot down.

Also, leave local interrupts disabled when jumping into pugatory
purgatory is the bit of code which the a crashing kernel jumps into,
which pretends to do minimal bootloader things to book the kdump
kernel.  It is part of the kexec-tools package.
Ok. Might want to include that in the description.

as we have no idea whats in there and really dont want to be
servicing interrupts when our entire state is invalid.

Signed-off-by: Andrew Cooper<andrew.cooper3@xxxxxxxxxx>

diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/crash.c
--- a/xen/arch/x86/crash.c      Wed May 18 19:00:13 2011 +0100
+++ b/xen/arch/x86/crash.c      Wed May 18 19:00:13 2011 +0100
@@ -27,6 +27,8 @@

  static atomic_t waiting_for_crash_ipi;
  static unsigned int crashing_cpu;
@@ -43,7 +45,10 @@ static int crash_nmi_callback(struct cpu


-    __stop_this_cpu();
+    disable_local_APIC();
+    hvm_cpu_down();
+    clts();
+    asm volatile ( "fninit" );
Can you provide a comment why you are using fninit and clt?
Is this what the Linux kernel does too when it goes through the kexec path?
I was replacing __stop_this_cpu() with the safe subset of its
contents - it was a verbatim copy minus the SMP stuff which the
regular __stop_this_cpu() does.  I suppose I could have split
__stop_this_cpu() to __crash_stop_this_cpu() but it didn't seem
worth making such a trivially small function.
Why can't you use the SMP version? I know you are not running
in SMP mode, but does it hurt?

It sadly does hurt. All the code relating to the x2apic_enabled variable is broken. That variable is used to mean "is the user expecting me to use x2apic mode", "did the bios boot me in x2apic mode" and "what mode is the boot processor currently in". For normal operation, it works (assuming no race conditions creep in when starting non-boot processors). However, x2apic mode really needs to be per_cpu because it depends on the lapic MSR as to whether you should be using MSRs or MMIO to talk to the lapic/ioapic registers, and you kindly get a protection fault if you use MSRs outside of x2apic mode. As a result, 1 global variable doesn't cut it when you are doing an nmi shootdown of processors. I was in the middle of making a fix for that, but Keir made it clear that such a patch would not be accepted.

@@ -56,6 +61,7 @@ static int crash_nmi_callback(struct cpu
  static void nmi_shootdown_cpus(void)
      unsigned long msecs;
+    u64 msr_contents;


@@ -77,18 +83,43 @@ static void nmi_shootdown_cpus(void)

-    __stop_this_cpu();
+    disable_local_APIC();
+    hvm_cpu_down();
+    clts();
+    asm volatile ( "fninit" );
+    /* This is a bit of a hack but there is no other way to shutdown correctly
+     * without a significant refactoring of the APIC code */
+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
+    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
+&&   (msr_contents&   MSR_IA32_APICBASE_EXTD) )
+        x2apic_enabled = 1;
+    else
+        x2apic_enabled = 0;
-    local_irq_enable();
Because that local_irq_enable() results in the interrupt flag being
set when jumping into purgatory, which (at the moment) doesn't touch
interrupts at all.  The result is that interrupts from PCI devices
which are unaware of the crash are (potentially) being serviced by
the xen handlers even though we have left the Xen context for good.
Yikes. Please do explain this in the code right there were you
remove the local_irq_enable..


  void machine_crash_shutdown(void)
      crash_xen_info_t *info;
+    const struct iommu_ops * ops;


+    /* Yes i know this is hacky but it is the easiest solution.  I should add 
an iommu_ops
+     * function called crash() or so which just disables the iommu 'fun' 
without saving state
+     */
+    ops = iommu_get_ops();
+    if(ops)
+        ops->suspend();
Uh, no checking if ops->suspend exists?

True - at the moment both intel and amd iommu_ops are fully
implemented but I will add an extra condition to the if statement.
+    /* Yes i know this is from driver/passthrough/vtd/ but it appears to be 
+     * independant, and also bears little/no relation to x2apic.  Needs 
cleaning up
What about AMD VI IOMMUs? Does it work when that IOMMU is used?

It worked on the AMD box I tested the code on.  Like the comment
says - as far as I can tell, it is architecture independent code.
+     */
+    iommu_disable_x2apic_IR();
Can't that function be done in the suspend code of the IOMMU?
There is a comment in iommu suspend stating that it cant and isn't
done, but rather is left for the local/ioapic_suspend functions
which dont properly work in the kexec path.
OK, how about just moving it out of driver/passthrought/vtd then?
Because that code is fragile enough without me poking about in it. I would prefer someone with more knowledge about IOMMU to make that call.

Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Xen-devel mailing list

<Prev in Thread] Current Thread [Next in Thread>