[Xen-devel] Re: [PATCH 02/02] Kexec / Kdump: Don't declare _end

To:	"Ian Campbell" <Ian.Campbell@xxxxxxxxxxxxx>
From:	"Magnus Damm" <magnus.damm@xxxxxxxxx>
Date:	Tue, 5 Dec 2006 15:55:44 +0900
Cc:	Magnus Damm <magnus@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Akio Takebe <takebe_akio@xxxxxxxxxxxxxx>, Alex Williamson <alex.williamson@xxxxxx>
Delivery-date:	Mon, 04 Dec 2006 22:55:53 -0800

Hi again Ian,

On 12/5/06, Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx> wrote:

Hey Magnus,

On Mon, 2006-12-04 at 13:35 +0900, Magnus Damm wrote:
> [PATCH 02/02] Kexec / Kdump: Don't declare _end
>
> _end is already declared in xen/include/asm/config.h, so don't declare
> it twice. This solves a powerpc/ia64 build problem where _end is declared
> as char _end[] compared to unsigned long _end on x86.

This change has broken x86 kdump :-( I think because you fixed a bug
with your change and thereby uncovered another latent bug.


Yes, you are right. Thanks for noticing and cooking up a fix.

Before, the range->size returned from kexec_get_xen() was 1/4 of the
correct value because you were subtracting unsigned long * pointers, so
size was the number of words, not the number of bytes as expected. After
this change we are subtracting unsigned longs, so the correct value
is returned.
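
To illustrate the word-versus-byte difference described above, here is a minimal sketch in C. The helper names are hypothetical and are not taken from the Xen source; the only point is that subtracting typed pointers counts elements, while subtracting integer addresses counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: subtracting unsigned long * pointers yields
 * the number of unsigned long elements (words) between them. */
static size_t size_in_words(const unsigned long *start,
                            const unsigned long *end)
{
    assert(end >= start);
    return (size_t)(end - start);
}

/* Hypothetical helper: subtracting the addresses as integers yields
 * the number of bytes between them. */
static size_t size_in_bytes(const unsigned long *start,
                            const unsigned long *end)
{
    assert(end >= start);
    return (size_t)((uintptr_t)end - (uintptr_t)start);
}
```

For the same pair of pointers, size_in_bytes() is sizeof(unsigned long) times size_in_words(), which is a factor of 4 on 32-bit x86.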

This seems to have caused the crash notes to disappear from /proc/iomem:
        Before:
                00100000-def7efff : System RAM
                  00100000-001397bf : Hypervisor code and data
                  00193000-001930f7 : Crash note
                  00194000-001940f7 : Crash note
                  02000000-05ffffff : Crash kernel
        After:
                00100000-def7efff : System RAM
                  00100000-001e5eff : Hypervisor code and data
                  02000000-05ffffff : Crash kernel

I presume they went missing because "Hypervisor code and data" now
overlaps the notes.
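
The overlap can be checked directly against the ranges in the /proc/iomem listings above. This is only a sketch: struct range and ranges_overlap() are hypothetical names, and it assumes the resource tree refuses to register a sibling range that overlaps one already present.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inclusive address range, as printed in /proc/iomem. */
struct range {
    unsigned long start;
    unsigned long end;
};

/* Two inclusive ranges overlap when each one starts no later than
 * the other one ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    assert(a.start <= a.end && b.start <= b.end);
    return a.start <= b.end && b.start <= a.end;
}
```

With the old hypervisor range 00100000-001397bf the crash note at 00193000-001930f7 lies outside it; with the new range 00100000-001e5eff the note lies inside it, so the two can no longer be registered side by side.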


Your reasoning makes sense; this does look like an overlap problem.

For some reason this has broken kdump for me (on x86_32p). The kdump
kernel gives this stack trace and then hangs a little later on:
        general protection fault: 0000 [#1]
        Modules linked in:
        CPU:    0
        EIP:    0060:[<c204954d>]    Not tainted VLI
        EFLAGS: 00010002   (2.6.16.33-x86_32p-kdump #17)
        EIP is at free_block+0x6d/0x100
        eax: 00000000   ebx: ffffffff   ecx: ffffffff   edx: c2455000
        esi: 00000001   edi: c5f22540   ebp: c253bef0   esp: c253bed8
        ds: 007b   es: 007b   ss: 0068
        Process events/0 (pid: 4, threadinfo=c253a000 task=c5f0aa70)
        Stack: <0>00000001 c5e71210 00000000 c5e71210 00000001 c5e71200 c253bf14 c2049625
               00000000 c234b100 c5f0aa70 c5f22540 c5f22588 c5f22540 c257a4c0 c253bf34
               c204a796 00000000 00000086 00000000 c242d364 c5f51680 00000296 c253bf64
        Call Trace:
         [<c2003685>] show_stack_log_lvl+0xc5/0xf0
         [<c2003847>] show_registers+0x197/0x220
         [<c20039ae>] die+0xde/0x210
         [<c20048fe>] do_general_protection+0xee/0x1a0
         [<c200310f>] error_code+0x4f/0x54
         [<c2049625>] drain_array_locked+0x45/0xa0
         [<c204a796>] cache_reap+0x66/0x130
         [<c2021456>] run_workqueue+0x66/0xd0
         [<c2021a08>] worker_thread+0x138/0x160
         [<c202461f>] kthread+0xaf/0xe0
         [<c2001005>] kernel_thread_helper+0x5/0x10


I have not investigated your stack trace, but without your patch there
are no crash notes present in /proc/iomem, which will lead to a vmcore
without PT_NOTE. This may trigger all sorts of errors in the secondary
kernel, but I'm not sure exactly which.

I changed xen_machine_kexec_register_resources() on the Linux side to
correctly nest the crash note resources under the xen resource which has
fixed things for me. Does the change below make sense to you? If so I'll
commit.


The patch looks good, please commit. I've tested it and the crash
notes now show up in /proc/iomem as expected. Thank you.
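
For readers following along, the nesting idea can be modelled roughly as below. This is not the actual change to xen_machine_kexec_register_resources() and not the kernel's resource code; struct res and res_request() are hypothetical names used for a simplified model in which a range may only be registered under a parent that fully contains it, and may not overlap its siblings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified resource node; ranges are inclusive. */
struct res {
    const char *name;
    unsigned long start, end;
    struct res *child;    /* first child, sorted by start address */
    struct res *sibling;  /* next node under the same parent */
};

/* Register entry under parent. Returns 0 on success, -1 if entry is
 * not contained in parent or overlaps an existing child. */
static int res_request(struct res *parent, struct res *entry)
{
    struct res **p;

    assert(entry->start <= entry->end);
    if (entry->start < parent->start || entry->end > parent->end)
        return -1;

    for (p = &parent->child; *p != NULL; p = &(*p)->sibling) {
        struct res *cur = *p;

        if (cur->start > entry->end)
            break;          /* entry fits entirely before cur */
        if (cur->end >= entry->start)
            return -1;      /* entry overlaps cur */
    }
    entry->sibling = *p;
    *p = entry;
    return 0;
}
```

In this model, registering a crash note directly under "System RAM" fails once "Hypervisor code and data" covers the same addresses, while registering it under the hypervisor node succeeds.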

As a secondary point, perhaps the hypervisor resource should go all the
way to the end of the Xen heap (xenheap_phys_end I think) rather than
just the end of .data/.bss?


Sounds like a good idea. I'm currently thinking about how to pass
virtual addresses from the hypervisor down to userspace so I can
modify kexec-tools to use proper virtual addresses for the PT_LOAD
program headers. I need the virtual address to make use of the
hypervisor resource, so the two are connected. Anyway,
using the end of the Xen heap sounds like a step in the right
direction.
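
As a rough sketch of what I mean, a PT_LOAD program header carries both a physical and a virtual address. The helper make_load_phdr() below is hypothetical and is not kexec-tools code; it assumes the ELF64 types from <elf.h>, and the virtual address would have to come from the hypervisor.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Hypothetical helper: describe one memory range as a PT_LOAD segment.
 * paddr is the physical start of the range, vaddr the virtual address
 * it is mapped at, size its length in bytes, and offset its position
 * in the dump file. */
static Elf64_Phdr make_load_phdr(Elf64_Addr paddr, Elf64_Addr vaddr,
                                 Elf64_Xword size, Elf64_Off offset)
{
    Elf64_Phdr ph;

    assert(size > 0);
    memset(&ph, 0, sizeof(ph));
    ph.p_type   = PT_LOAD;
    ph.p_flags  = PF_R | PF_W | PF_X;
    ph.p_offset = offset;
    ph.p_paddr  = paddr;
    ph.p_vaddr  = vaddr;
    ph.p_filesz = size;
    ph.p_memsz  = size;
    return ph;
}
```

Without a correct p_vaddr, a tool reading the vmcore can only locate the hypervisor range by physical address.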

Many thanks,

/ magnus