xen-devel

RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

To: <keir.fraser@xxxxxxxxxxxxx>, <jbeulich@xxxxxxxxxx>
Subject: RE: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Wed, 1 Sep 2010 19:32:48 +0800
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 01 Sep 2010 04:33:53 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
Importance: Normal
In-reply-to: <C8A3E270.21A27%keir.fraser@xxxxxxxxxxxxx>
References: <BAY121-W693BCEFADA48DA1129883DA8B0@xxxxxxx>, <C8A3E270.21A27%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
When I moved the BUG_ON code inside the if statement, the server was able to start.
I must have made another stupid mistake while copying the patch by hand; I apologize.

Anyway, I now have one server running the first patch, with the checks moved inside the if statement.
If it panics, I will collect the page address and other information.

Meanwhile, I'll have another server run the second patch.
I'll keep you updated, thanks.

 
> Date: Wed, 1 Sep 2010 10:58:54 +0100
> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> From: keir.fraser@xxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; jbeulich@xxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx
>
> Hm, well, it is a bit weird. The check in init_heap_pages() ought to prevent
> merging across node boundaries. Nonetheless the code is simpler and more
> obvious if we put a further merging constraint in free_heap_pages() instead.
> It's also more correct, since I'm not sure that the
> phys_to_nid(page_to_maddr(pg-1)) in init_heap_pages() won't possibly BUG out
> if pg-1 is not a RAM page and is not in a known NUMA node range.
>
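The merging constraint described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the attached patch and not Xen's real code: `struct toy_page` and `toy_can_merge_with_predecessor()` are hypothetical names, and the descriptor holds just the three fields needed for the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not Xen's struct page_info. */
struct toy_page {
    bool free;          /* chunk head is on a free list */
    unsigned int order; /* chunk size is 2^order pages */
    unsigned int node;  /* NUMA node owning the page */
};

/*
 * Return true if the free chunk of 2^order pages starting at index pfn may
 * be merged with its predecessor chunk of the same order.
 */
bool toy_can_merge_with_predecessor(const struct toy_page *pages,
                                    size_t pfn, unsigned int order)
{
    size_t mask = (size_t)1 << order;

    assert(pages != NULL);

    /* Only the upper half of an aligned pair has a predecessor buddy. */
    if ((pfn & mask) == 0)
        return false;

    const struct toy_page *pred = &pages[pfn - mask];

    return pred->free &&
           pred->order == order &&
           pred->node == pages[pfn].node; /* the extra NUMA constraint */
}
```

Doing the node comparison at free time means it runs on every merge attempt, whichever path put the chunks on the free lists.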
> Please give the attached patch a spin. (You should revert the previous
> patch, of course).
>
> Thanks,
> Keir
>
> On 01/09/2010 10:23, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> > Well. It did crash on every startup.
> >
> > below is what I got.
> > ---------------------------------------------------
> > root (hd0,0)
> > Filesystem type is ext2fs, partition type 0x83
> > kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M
> > dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
> > [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, entry=0x100000]
> > module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0
> > [Multiboot-module @ 0x39b000, 0x3214d0 bytes]
> >
> >
> >  __  __            _  _    ___   ___
> >  \ \/ /___ _ __   | || |  / _ \ / _ \
> >   \  // _ \ '_ \  | || |_| | | | | | |
> >   /  \  __/ | | | |__   _| |_| | |_| |
> >  /_/\_\___|_| |_|    |_|(_)___(_)___/
> > (XEN) Xen version 4.0.0 (root@xxxxxxxxxxxxxxxxx) (gcc version 4.1.2 20080704
> > (Red Hat 4.1.2-46)) Wed Sep 1 17:13:35 CST 2010
> > (XEN) Latest ChangeSet: unavailable
> > (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M
> > dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax
> > noreboot
> > (XEN) Video information:
> > (XEN) VGA is text mode 80x25, font 8x16
> > (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
> > (XEN) EDID info not retrieved because no DDC retrieval method detected
> > (XEN) Disc information:
> > (XEN) Found 6 MBR signatures
> > (XEN) Found 6 EDD information structures
> > (XEN) Xen-e820 RAM map:
> > (XEN) 0000000000000000 - 000000000009a800 (usable)
> > (XEN) 000000000009a800 - 00000000000a0000 (reserved)
> > (XEN) 00000000000e4bb0 - 0000000000100000 (reserved)
> > (XEN) 0000000000100000 - 00000000bf790000 (usable)
> > (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data)
> > (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
> > (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved)
> > (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved)
> > (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
> > (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> > (XEN) 00000000fff00000 - 0000000100000000 (reserved)
> > (XEN) 0000000100000000 - 0000000640000000 (usable)
> > (XEN) --------------849
> > (XEN) --------------849
> > (XEN) --------------849
> > (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
> > (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97)
> > (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97)
> > (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117)
> > (XEN) ACPI: FACS BF79E000, 0040
> > (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97)
> > (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97)
> > (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97)
> > (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1)
> > (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97)
> > (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117)
> > (XEN) --------------847
> > (XEN) ---------srat enter
> > (XEN) ---------prepare enter into pfn
> > (XEN) -------in pfn
> > (XEN) -------hole shift returned
> > (XEN) --------------849
> > (XEN) System RAM: 24542MB (25131224kB)
> > (XEN) Unknown interrupt (cr2=0000000000000000)
> > (XEN) 00000000000000ab 0000000000000000 ffff82f600004020
> > 00007d0a00000000 ffff82f600004000 0000000000000020 0000000000201000
> > 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000008
> > 0000000000000000 00000000000001ff 00000000000001ff 0000000000000000
> > ffff82c480115787 000000000000e008 0000000000010002 ffff82c48035fd18
> > 0000000000000000 ffff82c48011536a 0000000000000000 0000000000000000
> > 0000000000000163 0000000900000000 00000000000000ab 0000000000000201
> > 0000000000000000 0000000000000100 ffff82f600004020 0000000000000eff
> > 0000000000000000 ffff82c480115e60 0000000000000000 ffff82f600002020
> > 0000000000001000 0000000000000004 0000000000000080 0000000000000001
> > ffff82c48020be8d ffff830000100000 0000000000000008 0000000000000000
> > 0000000000000000 ffffffffffffffff 0000000000000101 ffff82c48022d8fc
> > 0000000000540000 00000000005fde36 0000000000540000 0000000000100000
> > 0000000100000000 0000000000000010 ffff82c48024deb4 ffff82c4802404f7
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 ffff8300bf568ff8 ffff8300bf569ff8 000000000022a630
> > 000000000022a695 0000000000087f00 0000000000000000 ffff830000087fc0
> > 00000000005fde36 000000000087b6d0 0000000000d44000 0000000001000000
> > 0000000000000000 ffffffffffffffff ffff830000087f00 0000100000000000
> > 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
> > 0000000000000000 0000000000000000 0000000000067ebc 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 ffff82c4801000b5
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 00000000fffff000
> >
> >> Date: Wed, 1 Sep 2010 09:49:18 +0100
> >> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
> >> From: keir.fraser@xxxxxxxxxxxxx
> >> To: JBeulich@xxxxxxxxxx
> >> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>
> >> On 01/09/2010 09:02, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
> >>
> >>>> Well I agree with your logic anyway. So I don't see that this can be the
> >>>> cause of MaoXiaoyun's bug. At least not directly. But then I'm stumped as to
> >>>> why the page arithmetic and checks in free_heap_pages are (apparently)
> >>>> resulting in a page pointer way outside the frame-table region and actually
> >>>> in the directmap region.
> >>>
> >>> There must be some unchecked use of PAGE_LIST_NULL, i.e.
> >>> running off a list end without taking notice (0xffff8315ffffffe4
> >>> exactly corresponds with that).
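The correspondence with PAGE_LIST_NULL can be checked with a little arithmetic. The sketch below rests on assumptions that should be verified against the tree in use: a frame-table base of 0xffff82f600000000 (consistent with the ffff82f6... values in the dump above), a 32-byte page descriptor, and PAGE_LIST_NULL being an all-ones 32-bit index. The `TOY_*` names and `toy_frametable_addr()` are hypothetical, and the field offset of 4 is simply the value that makes the numbers match.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this x86-64 build; check them against your own tree. */
#define TOY_FRAMETABLE_BASE UINT64_C(0xffff82f600000000)
#define TOY_PAGE_INFO_SIZE  UINT64_C(32)
#define TOY_PAGE_LIST_NULL  UINT64_C(0xffffffff) /* all-ones 32-bit index */

/* Address of the byte at offset field_off inside frame-table entry idx. */
uint64_t toy_frametable_addr(uint64_t idx, uint64_t field_off)
{
    assert(field_off < TOY_PAGE_INFO_SIZE);
    return TOY_FRAMETABLE_BASE + idx * TOY_PAGE_INFO_SIZE + field_off;
}
```

Under those assumptions, `toy_frametable_addr(TOY_PAGE_LIST_NULL, 4)` evaluates to 0xffff8315ffffffe4: the list terminator was treated as a real page index and a field 4 bytes into the resulting "descriptor" was accessed.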
> >>
> >> Okay, my next guess then is that we are deleting a chunk from the wrong list
> >> head. I don't see any check that the adjacent chunks we are considering to
> >> merge are from the same node and zone. I suppose the zone logic does just
> >> work as we're dealing with 2**x aligned and sized regions. But, shouldn't
> >> the merging logic in free_heap_pages be checking that the merging candidate
> >> is from the same NUMA node? I see I have an ASSERTion later in the same
> >> function, but it's too weak and wishful I suspect.
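The remark about 2**x aligned and sized regions can be made concrete with the usual buddy arithmetic. This is a generic sketch of the technique, not code from free_heap_pages; `toy_buddy_of()` and `toy_merged_start()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Frame number of the buddy of the 2^order-page chunk starting at pfn. */
uint64_t toy_buddy_of(uint64_t pfn, unsigned int order)
{
    assert(order < 63);
    uint64_t size = UINT64_C(1) << order;

    assert((pfn & (size - 1)) == 0); /* chunk must be naturally aligned */
    return pfn ^ size;
}

/* First frame of the chunk of order + 1 formed by merging with the buddy. */
uint64_t toy_merged_start(uint64_t pfn, unsigned int order)
{
    assert(order < 63);
    return pfn & ~(UINT64_C(1) << order);
}
```

A chunk and its buddy always lie inside the same naturally aligned block of 2^(order+1) frames, so a merge can never cross a boundary that is itself aligned to a larger power of two. A NUMA node boundary carries no such alignment guarantee, which is why it needs an explicit check.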
> >>
> >> MaoXiaoyun: can you please test with the attached patch? If I'm right, you
> >> will crash on one of the BUG_ON checks that I added, rather than crashing on
> >> a pointer dereference. You may even crash during boot. Anyhow, what is
> >> interesting is whether this patch always makes you crash on BUG_ON before
> >> you would normally crash on pointer dereference. If so this is trivial to
> >> fix.
> >>
> >> Thanks,
> >> Keir
> >>
> >
>