xen-devel

Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, "jbeulich@xxxxxxxxxx" <jbeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Wed, 1 Sep 2010 10:58:54 +0100
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Hm, well, it is a bit weird. The check in init_heap_pages() ought to prevent
merging across node boundaries. Nonetheless, the code is simpler and more
obvious if we put a further merging constraint in free_heap_pages() instead.
It's also more correct, since I'm not sure that the
phys_to_nid(page_to_maddr(pg-1)) call in init_heap_pages() can't BUG out
if pg-1 is not a RAM page and is not in a known NUMA node range.

Please give the attached patch a spin. (You should revert the previous
patch, of course.)
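
To illustrate the kind of merging constraint described above, here is a minimal sketch of a buddy-style free path that refuses to merge a chunk with a neighbour on a different NUMA node. It is not the attached patch and not Xen's free_heap_pages(): the struct toy_page layout, the toy_* function names and TOY_MAX_ORDER are all hypothetical, and the per-page node field stands in for whatever phys_to_nid(page_to_maddr(pg)) would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ORDER 4u   /* hypothetical limit, chosen for the example */

/* Toy per-page metadata; NOT Xen's struct page_info. */
struct toy_page {
    bool     free;   /* true if this page heads a free chunk */
    unsigned order;  /* log2 of the chunk size, valid when free is true */
    unsigned node;   /* NUMA node that owns this page (assumed known) */
};

/* May the 2^order chunk at 'buddy' be merged with the chunk at 'pfn'? */
static bool toy_can_merge(const struct toy_page *pages, size_t npages,
                          size_t pfn, size_t buddy, unsigned order)
{
    /* The whole buddy chunk must lie inside the array. */
    if (buddy + ((size_t)1 << order) > npages)
        return false;
    return pages[buddy].free &&
           pages[buddy].order == order &&
           /* The extra constraint: never merge across a node boundary. */
           pages[buddy].node == pages[pfn].node;
}

/*
 * Free the 2^*order chunk starting at 'pfn' and merge it with free
 * buddies for as long as toy_can_merge() allows. Returns the first page
 * of the resulting chunk and updates *order to its final order.
 */
static size_t toy_free_chunk(struct toy_page *pages, size_t npages,
                             size_t pfn, unsigned *order)
{
    assert(pfn < npages && *order <= TOY_MAX_ORDER);
    assert((pfn & (((size_t)1 << *order) - 1)) == 0); /* chunk is aligned */

    while (*order < TOY_MAX_ORDER) {
        /* Flipping bit 'order' gives the predecessor or successor buddy. */
        size_t buddy = pfn ^ ((size_t)1 << *order);

        if (!toy_can_merge(pages, npages, pfn, buddy, *order))
            break;

        pages[buddy].free = false;   /* buddy is absorbed into the merge */
        if (buddy < pfn)
            pfn = buddy;             /* merged chunk starts at the lower pfn */
        (*order)++;
    }

    pages[pfn].free  = true;
    pages[pfn].order = *order;
    return pfn;
}
```

With eight pages where pages 0-3 belong to node 0 and pages 4-7 to node 1, freeing the pages one at a time yields two order-2 chunks, one per node, and they are never combined into a single order-3 chunk even though they are buddies by address.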

 Thanks,
 Keir

On 01/09/2010 10:23, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:

> Well, it did crash on every startup.
>  
> Below is what I got.
> ---------------------------------------------------
> root (hd0,0)     
>  Filesystem type is ext2fs, partition type 0x83
> kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
>    [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, entry=0x100000]
> module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0
>    [Multiboot-module @ 0x39b000, 0x3214d0 bytes]
>  __  __            _  _    ___   ___
>  \ \/ /___ _ __   | || |  / _ \ / _ \
>   \  // _ \ '_ \  | || |_| | | | | | |
>   /  \  __/ | | | |__   _| |_| | |_| |
>  /_/\_\___|_| |_|    |_|(_)___(_)___/
> (XEN) Xen version 4.0.0 (root@xxxxxxxxxxxxxxxxx) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Wed Sep  1 17:13:35 CST 2010
> (XEN) Latest ChangeSet: unavailable
> (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
> (XEN)  EDID info not retrieved because no DDC retrieval method detected
> (XEN) Disc information:
> (XEN)  Found 6 MBR signatures
> (XEN)  Found 6 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN)  0000000000000000 - 000000000009a800 (usable)
> (XEN)  000000000009a800 - 00000000000a0000 (reserved)
> (XEN)  00000000000e4bb0 - 0000000000100000 (reserved)
> (XEN)  0000000000100000 - 00000000bf790000 (usable)
> (XEN)  00000000bf790000 - 00000000bf79e000 (ACPI data)
> (XEN)  00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
> (XEN)  00000000bf7d0000 - 00000000bf7e0000 (reserved)
> (XEN)  00000000bf7ec000 - 00000000c0000000 (reserved)
> (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
> (XEN)  00000000fee00000 - 00000000fee01000 (reserved)
> (XEN)  00000000fff00000 - 0000000100000000 (reserved)
> (XEN)  0000000100000000 - 0000000640000000 (usable)
> (XEN) --------------849
> (XEN) --------------849
> (XEN) --------------849
> (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
> (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT       97)
> (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT       97)
> (XEN) ACPI: DSDT BF7904B0, 4D6A (r2  CTSAV CTSAV122      122 INTL 20051117)
> (XEN) ACPI: FACS BF79E000, 0040
> (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT       97)
> (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG  20091123 MSFT       97)
> (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT       97)
> (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT         1 INTL        1)
> (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET  20091123 MSFT       97)
> (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm    CpuPm       12 INTL 20051117)
> (XEN) --------------847
> (XEN) ---------srat enter
> (XEN) ---------prepare enter into pfn
> (XEN) -------in pfn
> (XEN) -------hole shift returned
> (XEN) --------------849
> (XEN) System RAM: 24542MB (25131224kB)
> (XEN) Unknown interrupt (cr2=0000000000000000)
> (XEN)     00000000000000ab    0000000000000000    ffff82f600004020
> 00007d0a00000000    ffff82f600004000    0000000000000020    0000000000201000
> 0000000000000000    ffffffffffffffff    0000000000000000    0000000000000008
> 0000000000000000    00000000000001ff    00000000000001ff    0000000000000000
> ffff82c480115787    000000000000e008    0000000000010002    ffff82c48035fd18
> 0000000000000000    ffff82c48011536a    0000000000000000    0000000000000000
> 0000000000000163    0000000900000000    00000000000000ab    0000000000000201
> 0000000000000000    0000000000000100    ffff82f600004020    0000000000000eff
> 0000000000000000    ffff82c480115e60    0000000000000000    ffff82f600002020
> 0000000000001000    0000000000000004    0000000000000080    0000000000000001
> ffff82c48020be8d    ffff830000100000    0000000000000008    0000000000000000
> 0000000000000000    ffffffffffffffff    0000000000000101    ffff82c48022d8fc
> 0000000000540000    00000000005fde36    0000000000540000    0000000000100000
> 0000000100000000    0000000000000010    ffff82c48024deb4    ffff82c4802404f7
> 0000000000000000    0000000000000000    0000000000000000    0000000000000000
> 0000000000000000    ffff8300bf568ff8    ffff8300bf569ff8    000000000022a630
> 000000000022a695    0000000000087f00    0000000000000000    ffff830000087fc0
> 00000000005fde36    000000000087b6d0    0000000000d44000    0000000001000000
> 0000000000000000    ffffffffffffffff    ffff830000087f00    0000100000000000
> 0000000800000000    000000010000006e    0000000000000003    00000000000002f8
> 0000000000000000    0000000000000000    0000000000067ebc    0000000000000000
> 0000000000000000    0000000000000000    0000000000000000    ffff82c4801000b5
> 0000000000000000    0000000000000000    0000000000000000    0000000000000000
> 0000000000000000    0000000000000000    0000000000000000    0000000000000000
> 0000000000000000    0000000000000000    0000000000000000    0000000000000000
> 0000000000000000    0000000000000000    0000000000000000    0000000000000000
> 0000000000000000    0000000000000000    0000000000000000    0000000000000000
> 0000000000000000    0000000000000000    0000000000000000    0000000000000000
> 0000000000000000    0000000000000000    00000000fffff000
>  
>> Date: Wed, 1 Sep 2010 09:49:18 +0100
>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>> From: keir.fraser@xxxxxxxxxxxxx
>> To: JBeulich@xxxxxxxxxx
>> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>> 
>> On 01/09/2010 09:02, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>> 
>>>> Well I agree with your logic anyway. So I don't see that this can be the
>>>> cause of MaoXiaoyun's bug. At least not directly. But then I'm stumped as
>>>> to why the page arithmetic and checks in free_heap_pages are (apparently)
>>>> resulting in a page pointer way outside the frame-table region and actually
>>>> in the directmap region.
>>> 
>>> There must be some unchecked use of PAGE_LIST_NULL, i.e.
>>> running off a list end without taking notice (0xffff8315ffffffe4
>>> exactly corresponds with that).
>> 
>> Okay, my next guess then is that we are deleting a chunk from the wrong list
>> head. I don't see any check that the adjacent chunks we are considering to
>> merge are from the same node and zone. I suppose the zone logic does just
>> work as we're dealing with 2**x aligned and sized regions. But, shouldn't
>> the merging logic in free_heap_pages be checking that the merging candidate
>> is from the same NUMA node? I see I have an ASSERTion later in the same
>> function, but it's too weak and wishful I suspect.
>> 
>> MaoXiaoyun: can you please test with the attached patch? If I'm right, you
>> will crash on one of the BUG_ON checks that I added, rather than crashing on
>> a pointer dereference. You may even crash during boot. Anyhow, what is
>> interesting is whether this patch always makes you crash on BUG_ON before
>> you would normally crash on pointer dereference. If so this is trivial to
>> fix.
>> 
>> Thanks,
>> Keir
>> 
>        
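
Jan Beulich's remark quoted above, that 0xffff8315ffffffe4 corresponds exactly to PAGE_LIST_NULL, can be checked with a little address arithmetic. The sketch below is only a plausibility check under stated assumptions, none of which are given in this thread: a frame-table base of 0xffff82f600000000 (consistent with the ffff82f6... values in the stack dump), a 32-byte per-page structure, PAGE_LIST_NULL as an all-ones 32-bit index, and a "prev" link at byte offset 4. All macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout values for 64-bit Xen of this era; not stated in the thread. */
#define ASSUMED_FRAMETABLE_VIRT_START 0xffff82f600000000ULL
#define ASSUMED_PAGE_INFO_SIZE        32ULL          /* bytes per page entry */
#define ASSUMED_PAGE_LIST_NULL        0xffffffffULL  /* all-ones 32-bit index */
#define ASSUMED_PREV_FIELD_OFFSET     4ULL           /* offset of the 'prev' link */

/*
 * Address touched if code treats PAGE_LIST_NULL as a real page index and
 * then writes that "page"'s prev link: base + index * size + offset.
 */
static uint64_t list_null_fault_address(void)
{
    return ASSUMED_FRAMETABLE_VIRT_START
         + ASSUMED_PAGE_LIST_NULL * ASSUMED_PAGE_INFO_SIZE
         + ASSUMED_PREV_FIELD_OFFSET;
}
```

Under these assumptions the index term is 0xffffffff * 32 = 0x1fffffffe0, so the result is 0xffff8315ffffffe4, the same address quoted in the discussion. That is what one would expect if list code walked past the end of a free list and updated the neighbour of a nonexistent entry.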

Attachment: 00-freeheap
Description: Binary data
