WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, "jbeulich@xxxxxxxxxx" <jbeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Wed, 1 Sep 2010 11:28:23 +0100
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 01 Sep 2010 03:29:10 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C8A3E8A2.21BB6%keir.fraser@xxxxxxxxxxxxx>
More interesting would be to turn the BUG_ON() statements in my first patch
into if() statements and print out that kind of info before panic()ing. It
would tell us which BUG_ON() fired, the page addresses (and maybe MFNs), and
the order, mask, node, and zone info.

 -- Keir
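
[Editor's sketch] The suggestion above could look roughly like the following. This is a minimal, standalone illustration and not the patch from this thread: struct merge_diag, format_merge_diag() and check_and_report() are hypothetical names, and snprintf() stands in for Xen's printk(). In the hypervisor, the caller would call panic() when check_and_report() returns 1.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical record of what to report; not a Xen structure. */
struct merge_diag {
    const char   *cond;   /* text of the check that fired */
    const void   *pg;     /* page being freed */
    const void   *buddy;  /* merge candidate */
    unsigned long mfn;    /* MFN of pg */
    unsigned long mask;   /* 1UL << order */
    unsigned int  order, node, zone;
};

/* Format one diagnostic line; in Xen this would be a printk(). */
static int format_merge_diag(char *buf, size_t len, const struct merge_diag *d)
{
    return snprintf(buf, len,
                    "check '%s' failed: pg %p buddy %p mfn %lx mask %lx "
                    "order %u node %u zone %u",
                    d->cond, d->pg, d->buddy, d->mfn, d->mask,
                    d->order, d->node, d->zone);
}

/*
 * Replacement pattern for BUG_ON(cond): report first, then let the
 * caller stop. Returns 1 if the check fired, 0 otherwise.
 */
static int check_and_report(int cond, const struct merge_diag *d,
                            char *buf, size_t len)
{
    if ( !cond )
        return 0;
    format_merge_diag(buf, len, d);
    return 1;
}
```

Passing the condition text as a string makes it obvious from the log which of several checks fired, which is the point of the suggestion.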

On 01/09/2010 11:25, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:

> That doesn't imply anything. It is perfectly valid for a page's prev or next
> index to be PAGE_LIST_NULL, if that page is not in a list, or if it is at
> the head and/or tail of a list.
> 
>  -- Keir
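
[Editor's sketch] To illustrate the point above: in an index-linked list, the first element's prev and the last element's next legitimately hold the null index, so seeing that value is not an error in itself. The problem only arises if the null index is converted to a pointer without a check. The code below is a toy model under stated assumptions: IDX_NULL stands in for PAGE_LIST_NULL, table[] stands in for the frame table, and all function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define IDX_NULL ((uint32_t)~0u)   /* stand-in for PAGE_LIST_NULL */

struct node { uint32_t next, prev; };
struct list { uint32_t head, tail; };

static struct node table[8];       /* stand-in for the frame table */

static void list_init(struct list *l)
{
    l->head = l->tail = IDX_NULL;
}

/* Append table[idx] to the list; the ends carry IDX_NULL links. */
static void list_add_tail(struct list *l, uint32_t idx)
{
    table[idx].next = IDX_NULL;
    table[idx].prev = l->tail;
    if ( l->tail != IDX_NULL )
        table[l->tail].next = idx;
    else
        l->head = idx;
    l->tail = idx;
}

/* Safe conversion: never index the table with the null index. */
static struct node *idx_to_node(uint32_t idx)
{
    return (idx == IDX_NULL) ? NULL : &table[idx];
}
```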
> 
> On 01/09/2010 11:21, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> 
>> Thanks Keir.
>> 
>> I ran the test below myself, in page_alloc.c.
>> check_page() panics on any page whose address has '3' as its 6th hex
>> character (bits 40 and 41 both set); its last argument, i, indicates which
>> call site panicked.
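
[Editor's sketch] The two constants used in check_page() further down are bit 40 and bit 41, which are the low two bits of the 6th hex digit of a 64-bit address. The standalone snippet below restates that test; the function names are hypothetical. Note that the test as written also matches a 6th digit of 7, b or f, not only 3.

```c
#include <stdint.h>

/* 0x0000010000000000 and 0x0000020000000000 from check_page(). */
#define BIT40 (UINT64_C(1) << 40)
#define BIT41 (UINT64_C(1) << 41)

/* Same condition as check_page(): both bits must be set. */
static int both_bits_set(uint64_t addr)
{
    return (addr & BIT40) && (addr & BIT41);
}

/* The 6th hex digit from the left covers bits 43..40. */
static unsigned int sixth_hex_digit(uint64_t addr)
{
    return (unsigned int)((addr >> 40) & 0xf);
}
```

With the addresses reported in this message, ffff8315ffffffe0 has 3 in that position and triggers the check, while ffff82f600002040 has 2 and does not.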
>> 
>> The output below shows that the panic comes from line 558. The page address
>> is ffff82f600002040, while its next page is ffff8315ffffffe0, which is very
>> close to the faulting address in the previous panic (ffff8315ffffffe4).
>> 
>> I think this should imply something.
>> 
>> ---------------------------------------
>> (XEN) -----------18
>> (XEN) System RAM: 24542MB (25131224kB)
>> (XEN) SRAT: No PXM for e820 range: 0000000000000000 - 000000000009a7ff
>> (XEN) SRAT: SRAT not used.
>> (XEN) ----------------pgb ffff82f600002040 pg ffff8315ffffffe0, mask 1, order 0, 0
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) xmao invalid page address assigned
>> (XEN) ****************************************
>> (XEN)
>> 
>> ----------------------------------------------------
>>  485 static int check_page(struct page_info* pgb, struct page_info* pg, unsigned long mask, unsigned int order, int i){
>>  486
>>  487         if((unsigned long)pg & 0x0000020000000000 &&
>>  488            (unsigned long)pg & 0x0000010000000000
>>  489               ){
>>  490                   printk("----------------pgb %p pg %p, mask %lx, order %d, %d\n", pgb, pg, mask, order, i);
>>  491                   panic("xmao invalid page address assigned \n");
>>  492              }
>>  493             return 0;
>>  494 }
>> 
>>  549         if ( (page_to_mfn(pg) & mask) )
>>  550         {
>>  551             /* Merge with predecessor block? */
>>  552             if ( !mfn_valid(page_to_mfn(pg-mask)) ||
>>  553                  !page_state_is(pg-mask, free) ||
>>  554                  (PFN_ORDER(pg-mask) != order) )
>>  555                 break;
>>  556             pg -= mask;
>>  557
>>  558             check_page(pg, pdx_to_page(pg->list.next), mask, order, 0);
>>  559             check_page(pg, pdx_to_page(pg->list.prev), mask, order, 1);
>>  560
>>  561             page_list_del(pg, &heap(node, zone, order));
>>  562         }
>>  563         else
>>  564         {
>>  565             /* Merge with successor block? */
>>  566             if ( !mfn_valid(page_to_mfn(pg+mask)) ||
>>  567                  !page_state_is(pg+mask, free) ||
>>  568                  (PFN_ORDER(pg+mask) != order) )
>>  569                 break;
>>  570
>>  571             pgt = pg + mask;
>>  572             check_page(pg, pdx_to_page(pgt->list.next), mask, order, 2);
>>  573             check_page(pg, pdx_to_page(pgt->list.prev), mask, order, 3);
>>  574
>> 
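
[Editor's sketch] The quoted loop body picks the merge candidate from the bit of the MFN selected by mask: if the bit is set, the buddy is the predecessor block (pg - mask), otherwise the successor (pg + mask). The standalone snippet below shows only that arithmetic on plain MFNs. The function names are hypothetical and the MFN is assumed to be aligned to the block order, as it is for blocks in a buddy allocator.

```c
#include <stdint.h>

/*
 * For a free block of 2^order pages starting at mfn, return the MFN of
 * its buddy: the predecessor if the 'mask' bit is set, else the successor.
 * For an order-aligned mfn this is the same as mfn ^ mask.
 */
static uint64_t buddy_mfn(uint64_t mfn, unsigned int order)
{
    uint64_t mask = UINT64_C(1) << order;
    return (mfn & mask) ? mfn - mask : mfn + mask;
}

/* Start of the merged block of order + 1 once the two halves join. */
static uint64_t merged_start(uint64_t mfn, unsigned int order)
{
    uint64_t mask = UINT64_C(1) << order;
    return mfn & ~mask;
}
```

Nothing in this arithmetic knows about NUMA nodes or list heads, which is why the validity, state and order checks around it matter.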
>>> Date: Wed, 1 Sep 2010 10:58:54 +0100
>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>>> From: keir.fraser@xxxxxxxxxxxxx
>>> To: tinnycloud@xxxxxxxxxxx; jbeulich@xxxxxxxxxx
>>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx
>>> 
>>> Hm, well, it is a bit weird. The check in init_heap_pages() ought to prevent
>>> merging across node boundaries. Nonetheless the code is simpler and more
>>> obvious if we put a further merging constraint in free_heap_pages() instead.
>>> It's also correcter, since I'm not sure that the
>>> phys_to_nid(page_to_maddr(pg-1)) in init_heap_pages() won't possibly BUG out
>>> if pg-1 is not a RAM page and is not in a known NUMA node range.
>>> 
>>> Please give the attached patch a spin. (You should revert the previous
>>> patch, of course).
>>> 
>>> Thanks,
>>> Keir
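
[Editor's sketch] The attached patch is not reproduced in this archive. As a rough illustration of the idea of "a further merging constraint in free_heap_pages()", the predicate below adds a same-node test to the three checks visible in the quoted code elsewhere in this thread. struct block and can_merge() are hypothetical; the real code works on struct page_info and helpers such as mfn_valid(), page_state_is(), PFN_ORDER() and phys_to_nid().

```c
/* Hypothetical per-block state; Xen keeps this in struct page_info. */
struct block {
    int          valid;   /* stand-in for mfn_valid() */
    int          free;    /* stand-in for page_state_is(pg, free) */
    unsigned int order;   /* stand-in for PFN_ORDER() */
    unsigned int node;    /* stand-in for phys_to_nid() */
};

/* A buddy is a merge candidate only if every condition holds. */
static int can_merge(const struct block *buddy, unsigned int order,
                     unsigned int node)
{
    return buddy->valid &&
           buddy->free &&
           buddy->order == order &&
           buddy->node == node;   /* the additional constraint */
}
```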
>>> 
>>> On 01/09/2010 10:23, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>>> 
>>>> Well, it did crash on every startup.
>>>> 
>>>> Below is what I got.
>>>> ---------------------------------------------------
>>>> root (hd0,0)
>>>> Filesystem type is ext2fs, partition type 0x83
>>>> kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
>>>> [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, entry=0x100000]
>>>> module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0
>>>> [Multiboot-module @ 0x39b000, 0x3214d0 bytes]
>>>> 
>>>> 
>>>>  __  __            _  _    ___   ___
>>>>  \ \/ /___ _ __   | || |  / _ \ / _ \
>>>>   \  // _ \ '_ \  | || |_| | | | | | |
>>>>   /  \  __/ | | | |__   _| |_| | |_| |
>>>>  /_/\_\___|_| |_|    |_|(_)___(_)___/
>>>> (XEN) Xen version 4.0.0 (root@xxxxxxxxxxxxxxxxx) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Wed Sep 1 17:13:35 CST 2010
>>>> (XEN) Latest ChangeSet: unavailable
>>>> (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
>>>> (XEN) Video information:
>>>> (XEN) VGA is text mode 80x25, font 8x16
>>>> (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
>>>> (XEN) EDID info not retrieved because no DDC retrieval method detected
>>>> (XEN) Disc information:
>>>> (XEN) Found 6 MBR signatures
>>>> (XEN) Found 6 EDD information structures
>>>> (XEN) Xen-e820 RAM map:
>>>> (XEN) 0000000000000000 - 000000000009a800 (usable)
>>>> (XEN) 000000000009a800 - 00000000000a0000 (reserved)
>>>> (XEN) 00000000000e4bb0 - 0000000000100000 (reserved)
>>>> (XEN) 0000000000100000 - 00000000bf790000 (usable)
>>>> (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data)
>>>> (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
>>>> (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved)
>>>> (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved)
>>>> (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
>>>> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
>>>> (XEN) 00000000fff00000 - 0000000100000000 (reserved)
>>>> (XEN) 0000000100000000 - 0000000640000000 (usable)
>>>> (XEN) --------------849
>>>> (XEN) --------------849
>>>> (XEN) --------------849
>>>> (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
>>>> (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97)
>>>> (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97)
>>>> (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117)
>>>> (XEN) ACPI: FACS BF79E000, 0040
>>>> (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97)
>>>> (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97)
>>>> (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97)
>>>> (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1)
>>>> (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97)
>>>> (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117)
>>>> (XEN) --------------847
>>>> (XEN) ---------srat enter
>>>> (XEN) ---------prepare enter into pfn
>>>> (XEN) -------in pfn
>>>> (XEN) -------hole shift returned
>>>> (XEN) --------------849
>>>> (XEN) System RAM: 24542MB (25131224kB)
>>>> (XEN) Unknown interrupt (cr2=0000000000000000)
>>>> (XEN) 00000000000000ab 0000000000000000 ffff82f600004020
>>>> 00007d0a00000000 ffff82f600004000 0000000000000020 0000000000201000
>>>> 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000008
>>>> 0000000000000000 00000000000001ff 00000000000001ff 0000000000000000
>>>> ffff82c480115787 000000000000e008 0000000000010002 ffff82c48035fd18
>>>> 0000000000000000 ffff82c48011536a 0000000000000000 0000000000000000
>>>> 0000000000000163 0000000900000000 00000000000000ab 0000000000000201
>>>> 0000000000000000 0000000000000100 ffff82f600004020 0000000000000eff
>>>> 0000000000000000 ffff82c480115e60 0000000000000000 ffff82f600002020
>>>> 0000000000001000 0000000000000004 0000000000000080 0000000000000001
>>>> ffff82c48020be8d ffff830000100000 0000000000000008 0000000000000000
>>>> 0000000000000000 ffffffffffffffff 0000000000000101 ffff82c48022d8fc
>>>> 0000000000540000 00000000005fde36 0000000000540000 0000000000100000
>>>> 0000000100000000 0000000000000010 ffff82c48024deb4 ffff82c4802404f7
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 ffff8300bf568ff8 ffff8300bf569ff8 000000000022a630
>>>> 000000000022a695 0000000000087f00 0000000000000000 ffff830000087fc0
>>>> 00000000005fde36 000000000087b6d0 0000000000d44000 0000000001000000
>>>> 0000000000000000 ffffffffffffffff ffff830000087f00 0000100000000000
>>>> 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
>>>> 0000000000000000 0000000000000000 0000000000067ebc 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 ffff82c4801000b5
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> 0000000000000000 0000000000000000 00000000fffff000
>>>> 
>>>>> Date: Wed, 1 Sep 2010 09:49:18 +0100
>>>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>>>>> From: keir.fraser@xxxxxxxxxxxxx
>>>>> To: JBeulich@xxxxxxxxxx
>>>>> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>> 
>>>>> On 01/09/2010 09:02, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>>>> 
>>>>>>> Well I agree with your logic anyway. So I don't see that this can be the
>>>>>>> cause of MaoXiaoyun's bug. At least not directly. But then I'm stumped as
>>>>>>> to why the page arithmetic and checks in free_heap_pages are (apparently)
>>>>>>> resulting in a page pointer way outside the frame-table region and
>>>>>>> actually in the directmap region.
>>>>>> 
>>>>>> There must be some unchecked use of PAGE_LIST_NULL, i.e.
>>>>>> running off a list end without taking notice (0xffff8315ffffffe4
>>>>>> exactly corresponds with that).
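
[Editor's sketch] The correspondence mentioned above can be checked with a few lines of arithmetic. The values below are assumptions inferred from addresses printed elsewhere in this thread, not taken from the Xen headers: a frame table starting at 0xffff82f600000000, 32-byte page_info entries (the logged page pointers are all multiples of 0x20), and a null list index of 32-bit all-ones. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, inferred from the addresses in this thread. */
#define FRAMETABLE_BASE  UINT64_C(0xffff82f600000000)
#define PAGE_INFO_SIZE   UINT64_C(32)
#define LIST_NULL_INDEX  UINT64_C(0xffffffff)  /* 32-bit all-ones index */

/* Address of the frame-table entry for a given page index. */
static uint64_t index_to_addr(uint64_t pdx)
{
    return FRAMETABLE_BASE + pdx * PAGE_INFO_SIZE;
}
```

Under these assumptions, index_to_addr(LIST_NULL_INDEX) is 0xffff82f600000000 + 0x1fffffffe0 = 0xffff8315ffffffe0, and the faulting address 0xffff8315ffffffe4 is 4 bytes into that bogus entry, which is consistent with an access to a field at offset 4 of a page_info reached through an unchecked null index.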
>>>>> 
>>>>> Okay, my next guess then is that we are deleting a chunk from the wrong
>>>>> list head. I don't see any check that the adjacent chunks we are
>>>>> considering to merge are from the same node and zone. I suppose the zone
>>>>> logic does just work as we're dealing with 2**x aligned and sized regions.
>>>>> But, shouldn't the merging logic in free_heap_pages be checking that the
>>>>> merging candidate is from the same NUMA node? I see I have an ASSERTion
>>>>> later in the same function, but it's too weak and wishful I suspect.
>>>>> 
>>>>> MaoXiaoyun: can you please test with the attached patch? If I'm right, you
>>>>> will crash on one of the BUG_ON checks that I added, rather than crashing
>>>>> on a pointer dereference. You may even crash during boot. Anyhow, what is
>>>>> interesting is whether this patch always makes you crash on BUG_ON before
>>>>> you would normally crash on pointer dereference. If so this is trivial to
>>>>> fix.
>>>>> 
>>>>> Thanks,
>>>>> Keir
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel


