xen-devel

Re: [Xen-devel] xenpaging crashes xen in is_iomem_page()

On 10 August 2010 10:19, Olaf Hering <olaf@xxxxxxxxx> wrote:
> On Mon, Aug 09, Patrick Colp wrote:
>
>> > I tried to move the initial evict_victim() calls into the while(1) loop.
>> > If there is no event from xc_wait_for_event_or_timeout(), fill &victims
>> > one by one.
>> >
>> > My attempt looks basically like shown below.
>> > Unfortunately, it crashes xen itself in odd ways. I will look at this
>> > route further tomorrow.
>>
>> It's not immediately clear to me why your change wouldn't work.
>
> Patrick,
>
> there is something weird going on.
> Today I was able to boot the client successfully with my change. Still I
> got a few 'grant_table.c:583:d0 Iomem mapping not permitted ffffffffff
> (domain 1)' lines.

This sounds like it's trying to grant pages which have been paged out
(since paged-out pages have their p2m mapping changed to INVALID_MFN,
which is why the all-ones ffffffffff shows up in those messages).
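
Roughly, the path I have in mind looks like the sketch below. This is
only an illustration, not the actual grant_table.c code: the helper name
and its exact shape are made up, but INVALID_MFN, max_page,
iomem_access_permitted() and the message text are the real pieces that
would line up to produce those 'Iomem mapping not permitted' lines.

    /* Illustrative sketch only -- not the real grant_table.c code.
     * A paged-out gfn resolves to INVALID_MFN; that mfn is >= max_page,
     * so it gets treated as an iomem page and hits the permission check. */
    static int sketch_grant_map_frame(struct domain *d, unsigned long gfn)
    {
        unsigned long mfn = gmfn_to_mfn(d, gfn);   /* INVALID_MFN if paged out */

        if ( mfn >= max_page )                     /* not ordinary RAM... */
        {
            /* ...so it is handled as iomem, and the guest normally has no
             * iomem permission for a bogus all-ones frame number. */
            if ( !iomem_access_permitted(d, mfn, mfn) )
            {
                gdprintk(XENLOG_WARNING,
                         "Iomem mapping not permitted %lx (domain %d)\n",
                         mfn, d->domain_id);
                return GNTST_general_error;
            }
        }

        /* normal grant mapping of a valid RAM mfn would continue here */
        return GNTST_okay;
    }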


> After some tries I found that /usr/bin/free in the client gives an I/O error
> when I try to run it. The same happened with 'cat /usr/bin/free > /dev/null'.
> While doing that, I saw the Iomem error above. The gfn happened to be
> 3aba9. I searched for it in my xenpaging debug output. There was a
> page-out of gfn 3aba9, but no page-in request.
>
> So it seems that gfn lost its "state" somehow.

I think this means there's a fault path that isn't caught by xenpaging
(again, my guess here would be with the grant table stuff).
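
If that is what is happening, the grant-map path would need the same kind
of hook the other fault paths have: spot the paged-out p2m type and ask
the pager to bring the frame back instead of failing the map outright.
Just as a sketch of the idea (p2m_ram_paged and p2m_mem_paging_populate()
are the real names from the paging patches; where exactly the check would
live and what status gets returned are assumptions on my part):

    /* Sketch of a paging-aware check in the grant-map path (assumed
     * placement, not a tested patch). */
    p2m_type_t p2mt;
    unsigned long mfn = mfn_x(gfn_to_mfn(rd, gfn, &p2mt));

    if ( p2mt == p2m_ram_paged )
    {
        /* Tell the pager (xenpaging) to page this gfn back in... */
        p2m_mem_paging_populate(rd, gfn);
        /* ...and have the caller retry the grant map once it is back.
         * (Some retry/EAGAIN-style status would be needed here.) */
        return GNTST_general_error;
    }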


> Another thing:
> Now that xenpaging does the page-out process in a slow way, it will take
> a lot more time to finish 65K pages. I did an 'init 0' while it was still
> in the middle of the process of filling &victims. This shutdown killed
> xen itself. (The ept_get_entry lines come from my own debug printk, just
> there to check where the 0xffffffffff is coming from.)
>
> --- xen-unstable.hg-4.1.21925.orig/xen/arch/x86/mm/hap/p2m-ept.c
> +++ xen-unstable.hg-4.1.21925/xen/arch/x86/mm/hap/p2m-ept.c
> @@ -488,8 +488,11 @@ static mfn_t ept_get_entry(struct domain
>
>     if ( ept_entry->avail1 != p2m_invalid )
>     {
> +       ept_entry_t **__p = (ept_entry_t **)ept_entry;
>         *t = ept_entry->avail1;
>         mfn = _mfn(ept_entry->mfn);
> +        if ((mfn_x(mfn) & 0xffffffffffUL) == 0xffffffffffUL)
> +            printk("%s:%s(%u) %lx %p mp %lx gfn %lx\n", __FILE__, __func__, __LINE__, mfn_x(mfn), *__p, max_page, gfn);
>         if ( i )
>         {
>             /*
>
>
> (XEN) p2m-ept.c:ept_get_entry(495) ffffffffff 000ffffffffffc00 mp 140000 gfn 135a
> (XEN) mem_event.c:195:d0 Ignoring memory paging op on dying domain 1
> (XEN) p2m-ept.c:ept_get_entry(495) ffffffffff 000ffffffffffa00 mp 140000 gfn a7c2
> (XEN) p2m-ept.c:ept_get_entry(495) ffffffffff 000ffffffffffa00 mp 140000 gfn a7c2
> (XEN) Assertion '(((lport) >= 0) && ((lport) < ((((ld)->arch.has_32bit_shinfo) ? 32 : 64) * (((ld)->arch.has_32bit_shinfo) ? 32 : 64))) && (((ld)->evtchn[(lport)/128]) != ((void*)0)))' failed at event_channel.c:1033
> (XEN) Debugging connection not set up.
> (XEN) ----[ Xen-4.1.21925-20100810.075543  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff82c480105fed>] notify_via_xen_event_channel+0x43/0xfb
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 0000000000000007   rcx: 0000000000000000
> (XEN) rdx: 0000000000000040   rsi: 0000000000000007   rdi: ffff830138370194
> (XEN) rbp: ffff83013febfc88   rsp: ffff83013febfc68   r8:  0000000000000000
> (XEN) r9:  ffff82c48020aee0   r10: 00000000fffffff9   r11: 0000000000000004
> (XEN) r12: ffff830138370000   r13: ffff830138370190   r14: 000000000000a7c2
> (XEN) r15: 000000000012f977   cr0: 0000000080050033   cr4: 00000000000026f0
> (XEN) cr3: 000000012fb44000   cr2: ffff8800e948fe98
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83013febfc68:
> (XEN)    0000000000000282 ffff830138370000 ffff83013febfcd8 ffff830138371548
> (XEN)    ffff83013febfcb8 ffff82c4801cef11 ffff830138370000 ffff83013febff18
> (XEN)    ffff830138370000 ffff8300bf752000 ffff83013febfd18 ffff82c4801cd070
> (XEN)    000000000000a7c2 0000000a00000003 000000000000a7c2 0000000000000000
> (XEN)    000000030000000a 0000000000000000 000000000000a7c2 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff83013febfef8 ffff82c48016c18b
> (XEN)    ffff82c480153f82 ffff83013febfd70 ffff82c480151176 ffff83013febff18
> (XEN)    ffff83013febff18 ffff83013febff18 ffff83013febff18 ffff83013febff18
> (XEN)    ffff83013febff18 ffff83013febff18 ffff83013febff18 ffff83013febff18
> (XEN)    ffff83013febff18 ffff83013febfde0 0000000000000286 ffff83013febfe00
> (XEN)    00000195a8185d6b 0000000000000286 ffff8300bf752030 0000000000000000
> (XEN)    0000000000000000 0000000100000001 0000000000000000 ffff83013febfe10
> (XEN)    00000000bf752000 ffff83012f977e98 ffff82f6025f2ee0 ffff83013cf50000
> (XEN)    ffff830138370000 ffff8300bf752000 ffff8800f271d000 ffff83013febfe40
> (XEN)    00000195a7fb7184 ffff82c480122617 0000000000000000 800000000a7c2627
> (XEN)    ffff83013febfe68 ffff82c48014bcc4 ffff83013febfe68 ffff82c4801615d2
> (XEN)    ffff83013febff18 ffff8300bf752000 0000000000000001 0000000000000000
> (XEN)    ffff83013febfef8 ffff82c4802033c0 00007f20d9bd3000 0000000000000206
> (XEN)    0000000a800073f0 0000000000000001 000000012f977e98 800000000a7c2627
> (XEN)    ffff83013febfed8 ffff8300bf752000 8000000000000427 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480105fed>] notify_via_xen_event_channel+0x43/0xfb
> (XEN)    [<ffff82c4801cef11>] mem_event_put_request+0x99/0xa7
> (XEN)    [<ffff82c4801cd070>] p2m_mem_paging_populate+0x230/0x242
> (XEN)    [<ffff82c48016c18b>] do_mmu_update+0x696/0x1839
> (XEN)    [<ffff82c4801fe1e2>] syscall_enter+0xf2/0x14c
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Assertion '(((lport) >= 0) && ((lport) < ((((ld)->arch.has_32bit_shinfo) ? 32 : 64) * (((ld)->arch.has_32bit_shinfo) ? 32 : 64)****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN) Debugging connection not set up.
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

This crash is caused by something in dom0 playing around with the
guest's memory. My guess here is that the guest has shut down enough to
destroy its event channels. I'm not entirely sure who the culprit is
here. It seems like the xenpaging daemon tried to page something in at
some point, but was denied by Xen since the guest was shutting down. So
I would hazard that the PV drivers are again the culprit (as I've not
encountered this error before either). I suppose it could be a result
of evicting slowly instead of up-front. I'll need to get my hands on
SLES or the PV drivers so I can fix the grant table stuff (I had it
working before, but that was before the new grant table v2 stuff).
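
As a stop-gap, the populate/notify path probably wants a guard so a
request for a dying domain never reaches the event channel code at all
(the 'Ignoring memory paging op on dying domain 1' line shows part of
the path already checks this, just apparently not early enough).
Something along these lines, though where the check belongs is only my
guess and this isn't a tested fix:

    /* Rough sketch: bail out before touching the event channel once the
     * domain has started dying.  d->is_dying is the real flag; putting
     * the check at the top of mem_event_put_request() is an assumption. */
    void mem_event_put_request(struct domain *d, mem_event_request_t *req)
    {
        if ( unlikely(d->is_dying) )
        {
            /* The guest is tearing its event channels down; dropping the
             * request is safer than notifying a channel that is gone. */
            return;
        }

        /* ...copy req onto the ring and notify the pager as before... */
    }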


Patrick

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel