
To: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] RE: [PATCH] mem_sharing: fix race condition of nominate and unshare
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Fri, 21 Jan 2011 14:10:30 +0800
Cc: george.dunlap@xxxxxxxxxxxxx, tim.deegan@xxxxxxxxxx, juihaochiang@xxxxxxxxx
It was later found that the domain is dying, and page allocation is prohibited for a dying domain.
 
 (XEN) ---domain is 1, max_pages 132096, total_pages 29736
 
Output of the printk at line 914:
 
 909     old_page = page;
 910     page = mem_sharing_alloc_page(d, gfn, flags & MEM_SHARING_MUST_SUCCEED);
 911     if(!page)
 912     {
 913         mem_sharing_debug_gfn(d, gfn);
 914         printk("---domain is %d, max_pages %u, total_pages %u \n", d->is_dying, d->max_pages, d->tot_pages);
 915         BUG_ON(!d);
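 
An obvious first thought is to just skip the allocation when the domain is already dying, instead of letting alloc_domheap_page() fail and tripping the BUG() later. This is only a sketch (hypothetical helper name, field names as in the Xen 4.0 struct domain; not tested):

     /* Sketch: refuse to allocate for a dying domain and let the caller
      * handle a plain failure instead of crashing.  d->is_dying,
      * d->max_pages and d->tot_pages are the Xen 4.0 field names. */
     static struct page_info *mem_sharing_alloc_page_checked(struct domain *d,
                                                             unsigned long gfn)
     {
         if ( d->is_dying )
         {
             gdprintk(XENLOG_INFO,
                      "gfn %lx: dom%d is dying (max %u, tot %u), no alloc\n",
                      gfn, d->domain_id, d->max_pages, d->tot_pages);
             return NULL;
         }
         return alloc_domheap_page(d, 0);
     }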

 
--------------
Well, the logic is a bit complicated; my fix is to set the gfn's mfn to INVALID_MFN:
 
 
 876     ret = page_make_private(d, page);
 877     /* last_gfn should be able to be made private */
 878     BUG_ON(last_gfn && ret);
 879     if(ret == 0) goto private_page_found;
 880
 881     ld = rcu_lock_domain_by_id(d->domain_id);
 882     BUG_ON(!ld);
 883     if(ld->is_dying)
 884     {
 885         if(!ld)
 886             printk("d is NULL %d\n", d->domain_id);
 887         else
 888             printk("d is dying %d %d\n", d->is_dying, d->domain_id);
 889
 890         /* decrease page type count and destroy gfn */
 891         put_page_and_type(page);
 892         mem_sharing_gfn_destroy(gfn_info, !last_gfn);
 893
 894         if(last_gfn)
 895             mem_sharing_hash_delete(handle);
 896         else
 897             /* Even though we don't allocate a private page, we have to account
 898              * for the MFN that originally backed this PFN. */
 899             atomic_dec(&nr_saved_mfns);
 900
 901         /* set mfn invalid */
 902         BUG_ON(set_shared_p2m_entry_invalid(d, gfn) == 0);
 903         if(ld)
 904             rcu_unlock_domain(ld);
 905         shr_unlock();
 906         return 0;
 907     }
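 
For reference, set_shared_p2m_entry_invalid() above is only meant as a thin wrapper that points the gfn at INVALID_MFN; roughly like this (a sketch on top of the existing set_shared_p2m_entry(), which keeps the p2m_ram_shared type; whether that type should also be reset is part of what I am unsure about):

     /* Sketch of the helper used above: drop the backing mfn for this gfn.
      * Returns nonzero on success, like set_shared_p2m_entry()/set_p2m_entry(). */
     static int set_shared_p2m_entry_invalid(struct domain *d, unsigned long gfn)
     {
         return set_shared_p2m_entry(d, gfn, _mfn(INVALID_MFN));
     }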

 
Any other suggestions?
 
 


> Date: Thu, 20 Jan 2011 09:19:34 +0000
> From: Tim.Deegan@xxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; juihaochiang@xxxxxxxxx
> Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
>
> At 07:19 +0000 on 20 Jan (1295507976), MaoXiaoyun wrote:
> > Hi:
> >
> > The latest BUG is in mem_sharing_alloc_page(), called from mem_sharing_unshare_page().
> > I printed the heap info, which shows plenty of memory left.
> > Could the domain be NULL during unshare, or should it be locked by rcu_lock_domain_by_id()?
> >
>
> 'd' probably isn't NULL; more likely is that the domain is not allowed
> to have any more memory. You should look at the values of d->max_pages
> and d->tot_pages when the failure happens.
>
> Cheers.
>
> Tim.
>
> > -----------code------------
> > 422 extern void pagealloc_info(unsigned char key);
> > 423 static struct page_info* mem_sharing_alloc_page(struct domain *d,
> > 424                                                 unsigned long gfn,
> > 425                                                 int must_succeed)
> > 426 {
> > 427     struct page_info* page;
> > 428     struct vcpu *v = current;
> > 429     mem_event_request_t req;
> > 430
> > 431     page = alloc_domheap_page(d, 0);
> > 432     if(page != NULL) return page;
> > 433
> > 434     memset(&req, 0, sizeof(req));
> > 435     if(must_succeed)
> > 436     {
> > 437         /* We do not support 'must_succeed' any more. External operations such
> > 438          * as grant table mappings may fail with OOM condition!
> > 439          */
> > 440         pagealloc_info('m');
> > 441         BUG();
> > 442     }
> >
> > -------------serial output-------
> > (XEN) Physical memory information:
> > (XEN) Xen heap: 0kB free
> > (XEN) heap[14]: 64480kB free
> > (XEN) heap[15]: 131072kB free
> > (XEN) heap[16]: 262144kB free
> > (XEN) heap[17]: 524288kB free
> > (XEN) heap[18]: 1048576kB free
> > (XEN) heap[19]: 1037128kB free
> > (XEN) heap[20]: 3035744kB free
> > (XEN) heap[21]: 2610292kB free
> > (XEN) heap[22]: 2866212kB free
> > (XEN) Dom heap: 11579936kB free
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff83040092d808 rcx: 0000000000000096
> > (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff82c48021eac4
> > (XEN) rbp: 0000000000000000 rsp: ffff82c48035f5e8 r8: 0000000000000001
> > (XEN) r9: 0000000000000001 r10: 00000000fffffff5 r11: 0000000000000008
> > (XEN) r12: ffff8305c61f3980 r13: ffff83040eff0000 r14: 000000000001610f
> > (XEN) r15: ffff82c48035f628 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 000000052bc4f000 cr2: ffff880120126e88
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c48035f5e8:
> > (XEN) ffff8305c61f3990 00018300bf2f0000 ffff82f604e6a4a0 000000002ab84078
> > (XEN) ffff83040092d7f0 00000000001b9c9c ffff8300bf2f0000 000000010eff0000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000d0000010f ffff8305447ec000 000000000001610f
> > (XEN) 0000000000273525 ffff82c48035f724 ffff830502c705a0 ffff82f602c89a00
> > (XEN) ffff83040eff0000 ffff82c48010bfa9 ffff830572c5dbf0 000000000029e07f
> > (XEN) 0000000000000000 ffff830572c5dbf0 000000008035fbe8 ffff82c48035f6f8
> > (XEN) 0000000100000002 ffff830572c5dbf0 ffff83063fc30000 ffff830572c5dbf0
> > (XEN) 0000035900000000 ffff88010d14bbe0 ffff880159e09000 00003f7e00000002
> > (XEN) ffffffffffff0032 ffff88010d14bbb0 ffff830438dfa920 0000000d8010a650
> > (XEN) 0000000000000100 ffff83063fc30000 ffff8305f9203730 ffffffffffffffea
> > (XEN) ffff88010d14bb70 0000000000000000 ffff88010d14bc10 ffff88010d14bbc0
> > (XEN) 0000000000000002 ffff82c48010da9b 0000000000000202 ffff82c48035fec8
> > (XEN) ffff82c48035f7c8 00000000801880af ffff83063fc30010 0000000000000000
> > (XEN) ffff82c400000008 ffff82c48035ff28 0000000000000000 ffff88010d14bbc0
> > (XEN) ffff880159e08000 0000000000000000 0000000000000000 00020000000002d7
> > (XEN) 00000000003f2b38 ffff8305b1f4b6b8 ffff8305b30f0000 ffff880159e09000
> > (XEN) 0000000000000000 0000000000000000 000200000000008a 00000000003ed1f9
> > (XEN) ffff83063fc26450 ffff8305b30f0000 ffff880159e0a000 0000000000000000
> > (XEN) 0000000000000000 00020000000001fa 000000000029e2ba ffff83063fc26fd0
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN) [<ffff82c48010bfa9>] gnttab_map_grant_ref+0xbf9/0xe30
> > (XEN) [<ffff82c48010da9b>] do_grant_table_op+0x14b/0x1080
> > (XEN) [<ffff82c48010fb44>] do_xen_version+0xb4/0x480
> > (XEN) [<ffff82c4801b8215>] set_p2m_entry+0x85/0xc0
> > (XEN) [<ffff82c4801bc92e>] set_shared_p2m_entry+0x1be/0x2f0
> > (XEN) [<ffff82c480121c4c>] xmem_pool_free+0x2c/0x310
> > (XEN) [<ffff82c4801bfaf8>] mem_sharing_share_pages+0xd8/0x3d0
> > (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN) [<ffff82c48011c519>] cpumask_raise_softirq+0x89/0xa0
> > (XEN) [<ffff82c480118351>] csched_vcpu_wake+0x101/0x1b0
> > (XEN) [<ffff82c48014717d>] vcpu_kick+0x1d/0x80
> > (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN) [<ffff82c48015a1d8>] get_page+0x28/0xf0
> > (XEN) [<ffff82c48015ed72>] do_update_descriptor+0x1d2/0x210
> > (XEN) [<ffff82c480113d7e>] do_multicall+0x14e/0x340
> > (XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> >
> > > Date: Mon, 17 Jan 2011 17:02:02 +0800
> > > Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
> > > From: juihaochiang@xxxxxxxxx
> > > To: tinnycloud@xxxxxxxxxxx
> > > CC: xen-devel@xxxxxxxxxxxxxxxxxxx; tim.deegan@xxxxxxxxxx
> > >
> > > Hi, tinnycloud:
> > >
> > > Do you have xenpaging tools running properly?
> > > I haven't gone through that one, but it seems you have run out of memory.
> > > When this happens, mem_sharing will request memory from the
> > > xenpaging daemon, which tends to page out and free some memory.
> > > Otherwise, the allocation would fail.
> > > Is this your scenario?
> > >
> > > Bests,
> > > Jui-Hao
> > >
> > > 2011/1/17 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> > > > Another failure on BUG() in mem_sharing_alloc_page()
> > > >
> > > >     memset(&req, 0, sizeof(req));
> > > >     if(must_succeed)
> > > >     {
> > > >         /* We do not support 'must_succeed' any more. External operations such
> > > >          * as grant table mappings may fail with OOM condition!
> > > >          */
> > > >         BUG();===================>bug here
> > > >     }
> > > >     else
> > > >     {
> > > >         /* All foreign attempts to unshare pages should be handled through
> > > >          * 'must_succeed' case. */
> > > >         ASSERT(v->domain->domain_id == d->domain_id);
> > > >         vcpu_pause_nosync(v);
> > > >         req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> > > >     }
> > > >
>
> --
> Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> Principal Software Engineer, Xen Platform Team
> Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel