[Xen-devel] RE: [PATCH] mem_sharing: fix race condition of nominate and unshare
To: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] RE: [PATCH] mem_sharing: fix race condition of nominate and unshare
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Fri, 21 Jan 2011 14:10:30 +0800
Cc: george.dunlap@xxxxxxxxxxxxx, tim.deegan@xxxxxxxxxx, juihaochiang@xxxxxxxxx
In-reply-to: <20110120091934.GG8286@xxxxxxxxxxxxxxxxxxxxxxx>
It is later found that the domain is dying, and page allocation to a dying domain is prohibited. Note that tot_pages (29736) is well below max_pages (132096), so this is not a quota problem:

(XEN) ---domain is 1, max_pages 132096, total_pages 29736

The line above is the output of the debug printk at line 914 in the snippet below; the first field is d->is_dying.
909     old_page = page;
910     page = mem_sharing_alloc_page(d, gfn, flags & MEM_SHARING_MUST_SUCCEED);
911     if(!page)
912     {
913         mem_sharing_debug_gfn(d, gfn);
914         printk("---domain is %d, max_pages %u, total_pages %u \n", d->is_dying, d->max_pages, d->tot_pages);
915         BUG_ON(!d);
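
For context, the refusal happens in the page allocator rather than in the mem_sharing code: assign_pages() in xen/common/page_alloc.c rejects any allocation for a dying domain. Roughly (paraphrased from the 4.0 tree, not a verbatim quote):

    /* Simplified excerpt of assign_pages(): a domain that is being torn
     * down gets no new pages, which is why alloc_domheap_page(d, 0)
     * returns NULL here even though the heap has plenty of free memory. */
    spin_lock(&d->page_alloc_lock);
    if ( unlikely(d->is_dying) )
    {
        gdprintk(XENLOG_INFO, "Cannot assign page to domain%d -- dying.\n",
                 d->domain_id);
        goto fail;
    }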
--------------
Well, the logic is a bit complicated. My fix is to set the gfn's mfn to INVALID_MFN when the domain is dying:
876     ret = page_make_private(d, page);
877     /* last_gfn should be able to be made private */
878     BUG_ON(last_gfn & ret);
879     if(ret == 0) goto private_page_found;
880
881     ld = rcu_lock_domain_by_id(d->domain_id);
882     BUG_ON(!ld);
883     if(ld->is_dying)
884     {
885         if(!ld)
886             printk("d is NULL %d\n", d->domain_id);
887         else
888             printk("d is dying %d %d\n", d->is_dying, d->domain_id);
889
890         /* decrease page type count and destroy gfn */
891         put_page_and_type(page);
892         mem_sharing_gfn_destroy(gfn_info, !last_gfn);
893
894         if(last_gfn)
895             mem_sharing_hash_delete(handle);
896         else
897             /* Even though we don't allocate a private page, we have to account
898              * for the MFN that originally backed this PFN. */
899             atomic_dec(&nr_saved_mfns);
900
901         /* set mfn invalid */
902         BUG_ON(set_shared_p2m_entry_invalid(d, gfn) == 0);
903         if(ld)
904             rcu_unlock_domain(ld);
905         shr_unlock();
906         return 0;
907     }
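
set_shared_p2m_entry_invalid() is a small new helper that is not included above. A minimal sketch of the idea, modeled on set_shared_p2m_entry() in xen/arch/x86/mm/p2m.c (it would have to live there since set_p2m_entry() is static); the exact p2m type used is only an assumption:

    /* Sketch only, not the actual patch: repoint the gfn at INVALID_MFN so
     * the dying domain no longer references the shared page. */
    static int set_shared_p2m_entry_invalid(struct domain *d, unsigned long gfn)
    {
        /* Nonzero return means success, matching the BUG_ON(... == 0) at
         * line 902 above. */
        return set_p2m_entry(d, gfn, _mfn(INVALID_MFN), 0, p2m_invalid);
    }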
Any other suggestions?
> Date: Thu, 20 Jan 2011 09:19:34 +0000
> From: Tim.Deegan@xxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; juihaochiang@xxxxxxxxx
> Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
>
> At 07:19 +0000 on 20 Jan (1295507976), MaoXiaoyun wrote:
> > Hi:
> >
> > The latest BUG in mem_sharing_alloc_page from mem_sharing_unshare_page.
> > I printed heap info, which shows plenty memory left.
> > Could domain be NULL during in unshare, or should it be locked by rcu_lock_domain_by_id ?
> >
>
> 'd' probably isn't NULL; more likely is that the domain is not allowed
> to have any more memory.  You should look at the values of d->max_pages
> and d->tot_pages when the failure happens.
>
> Cheers.
>
> Tim.
>
> > -----------code------------
> > 422 extern void pagealloc_info(unsigned char key);
> > 423 static struct page_info* mem_sharing_alloc_page(struct domain *d,
> > 424                                                  unsigned long gfn,
> > 425                                                  int must_succeed)
> > 426 {
> > 427     struct page_info* page;
> > 428     struct vcpu *v = current;
> > 429     mem_event_request_t req;
> > 430
> > 431     page = alloc_domheap_page(d, 0);
> > 432     if(page != NULL) return page;
> > 433
> > 434     memset(&req, 0, sizeof(req));
> > 435     if(must_succeed)
> > 436     {
> > 437         /* We do not support 'must_succeed' any more. External operations such
> > 438          * as grant table mappings may fail with OOM condition!
> > 439          */
> > 440         pagealloc_info('m');
> > 441         BUG();
> > 442     }
> >
> > -------------serial output-------
> > (XEN) Physical memory information:
> > (XEN) Xen heap: 0kB free
> > (XEN) heap[14]: 64480kB free
> > (XEN) heap[15]: 131072kB free
> > (XEN) heap[16]: 262144kB free
> > (XEN) heap[17]: 524288kB free
> > (XEN) heap[18]: 1048576kB free
> > (XEN) heap[19]: 1037128kB free
> > (XEN) heap[20]: 3035744kB free
> > (XEN) heap[21]: 2610292kB free
> > (XEN) heap[22]: 2866212kB free
> > (XEN) Dom heap: 11579936kB free
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ----[ Xen-4.0.0  x86_64  debug=n  Not tainted ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000   rbx: ffff83040092d808   rcx: 0000000000000096
> > (XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff82c48021eac4
> > (XEN) rbp: 0000000000000000   rsp: ffff82c48035f5e8   r8:  0000000000000001
> > (XEN) r9:  0000000000000001   r10: 00000000fffffff5   r11: 0000000000000008
> > (XEN) r12: ffff8305c61f3980   r13: ffff83040eff0000   r14: 000000000001610f
> > (XEN) r15: ffff82c48035f628   cr0: 000000008005003b   cr4: 00000000000026f0
> > (XEN) cr3: 000000052bc4f000   cr2: ffff880120126e88
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c48035f5e8:
> > (XEN)    ffff8305c61f3990 00018300bf2f0000 ffff82f604e6a4a0 000000002ab84078
> > (XEN)    ffff83040092d7f0 00000000001b9c9c ffff8300bf2f0000 000000010eff0000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000d0000010f ffff8305447ec000 000000000001610f
> > (XEN)    0000000000273525 ffff82c48035f724 ffff830502c705a0 ffff82f602c89a00
> > (XEN)    ffff83040eff0000 ffff82c48010bfa9 ffff830572c5dbf0 000000000029e07f
> > (XEN)    0000000000000000 ffff830572c5dbf0 000000008035fbe8 ffff82c48035f6f8
> > (XEN)    0000000100000002 ffff830572c5dbf0 ffff83063fc30000 ffff830572c5dbf0
> > (XEN)    0000035900000000 ffff88010d14bbe0 ffff880159e09000 00003f7e00000002
> > (XEN)    ffffffffffff0032 ffff88010d14bbb0 ffff830438dfa920 0000000d8010a650
> > (XEN)    0000000000000100 ffff83063fc30000 ffff8305f9203730 ffffffffffffffea
> > (XEN)    ffff88010d14bb70 0000000000000000 ffff88010d14bc10 ffff88010d14bbc0
> > (XEN)    0000000000000002 ffff82c48010da9b 0000000000000202 ffff82c48035fec8
> > (XEN)    ffff82c48035f7c8 00000000801880af ffff83063fc30010 0000000000000000
> > (XEN)    ffff82c400000008 ffff82c48035ff28 0000000000000000 ffff88010d14bbc0
> > (XEN)    ffff880159e08000 0000000000000000 0000000000000000 00020000000002d7
> > (XEN)    00000000003f2b38 ffff8305b1f4b6b8 ffff8305b30f0000 ffff880159e09000
> > (XEN)    0000000000000000 0000000000000000 000200000000008a 00000000003ed1f9
> > (XEN)    ffff83063fc26450 ffff8305b30f0000 ffff880159e0a000 0000000000000000
> > (XEN)    0000000000000000 00020000000001fa 000000000029e2ba ffff83063fc26fd0
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN)    [<ffff82c48010bfa9>] gnttab_map_grant_ref+0xbf9/0xe30
> > (XEN)    [<ffff82c48010da9b>] do_grant_table_op+0x14b/0x1080
> > (XEN)    [<ffff82c48010fb44>] do_xen_version+0xb4/0x480
> > (XEN)    [<ffff82c4801b8215>] set_p2m_entry+0x85/0xc0
> > (XEN)    [<ffff82c4801bc92e>] set_shared_p2m_entry+0x1be/0x2f0
> > (XEN)    [<ffff82c480121c4c>] xmem_pool_free+0x2c/0x310
> > (XEN)    [<ffff82c4801bfaf8>] mem_sharing_share_pages+0xd8/0x3d0
> > (XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN)    [<ffff82c48011c519>] cpumask_raise_softirq+0x89/0xa0
> > (XEN)    [<ffff82c480118351>] csched_vcpu_wake+0x101/0x1b0
> > (XEN)    [<ffff82c48014717d>] vcpu_kick+0x1d/0x80
> > (XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN)    [<ffff82c48015a1d8>] get_page+0x28/0xf0
> > (XEN)    [<ffff82c48015ed72>] do_update_descriptor+0x1d2/0x210
> > (XEN)    [<ffff82c480113d7e>] do_multicall+0x14e/0x340
> > (XEN)    [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> >
> > > Date: Mon, 17 Jan 2011 17:02:02 +0800
> > > Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
> > > From: juihaochiang@xxxxxxxxx
> > > To: tinnycloud@xxxxxxxxxxx
> > > CC: xen-devel@xxxxxxxxxxxxxxxxxxx; tim.deegan@xxxxxxxxxx
> > >
> > > Hi, tinnycloud:
> > >
> > > Do you have xenpaging tools running properly?
> > > I haven't gone through that one, but it seems you have run out of memory.
> > > When this case happens, mem_sharing will request memory to the
> > > xenpaging daemon, which tends to page out and free some memory.
> > > Otherwise, the allocation would fail.
> > > Is this your scenario?
> > >
> > > Bests,
> > > Jui-Hao
> > >
> > > 2011/1/17 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> > > > Another failure on BUG() in mem_sharing_alloc_page()
> > > >
> > > >     memset(&req, 0, sizeof(req));
> > > >     if(must_succeed)
> > > >     {
> > > >         /* We do not support 'must_succeed' any more. External operations
> > > >          * such as grant table mappings may fail with OOM condition!
> > > >          */
> > > >         BUG(); ===================> bug here
> > > >     }
> > > >     else
> > > >     {
> > > >         /* All foreign attempts to unshare pages should be handled through
> > > >          * 'must_succeed' case. */
> > > >         ASSERT(v->domain->domain_id == d->domain_id);
> > > >         vcpu_pause_nosync(v);
> > > >         req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> > > >     }
> > >
> --
> Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> Principal Software Engineer, Xen Platform Team
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel