To: <tim.deegan@xxxxxxxxxx>
Subject: [Xen-devel] RE: [PATCH] mem_sharing: fix race condition of nominate and unshare
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Thu, 20 Jan 2011 17:37:56 +0800
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, juihaochiang@xxxxxxxxx

I'll do the check. Thanks.

During testing I still see two other failures:

1) After all domains are destroyed, the handle count in the hash table sometimes does not drop to 0.
I print the handle count; most of the time it is 0 after all domains are destroyed, but sometimes I get:

(XEN) ===>total handles 2 total gfns 2 next_handle: 713269

2) set_shared_p2m_entry failed (BUG_ON at mem_sharing.c:756):

745         list_for_each_safe(le, te, &ce->gfns)
746         {
747             gfn = list_entry(le, struct gfn_info, list);
748             /* Get the source page and type, this should never fail
749              * because we are under shr lock, and got non-null se */
750             BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page));
751             /* Move the gfn_info from ce list to se list */
752             list_del(&gfn->list);
753             d = get_domain_by_id(gfn->domain);
754             // mem_sharing_debug_gfn(d, gfn->gfn);
755             BUG_ON(!d);
756             BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn) == 0);
757             put_domain(d);
758             list_add(&gfn->list, &se->gfns);
759             put_page_and_type(cpage);
760             // mem_sharing_debug_gfn(d, gfn->gfn);

(XEN) printk: 33 messages suppressed.
(XEN) p2m.c:2442:d0 set_mmio_p2m_entry: set_p2m_entry failed! mfn=0023dbb7
(XEN) Xen BUG at mem_sharing.c:756
(XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c4801bfd90>] mem_sharing_share_pages+0x370/0x3d0
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff83040ed20000 rcx: 0000000000000092
(XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff82c48021eac4
(XEN) rbp: ffff8305a4bbe1b0 rsp: ffff82c48035fc58 r8: 0000000000000001
(XEN) r9: 0000000000000000 r10: 00000000fffffffb r11: ffff82c4801318d0
(XEN) r12: ffff8305a4bbe1a0 r13: ffff8305a61d42a0 r14: ffff82f6047b76e0
(XEN) r15: ffff8304e5e918c8 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 00000005203fc000 cr2: 00000000027b8000

> Date: Thu, 20 Jan 2011 09:19:34 +0000
> From: Tim.Deegan@xxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; juihaochiang@xxxxxxxxx
> Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
>
> At 07:19 +0000 on 20 Jan (1295507976), MaoXiaoyun wrote:
> > Hi:
> >
> > The latest BUG in mem_sharing_alloc_page from mem_sharing_unshare_page.
> > I printed heap info, which shows plenty memory left.
> > Could domain be NULL during in unshare, or should it be locked by rcu_lock_domain_by_id ?
> >
>
> 'd' probably isn't NULL; more likely is that the domain is not allowed
> to have any more memory. You should look at the values of d->max_pages
> and d->tot_pages when the failure happens.
>
> Cheers.
>
> Tim.
>
> > -----------code------------
> > 422 extern void pagealloc_info(unsigned char key);
> > 423 static struct page_info* mem_sharing_alloc_page(struct domain *d,
> > 424                                                 unsigned long gfn,
> > 425                                                 int must_succeed)
> > 426 {
> > 427     struct page_info* page;
> > 428     struct vcpu *v = current;
> > 429     mem_event_request_t req;
> > 430
> > 431     page = alloc_domheap_page(d, 0);
> > 432     if(page != NULL) return page;
> > 433
> > 434     memset(&req, 0, sizeof(req));
> > 435     if(must_succeed)
> > 436     {
> > 437         /* We do not support 'must_succeed' any more. External operations such
> > 438          * as grant table mappings may fail with OOM condition!
> > 439          */
> > 440         pagealloc_info('m');
> > 441         BUG();
> > 442     }
> >
> > -------------serial output-------
> > (XEN) Physical memory information:
> > (XEN) Xen heap: 0kB free
> > (XEN) heap[14]: 64480kB free
> > (XEN) heap[15]: 131072kB free
> > (XEN) heap[16]: 262144kB free
> > (XEN) heap[17]: 524288kB free
> > (XEN) heap[18]: 1048576kB free
> > (XEN) heap[19]: 1037128kB free
> > (XEN) heap[20]: 3035744kB free
> > (XEN) heap[21]: 2610292kB free
> > (XEN) heap[22]: 2866212kB free
> > (XEN) Dom heap: 11579936kB free
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff83040092d808 rcx: 0000000000000096
> > (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff82c48021eac4
> > (XEN) rbp: 0000000000000000 rsp: ffff82c48035f5e8 r8: 0000000000000001
> > (XEN) r9: 0000000000000001 r10: 00000000fffffff5 r11: 0000000000000008
> > (XEN) r12: ffff8305c61f3980 r13: ffff83040eff0000 r14: 000000000001610f
> > (XEN) r15: ffff82c48035f628 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 000000052bc4f000 cr2: ffff880120126e88
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c48035f5e8:
> > (XEN)    ffff8305c61f3990 00018300bf2f0000 ffff82f604e6a4a0 000000002ab84078
> > (XEN)    ffff83040092d7f0 00000000001b9c9c ffff8300bf2f0000 000000010eff0000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000d0000010f ffff8305447ec000 000000000001610f
> > (XEN)    0000000000273525 ffff82c48035f724 ffff830502c705a0 ffff82f602c89a00
> > (XEN)    ffff83040eff0000 ffff82c48010bfa9 ffff830572c5dbf0 000000000029e07f
> > (XEN)    0000000000000000 ffff830572c5dbf0 000000008035fbe8 ffff82c48035f6f8
> > (XEN)    0000000100000002 ffff830572c5dbf0 ffff83063fc30000 ffff830572c5dbf0
> > (XEN)    0000035900000000 ffff88010d14bbe0 ffff880159e09000 00003f7e00000002
> > (XEN)    ffffffffffff0032 ffff88010d14bbb0 ffff830438dfa920 0000000d8010a650
> > (XEN)    0000000000000100 ffff83063fc30000 ffff8305f9203730 ffffffffffffffea
> > (XEN)    ffff88010d14bb70 0000000000000000 ffff88010d14bc10 ffff88010d14bbc0
> > (XEN)    0000000000000002 ffff82c48010da9b 0000000000000202 ffff82c48035fec8
> > (XEN)    ffff82c48035f7c8 00000000801880af ffff83063fc30010 0000000000000000
> > (XEN)    ffff82c400000008 ffff82c48035ff28 0000000000000000 ffff88010d14bbc0
> > (XEN)    ffff880159e08000 0000000000000000 0000000000000000 00020000000002d7
> > (XEN)    00000000003f2b38 ffff8305b1f4b6b8 ffff8305b30f0000 ffff880159e09000
> > (XEN)    0000000000000000 0000000000000000 000200000000008a 00000000003ed1f9
> > (XEN)    ffff83063fc26450 ffff8305b30f0000 ffff880159e0a000 0000000000000000
> > (XEN)    0000000000000000 00020000000001fa 000000000029e2ba ffff83063fc26fd0
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN)    [<ffff82c48010bfa9>] gnttab_map_grant_ref+0xbf9/0xe30
> > (XEN)    [<ffff82c48010da9b>] do_grant_table_op+0x14b/0x1080
> > (XEN)    [<ffff82c48010fb44>] do_xen_version+0xb4/0x480
> > (XEN)    [<ffff82c4801b8215>] set_p2m_entry+0x85/0xc0
> > (XEN)    [<ffff82c4801bc92e>] set_shared_p2m_entry+0x1be/0x2f0
> > (XEN)    [<ffff82c480121c4c>] xmem_pool_free+0x2c/0x310
> > (XEN)    [<ffff82c4801bfaf8>] mem_sharing_share_pages+0xd8/0x3d0
> > (XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN)    [<ffff82c48011c519>] cpumask_raise_softirq+0x89/0xa0
> > (XEN)    [<ffff82c480118351>] csched_vcpu_wake+0x101/0x1b0
> > (XEN)    [<ffff82c48014717d>] vcpu_kick+0x1d/0x80
> > (XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN)    [<ffff82c48015a1d8>] get_page+0x28/0xf0
> > (XEN)    [<ffff82c48015ed72>] do_update_descriptor+0x1d2/0x210
> > (XEN)    [<ffff82c480113d7e>] do_multicall+0x14e/0x340
> > (XEN)    [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> >
> > > Date: Mon, 17 Jan 2011 17:02:02 +0800
> > > Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
> > > From: juihaochiang@xxxxxxxxx
> > > To: tinnycloud@xxxxxxxxxxx
> > > CC: xen-devel@xxxxxxxxxxxxxxxxxxx; tim.deegan@xxxxxxxxxx
> > >
> > > Hi, tinnycloud:
> > >
> > > Do you have xenpaging tools running properly?
> > > I haven't gone through that one, but it seems you have run out of memory.
> > > When this case happens, mem_sharing will request memory to the
> > > xenpaging daemon, which tends to page out and free some memory.
> > > Otherwise, the allocation would fail.
> > > Is this your scenario?
> > >
> > > Bests,
> > > Jui-Hao
> > >
> > > 2011/1/17 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> > > > Another failure on BUG() in mem_sharing_alloc_page()
> > > >
> > > >     memset(&req, 0, sizeof(req));
> > > >     if(must_succeed)
> > > >     {
> > > >         /* We do not support 'must_succeed' any more. External operations such
> > > >          * as grant table mappings may fail with OOM condition!
> > > >          */
> > > >         BUG();===================>bug here
> > > >     }
> > > >     else
> > > >     {
> > > >         /* All foreign attempts to unshare pages should be handled through
> > > >          * 'must_succeed' case. */
> > > >         ASSERT(v->domain->domain_id == d->domain_id);
> > > >         vcpu_pause_nosync(v);
> > > >         req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> > > >     }
> > > >
>
> --
> Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> Principal Software Engineer, Xen Platform Team
> Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)