WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

RE: [Xen-devel] [memory sharing] bug on get_page_and_type

Hi Tim:
 
         Thanks for  both your advice and quick reply. I will follow.
 
         So at last we should replace shr_lock with p2m_lock.
         But more complicate, it seems both the
         *check action* code and *nominate page* code need to be locked ,right?
         If so, quite a lot of  *check action* codes need to be locked.
 
         Looking forward to your suggestion on the fix, I think Juihao and I can help to do analyse and test. 
 
> Date: Tue, 1 Feb 2011 14:28:16 +0000
> From: Tim.Deegan@xxxxxxxxxx
> To: George.Dunlap@xxxxxxxxxxxxx
> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; juihaochiang@xxxxxxxxx
> Subject: Re: [Xen-devel] [memory sharing] bug on get_page_and_type
>
> At 12:47 +0000 on 31 Jan (1296478031), George Dunlap wrote:
> > At first glance, it does look like a race, of the type that would
> > normally be done by grabbing the p2m lock. I'd have to take a closer
> > look to be sure, but I won't be able to for at least another day or
> > two. Maybe Tim can comment.
>
> Yes, this is a race; it would be fixed by getting rid of the shr_lock
> and using the p2m lock to cover page-sahring operations, which I think
> is a good idea anyway.
>
> I think I'll have time to look at the locking properly some time this
> month. Of course, I've been wrong before. :)
> < BR>> Tim.
>
> > 2011/1/31 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> > > Hi Georage:
> > >
> > > Thanks for your help, I think I really need to learn hg a bit.
> > > Well, when looking into memory sharing code, a confusion
> > > really brother me a lot.
> > > I think maybe you can enlighten me.
> > >
> > > There has been *check action code* do like below steps
> > > 1) Use gfn_to_mfn to get p2m_type_t p2mt
> > > 2) check the type of p2mt,
> > > 3) do some stuffes base if check on step 2 is failed
> > >
> > > While when *nominate page* to be shared, it does these follow steps
> > > a) check page whether can be shared
> > > b) call page_make_sharable to make sharable
> > > c) if 2) is success, call p2m_change_type to update page type
> > > p2m_ram_shared
> > >
> > > My confusion is if *nominate page* code reach step b, and before it go
> > > to step c
> > > *check action code* reach its step2, it will find page type is
> > > not p2m_ram_shared
> > > and go to step3. And later "nominate page" code go to step (c)
> > >
> > > Take xen/common/memory.c for example,
> > > line 179 is much like step 2, it will goto 196 to clear the _PGC_allocated
> > > flag
> > > if check failed, so it might indicates, *nominate page* code change the page
> > > type
> > > into p2m_ram_shared, but the page actually has alreay been freed.
> > >
> > > Am I right?
> > >
> > >
> > > 155 int guest_remove_page(struct domain *d, unsigned long gmfn)
> > > 156 {
> > > 157 struct page_info *page;
> > > 158 #ifdef C ONFIG_X86
> > > 159 p2m_type_t p2mt;
> > > 160 #endif
> > > 161 unsigned long mfn;
> > > 162
> > > 163 #ifdef CONFIG_X86
> > > 164 mfn = mfn_x(gfn_to_mfn(d, gmfn, &p2mt));
> > > 165 #else
> > > 166 mfn = gmfn_to_mfn(d, gmfn);
> > > 167 #endif
> > > 168 if ( unlikely(!mfn_valid(mfn)) )
> > > 169 {
> > > 170 gdprintk(XENLOG_INFO, "Domain %u page number %lx invalid\n",
> > > 171 d->domain_id, gmfn);
> > > 172 return 0;
> > > 173 }
> > > 174
> > > 175&nbsp ; page = mfn_to_page(mfn);
> > > 176 #ifdef CONFIG_X86
> > > 177 /* If gmfn is shared, just drop the guest reference (which may or
> > > may not
> > > 178 * free the page) */
> > > 179 if(p2m_is_shared(p2mt))
> > > 180 {
> > > 181 put _page_and_type(page);
> > > 182 guest_physmap_remove_page(d, gmfn, mfn, 0);
> > > 183 return 1;
> > > 184 }
> > > 185
> > > 186 #endif /* CONFIG_X86 */
> > > 187 if ( unlikely(!get_page(page, d)) )
> > > 188 {
> > > 189 gdprintk(XENLOG_INFO, "Bad page free for domain %u\n",
> > > d->domain_id);
> > > 190 return 0;
> > > 189 gdprintk(XENLOG_INFO, "Bad page free for domain %u\n",
> > > d->domain_id);
> > > 190 return 0;
> > > 191 }
> > > 192
> > > 193 if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
> > > 194 put_page_and_type(page);
> > > 195
> > > 196 if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
> > > 197 put_page(page);
> > > 198
> > > 199 guest_physmap_remove_page (d, gmfn, mfn, 0);
> > > 200
> > > 201 put_page(page);
> > > 202
> > > 203 return 1;
> > > 204 }
> > >
> > >> Date: Mon, 31 Jan 2011 10:49:37 +0000
> > >> Subject: Re: [Xen-devel] [memory sharing] bug on get_page_and_type
> > >> From: George.Dunlap@xxxxxxxxxxxxx
> > >> To: tinnycloud@xxxxxxxxxxx
> > >> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; tim.deegan@xxxxxxxxxx;
> > >> juihaochiang@xxxxxxxxx
> > >>
> > >> Xiaoyun,
> > >>
> > >> Thanks for all of your work getting page sharing working. When
> > >> submitting patches, please break them down into individual chunks,
> > >> each of which does one thing. Each patch should also include a
> > >> comment saying what the patch does and why, and a Signed-off-by line
> > ; >> indicating that you certify that the copyright holder (possibly you)
> > >> is placing the code under the GPL.
> > >>
> > >> Using mercurial queues:
> > >> http://mercurial.selenic.com/wiki/MqExtension
> > >> and the mercurial patchbomb extension:
> > >> http://mercurial.selenic.com/wiki/PatchbombExtension
> > >> are particularly handy for this process.
> > > &g t;
> > >> Quick comment: The ept-locking part of the patch is going to be
> > >> NACK-ed, as it will cause circular locking dependencies with the
> > >> hap_lock. c/s 22526:7a5ee380 has an explanation of the circular
> > >> dependency, and a fix which doesn't introduce a circular dependency.
> > >> (It may need to be adapted a bit to apply to 4.0-testing)
> > >>
> > >> -George
> > >>
> > >> On Mon, Jan 31, 2011 at 10:13 AM, tinnycloud <tinnycloud@xxxxxxxxxxx>
> > >> wrote:
> > >> > Hi:
> > >> >
> > >> >
> > >> > Attached is the whole patch suit for latest xen-4.0-testing,changeset
> > >> > 21443,
> > >> >
> > >> > It comes from George, Tim and JuiHao, also, has some extra debug info.
> > >> >
> > >> >
> > >> >
> > >> > On most occasion, memory sharing works fine, but still exists a bug.
> > >> >
> > >> > I?ve been tracing this problem for a while.
> > >> >
> > >> > &nbs p; There is a bug on get_page_and_type() in
> > >> > mem_sharing_share_pages()
> > >> >
> > >> >
> > >&g t; >
> > >> >
> > >> > --------------------------------------------mem_sharing_share_pages()-----------------------
> > >> >
> > >> > 789 if(!get_page_and_type(spage, dom_cow, PGT_shared_page)){
> > >> >
> > >> > 790 mem_sharing_debug_gfn(cd, gfn->gfn);
> > >> >
> > >> > 791 mem_sharing_debug_gfn(sd, sgfn->gfn);
> > >> >
> > >> > 792 printk("c->dying %d s->dying %d spage %p se->mfn %lx\n",
> > >> > cd->is_dying, sd->is_dying, spage, se->mfn);
> > >> >
> > >> > 793 printk("Debug page: MFN=%lx is ci=%lx, ti=%lx,
> > >> > owner_id=%d\n",
> > >> >
> > >> > 794 mfn_x(page_to_mfn(spage)),
> > >> >
> > >> > 795 spage->count_info,
&g t; > >> >
> > >> > 796 spage->u.inuse.type_info,
> > >> >
> > >> > 797 page_get_owner(spage)->domain_id);
> > >> >
> > >> > 798 BUG();
> > >> >
> > >> > 799 }
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > Below painc log contains the debug info from line 790-798.
> > >> >
> > >> > We saw that 180000000000000 which is PGC_state_free,
> > >> >
> > >> > So it looks like a shared page has been freed unexpectly.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > (XEN) teardown 64
> > >> >
> > >> > (XEN) teardow n 66
> > >> >
> > >> > blktap_sysfs_destroy
> > >> >
> > >> > blktap_sysfs_create: adding attributes for dev ffff880109d0c200
> > >> >
> > >> > blktap_sysfs_destroy
> > >> >
> > >> > __ratelimit: 1 callbacks suppressed
> > >> >
> > >> > blktap_sysfs_destroy
> > >> >
> > >> > (XEN) Debug for domain=83, gfn=1e8dc, Debug page: MFN=1306dc is
> > >> > ci=8000000000000005, ti=8400000000000001, owner_id=32755
> > >> >
> > >> > (XEN) Debug for domain=79, gfn=1dc95, Invalid MFN=ffffffffffffffff
> > >> >
> > >> > (XEN) c->dying 0 s->dying 0 spage ffff82f60a0f12a0 se->mfn 507895
> > >> >
> > >> > (XEN) Debug page: MFN=507895 is ci= 1800000000 00000,
> > >> > ti=8400000000000001,
> > >> > owner_id=32755
> > >> >
> > >> > (XEN) Xen BUG at mem_sharing.c:798
> > >> >
> > >> > (XEN) ----[ Xen-4.0.2-rc2-pre x86_64 debug=n Not tainted ]----
> > >> >
> > >> > (XEN) CPU: 0
> > >> >
> > >> > (XEN) RIP: e008:[<ffff82c4801c3760>]
> > >> > mem_sharing_share_pages+0x5b0/0x5d0
> > >> >
> > >> > (XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
> > >> >
> > >> > (XEN) rax: 0000000000000000 rbx: ffff83044ef76000 rcx:
> > >> > 0000000000000092
> > >> >
> > >> > (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi:
> > >> > ffff82c4802237c4
> > >> >
> > >> > (XEN) rbp: ffff83030d99e310 rsp: ffff82c48035fc48 r8:
> > >> > 0000000000000001
> > >> >
> > >> > (XEN) r9: 0000000000000001 r10: 00000000fffffff8 r11:
> > >> > 0000000000000005
> > >> >
> > >> > (XEN) r12: ffff83050b558000 r13: ffff830626ec5740 r14:
> > >> > 0000000000347967
> > >> >
> > >> > (XEN) r15: ffff83030d99e068 cr0: 0000000080050033 cr4:
> > >> > 00000000000026f0
> > >> >
> > >> > (XEN) cr3: 000000010e642000 cr2: 00002abdc4809000
> > >> >
> > >> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> > >> >
> > >> > (XEN) Xen stack trace from rsp=ffff82c48035fc48:
> > >> >
> > >> > (XEN) 000000000001e8dc ffff83030d99e068 00000000005078 95
> > >> > ffff83030d99e050
> > >> >
> > >> > (XEN) ffff82f60a0f12a0 ffff82f60260db80 ffff83030d99e300
> > >> > ffff830626afbbd0
> > >> >
> > >> > (XEN)&nb sp; 0000000080372980 ffff82c48035fe38 ffff83023febe000
> > >> > 00000000008f7000
> > >> >
> > >> > (XEN) 0000000000305000 0000000000000006 0000000000000006
> > >> > ffff82c4801c44a4
> > >> >
> > >> > (XEN) ffff82c480258188 ffff82c48011cb89 000000000001e8dc
> > >> > fffffffffffffff3
> > >> >
> > >> > (XEN) ffff82c48035fe28 ffff82c480148503 000003d932971281
> > >> > 0000000000000080
> > >> >
> > >> > (XEN) 000003d9329611d6 ffff82c48018aedf ffff82c4803903a8
> > >> > ffff82c48035fd38
> > >> >
> > >> > (XEN) ffff82c480390380 ffff82c48035fe28 0000000000000002
> > >> > 0000000000000000
> > >> >
> > >> > (XEN) 0000000000000002 0000000000000000 ffff83023febe000
> > >> > ffff83023ff80080
> > >> >
> > >> > (XEN) ffff82c48035fe58 0000000000000000 ffff82c48022ca80
> > >> > 0000000000000000
> > >> >
> > >> > (XEN) 0000000000000002 ffff82c48018a 452 0000000000000000
> > >> > ffff82c48016f6bb
> > >> >
> > >> > (XEN) ffff82c48035fe58 ffff82c4801538ea 000000023ab79067
> > >> > fffffffffffffff3
> > >> >
> > >> > (XEN) ffff82c48035fe28 00000000008f7000 0000000000305000
> > >> > 0000000000000006
> > >> >
> > >> > (XEN) 0 000000000000006 ffff82c4801042b3 0000000000000000
> > >> > ffff82c48010d2a5
> > >> >
> > >> > (XEN) ffff830477fb51e8 0000000000000000 00007ff000000200
> > >> > ffff8300bf76e000
> > >> >
> > >> > (XEN) 0000000600000039 0000000000000000 00007fc09d744003
> > >> > 0000000000325258
> > >> >
> > >> > (XEN) 0000000000347967 ffffffffff600429 000000004d463b17
> > >> > 0000000000062fe2
> > >> >
> > >> > (XEN) 0000000000000000 00007fc09d744070 00007fc09d744000
> > >> > 00007fffcfec0d20
> > >> >
> > >> > (XEN) 00007fc09d744078 0000000000430fd8 00007fffcfec0d88
> > >> > 00000000019f1ca8
> > >> >
> > >> > (XEN) 0000f05c00000000 00007fc09f4c2718 0000000000000000
> &g t; >> > 0000000000000246
> > >> >
> > >> > (XEN) Xen call trace:
> > >> >
> > >> > (XEN) [<ffff82c4801c3760>] mem_sharing_share_pages+0x5b0/0x5d0
> > >> >
> > >> > (XEN) [<ffff82c4801c44a4>] mem_sharing_domctl+0xe4/0x130
> > >> >
> > >> > (XEN) [<ffff82c48011cb89>] cpumask_raise_softirq+0x89/0xa0
> > >> >
> > >> > (XEN) [<ffff82c480148503>] arch_do_domctl+0x14f3/0x22d0
> > >> >
> > >> > (XEN) [<ffff82c48018aedf>] handle_hpet_broadcast+0x16f/0x1d0
> > >> >
> > >> > (XEN) [<ffff82c48018a452>] hpet_legacy_irq_tick+0x42/0x50
> > >> >
> > >> > (XEN) [<ffff82c48016f6bb>] timer_interrupt+0xb/0x130
> > >> >
> > >> > (XEN) [<ffff82c4801538ea> ] ack_edge_ioapic_irq+0x2a/0x70
> > >> >
> > >> > (XEN) [<ffff82c4801042b3>] do_domctl+0x163/0xfe0
> > >> >
> > >> > (XEN) [<ffff82c48010d2a5>] do_grant_table_op+0x75/0x1ad0
> > >> >
> > >> > (XEN) [<ffff82c4801e7169>] syscall_enter+0xa9/0xae
> > >> >
> > >> > (XEN)
> > >> >
> > >> > (XEN)
> > >> >
> > >> > (XEN) ****************************************
> > >> >
> > >> > (XEN) Panic on CPU 0:
> > >> >
> > >> > (XEN) Xen BUG at mem_sharing.c:798
> > >> >
> > >> > (XEN) ****************************************
> > >> >
> > >> > (XEN)
> > >> >
> > & gt;> > (XEN) Manual reset required ('noreboot' specified)
> > >> >
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > Xen-devel mailing list
> > >> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > >> > http://lists.xensource.com/xen- devel
> > >> >
> > >> >
> > >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-devel
> > >
> > >
>
> --
> Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> Principal Software Engineer, Xen Platform Team
> Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel