[Xen-devel] Re: One (possible) x86 get_user_pages bug

To:	Jan Beulich <JBeulich@xxxxxxxxxx>
Subject:	[Xen-devel] Re: One (possible) x86 get_user_pages bug
From:	Nick Piggin <npiggin@xxxxxxxxx>
Date:	Fri, 28 Jan 2011 08:24:58 +1100
Cc:	Kaushik Barde <kbarde@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Kenneth Lee <liguozhu@xxxxxxxxxx>, Nick Piggin <npiggin@xxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, wangzhenguo@xxxxxxxxxx, Xiaowei Yang <xiaowei.yang@xxxxxxxxxx>, linqaingmin <linqiangmin@xxxxxxxxxx>, fanhenglong@xxxxxxxxxx, Wu Fengguang <fengguang.wu@xxxxxxxxx>, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
In-reply-to:	<4D41A651020000780002ED36@xxxxxxxxxxxxxxxxxx>
References:	<4D416D9A.9010603@xxxxxxxxxx> <4D41A651020000780002ED36@xxxxxxxxxxxxxxxxxx>

On Fri, Jan 28, 2011 at 3:07 AM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
>>>> On 27.01.11 at 14:05, Xiaowei Yang <xiaowei.yang@xxxxxxxxxx> wrote:
>> We created a scenario to reproduce the bug:
>> ----------------------------------------------------------------
>> // proc1/proc1.2 are 2 threads sharing one page table.
>> // proc1 is the parent of proc2.
>>
>> proc1               proc2          proc1.2
>> ...                 ...            // in gup_pte_range()
>> ...                 ...            pte = gup_get_pte()
>> ...                 ...            page1 = pte_page(pte)  // (1)
>> do_wp_page(page1)   ...            ...
>> ...                 exit_map()     ...
>> ...                 ...            get_page(page1)        // (2)
>> -----------------------------------------------------------------
>>
>> do_wp_page() and exit_map() cause page1 to be released into free list
>> before get_page() in proc1.2 is called. The longer the delay between
>> (1)&(2), the easier the BUG_ON shows.
>
> Contrary to my initial response, I don't think this can happen outside
> of Xen: do_wp_page() won't reach page_cache_release() when
> gup_pte_range() is running for the same mm on another CPU,
> since it can't get past ptep_clear_flush() (waiting for the CPU
> in get_user_pages_fast() to re-enable interrupts).

Yeah, this cannot happen on native.


>> An experimental patch is made to prevent the PTE being modified in the
>> middle of gup_pte_range(). The BUG_ON disappears afterward.
>>
>> However, from the comments embedded in gup.c, it seems deliberate to
>> avoid the lock in the fast path. The question is: if so, how to avoid
>> the above scenario?
>
> Nick, since you wrote the initial implementation, would
> you be able to estimate whether disabling get_user_pages_fast()
> altogether for Xen would perform measurably worse than
> adding the locks (but continuing to avoid acquiring mm->mmap_sem)
> as suggested by Xiaowei? That's of course only if the latter is correct
> at all, of which I haven't fully convinced myself yet.

You must have some way to guarantee existence of Linux page
tables when you walk them in order to resolve a TLB refill.

x86 does this with IPIs and a hardware fill that is atomic WRT interrupts.
So fast gup can disable interrupts to walk page tables. I don't think it
is fragile; it is absolutely tied to the system ISA (of course that can
change, but as Peter said, other things will have to change).

Other architectures use RCU for this, so fast gup uses a
lockless-pagecache-like protocol for that.
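
To make the lockless-pagecache-like protocol concrete, here is a minimal
userspace sketch in C11, not kernel code. The idea is: read the PTE, take
a speculative reference only if the refcount is non-zero, then re-read the
PTE and back off if it changed. struct page, struct pte_slot and
gup_one_speculative() are hypothetical stand-ins invented for this
illustration, and the sketch assumes the page descriptor and the page
table memory stay valid during the walk (which is what RCU or disabled
interrupts would have to guarantee).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not kernel types. */
struct page {
    atomic_int refcount;            /* 0 means the page has been freed */
};

struct pte_slot {
    _Atomic(struct page *) page;    /* NULL means "not present" */
};

/* Take a reference only if the page is still live (refcount != 0). */
static bool get_page_unless_zero(struct page *p)
{
    int old = atomic_load(&p->refcount);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference previously taken. */
static void put_page(struct page *p)
{
    int old = atomic_fetch_sub(&p->refcount, 1);

    assert(old > 0);
}

/*
 * Speculative lookup of one entry:
 *   1. read the entry,
 *   2. try to pin the page without resurrecting a freed one,
 *   3. re-read the entry; if it changed, the pin may be on the wrong
 *      page, so drop it and let the caller fall back to the slow path.
 * Returns the pinned page, or NULL if the caller must fall back.
 */
static struct page *gup_one_speculative(struct pte_slot *slot)
{
    struct page *p = atomic_load(&slot->page);

    if (p == NULL)
        return NULL;
    if (!get_page_unless_zero(p))
        return NULL;
    if (atomic_load(&slot->page) != p) {
        put_page(p);
        return NULL;
    }
    return p;
}
```

The recheck in step 3 is what closes the window between (1) and (2) in
Xiaowei's scenario: a page that was unmapped and freed in between is
either refused by the refcount test or dropped again after the re-read.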

If Xen is not using IPIs for flush, it should use whatever locks or
synchronization its TLB refill is using.

Thanks,
Nick

