On Thu, Jun 11, 2009 at 10:02:18AM +0100, Ian Campbell wrote:
> On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
> > Ian Campbell wrote:
> > > I wonder how this interacts with the logic in
> > > arch/x86/xen/mmu.c:xen_pin_page(), which holds the lock while waiting
> > > for the (deferred) pin multicall to occur? Hmm, no, this is about the
> > > PagePinned flag on the struct page being out of date with respect to
> > > the pinned status as Xen sees it -- we update the PagePinned flag
> > > early in xen_pin_page(), long before the pin hypercall actually
> > > reaches Xen, so this window is the other way round from what would be
> > > needed to trigger this bug.
> > >
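> > > (For reference, the pin side looks roughly like this -- paraphrased
> > > from 2.6.30's arch/x86/xen/mmu.c, not the exact source -- the flag is
> > > flipped up front and both the RO remap and the lock release are only
> > > queued on the multicall batch:
> > >
> > > static int xen_pin_page(struct mm_struct *mm, struct page *page,
> > >                         enum pt_level level)
> > > {
> > >         /* flag set first, long before any hypercall is issued */
> > >         unsigned pgfl = TestSetPagePinned(page);
> > >
> > >         if (!pgfl && !PageHighMem(page)) {
> > >                 void *pt = lowmem_page_address(page);
> > >                 unsigned long pfn = page_to_pfn(page);
> > >                 struct multicall_space mcs = __xen_mc_entry(0);
> > >                 spinlock_t *ptl = NULL;
> > >
> > >                 if (level == PT_PTE)
> > >                         ptl = xen_pte_lock(page, mm);
> > >
> > >                 /* queue the remap to RO; Xen only sees it when the
> > >                  * batch is flushed */
> > >                 MULTI_update_va_mapping(mcs.mc, (unsigned long)pt,
> > >                                         pfn_pte(pfn, PAGE_KERNEL_RO),
> > >                                         level == PT_PGD ? UVMF_TLB_FLUSH : 0);
> > >
> > >                 if (ptl)
> > >                         /* pte lock is held until the batch completes */
> > >                         xen_mc_callback(xen_pte_unlock, ptl);
> > >         }
> > >
> > >         return 0; /* real code also reports whether a kmap flush is needed */
> > > }
> > >
> > > so a racing reader of PagePinned can see "pinned" well before Xen does.)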
> >
> > Yes, it looks like you could get a bad mapping here. An obvious fix
> > would be to defer clearing the pinned flag in the page struct until
> > after the hypercall has been issued. That would make the racy
> > kmap_atomic_pte map the page RO, which would be fine unless the caller
> > actually tries to modify it (but I can't imagine it would do that
> > unlocked).
>
> But would it redo the mapping after taking the lock? It doesn't look
> like it does (why would it?). So we could end up writing to an unpinned
> pte via an R/O mapping.
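>
> (For context, the HIGHPTE mapping hook is roughly this -- again a
> paraphrase of the 2.6.30 code rather than the exact source -- it samples
> PagePinned once, with no lock held, and never rechecks:
>
> static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
> {
>         pgprot_t prot = PAGE_KERNEL;
>
>         /* a pinned pte page must only be mapped RO; this is the one
>          * and only look at the flag for the lifetime of the mapping */
>         if (PagePinned(page))
>                 prot = PAGE_KERNEL_RO;
>
>         return kmap_atomic_prot(page, type, prot);
> }
>
> so whichever protection it picks is what the caller keeps using.)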
>
> As an experiment I tried the simple approach of flushing the multicalls
> explicitly in xen_unpin_page and only then clearing the Pinned bit, and
> it all goes a bit wrong. EIP is at "ptep->pte_low = 0", so I think the
> unpinned-but-R/O theory holds...
>
> BUG: unable to handle kernel paging request at f57ab240
> IP: [<c0486f8b>] unmap_vmas+0x32e/0x5bb
> *pdpt = 00000001002d6001
> Oops: 0003 [#1] SMP
> last sysfs file:
> Modules linked in:
>
> Pid: 719, comm: init Not tainted (2.6.30-rc6-x86_32p-highpte-tip #15)
> EIP: 0061:[<c0486f8b>] EFLAGS: 00010202 CPU: 0
> EIP is at unmap_vmas+0x32e/0x5bb
> EAX: 1dcfb025 EBX: 00000001 ECX: f57ab240 EDX: 00000001
> ESI: 00000001 EDI: 08048000 EBP: e18d8d54 ESP: e18d8cd4
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process init (pid: 719, ti=e18d8000 task=e24d8cc0 task.ti=e18d8000)
> Stack:
> 00000002 e18ce000 e3204a50 00000000 00000000 e18ad058 e18d8d70 003ff000
> 08048000 00000001 00000000 00000001 e25d3e00 08050000 e18ce000 e25d3e00
> f57ab240 00000000 00000000 c17f03ec 00000000 00000000 008d8d4c e18bf200
> Call Trace:
> [<c048a752>] ? exit_mmap+0x74/0xba
> [<c0436b92>] ? mmput+0x37/0x81
> [<c04a0e69>] ? flush_old_exec+0x3bc/0x635
> [<c04a0274>] ? kernel_read+0x34/0x46
> [<c04c7594>] ? load_elf_binary+0x329/0x1189
> [<c049c2da>] ? fsnotify_access+0x4f/0x5a
>
> kmap_atomic_pte doesn't get passed the mm, so there is no way to get at
> the ptl we would need in order to do something like clearing the pinned
> flag under the lock in xen_unpin_page and taking the lock in
> xen_kmap_atomic_pte. (I don't know whether that would be valid anyway
> under the locking scheme.)
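>
> (The hook only sees the page -- roughly, and again paraphrased, the
> pv_mmu_ops slot is declared as
>
>         void *(*kmap_atomic_pte)(struct page *page, enum km_type type);
>
> whereas the ptl is normally looked up with pte_lockptr(mm, pmd), which
> needs the mm, and that simply isn't reachable from here.)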
>
> The experimental patch:
>
So should I try with only this patch applied?
-- Pasi
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 1729178..e997813 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1108,9 +1108,7 @@ static void __init xen_mark_init_mm_pinned(void)
> static int xen_unpin_page(struct mm_struct *mm, struct page *page,
> enum pt_level level)
> {
> - unsigned pgfl = TestClearPagePinned(page);
> -
> - if (pgfl && !PageHighMem(page)) {
> + if (PagePinned(page) && !PageHighMem(page)) {
> void *pt = lowmem_page_address(page);
> unsigned long pfn = page_to_pfn(page);
> spinlock_t *ptl = NULL;
> @@ -1136,10 +1134,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
> pfn_pte(pfn, PAGE_KERNEL),
> level == PT_PGD ? UVMF_TLB_FLUSH : 0);
>
> - if (ptl) {
> - /* unlock when batch completed */
> - xen_mc_callback(xen_pte_unlock, ptl);
> - }
> + xen_mc_flush();
> +
> + ClearPagePinned(page);
> +
> + if (ptl)
> + xen_pte_unlock(ptl);
> }
>
> return 0; /* never need to flush on unpin */
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel