xen-devel

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject: Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Wed, 22 Jul 2009 11:16:25 -0700
Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 22 Jul 2009 11:18:00 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1244711914.27370.511.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20090522080655.GA24960@xxxxxxxxxxxxxxx> <20090604202656.GR24960@xxxxxxxxxxxxxxx> <1244197217.27370.146.camel@xxxxxxxxxxxxxxxxxxxxxx> <20090605112347.GY24960@xxxxxxxxxxxxxxx> <1244201864.27370.172.camel@xxxxxxxxxxxxxxxxxxxxxx> <20090605133850.GA24960@xxxxxxxxxxxxxxx> <1244209979.27370.188.camel@xxxxxxxxxxxxxxxxxxxxxx> <20090605154130.GB24960@xxxxxxxxxxxxxxx> <1244217948.27370.213.camel@xxxxxxxxxxxxxxxxxxxxxx> <1244218353.27370.216.camel@xxxxxxxxxxxxxxxxxxxxxx> <20090605181925.GC24960@xxxxxxxxxxxxxxx> <1244475935.27370.309.camel@xxxxxxxxxxxxxxxxxxxxxx> <1244476858.27370.325.camel@xxxxxxxxxxxxxxxxxxxxxx> <4A2E9BC3.4060507@xxxxxxxx> <1244710938.27370.502.camel@xxxxxxxxxxxxxxxxxxxxxx> <1244711914.27370.511.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11 Lightning/1.0pre Thunderbird/3.0b2
On 06/11/09 02:18, Ian Campbell wrote:
> Pasi, to validate the theory that you are seeing races between unpinning
> and kmap_atomic_pte can you give this biguglystick approach to solving
> it a go.
>   

I gave up trying to solve this in any kind of clever way and just
decided to go with a slightly cleaned-up version of this patch.
Unfortunately, I don't think it actually solves the problem because it
doesn't prevent unpin from happening while the page is still kmapped;
it just uses the spinlock as a barrier to shift the race to a timing
window that presumably avoids the problem most of the time.
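
A rough picture of why the window is still there (this is just the
interleaving, not code from the patch):

    cpu A                                cpu B
    xen_kmap_atomic_pte()
      spin_lock(&hack_lock)
      prot = pinned ? RO : RW
      kmap_atomic_prot(...)
      spin_unlock(&hack_lock)
    caller uses the mapping...           __xen_pgd_unpin()
                                           spin_lock(&hack_lock)
                                           pin state of the page changes
    ...through a now-stale mapping
    kunmap_atomic(...)

The lock only serialises the *creation* of the mapping against
pin/unpin; nothing stops the pin state changing while the mapping is
still live.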

In principle the fix is to take the lock in xen_kmap_atomic_pte and
release it in xen_kunmap_atomic_pte.  Unfortunately this is pretty ugly
and complex because kmaps are 1) inherently per-cpu, and 2) there can
be 2 levels of kmapping at once.  This means we'd need 2 per-cpu locks
so that we can hold them for the duration of the mapping without
introducing deadlocks.  However, unpin (and pin, in principle) needs to
take *all* these locks to exclude kmap on every cpu.
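
Very roughly, the shape would be something like this (everything here
is made up for illustration; the names are invented and I'm ignoring
lock init and ordering details):

static DEFINE_PER_CPU(spinlock_t [2], kmap_pte_locks);

/*
 * kmap_atomic_pte takes its own cpu's lock for the kmap slot it's
 * using and holds it until the matching kunmap; (un)pin has to take
 * the whole lot to exclude kmap everywhere.
 */
static void kmap_pte_lock_all(void)
{
        int cpu, level;

        for_each_possible_cpu(cpu)
                for (level = 0; level < 2; level++)
                        spin_lock(&per_cpu(kmap_pte_locks, cpu)[level]);
}

static void kmap_pte_unlock_all(void)
{
        int cpu, level;

        for_each_possible_cpu(cpu)
                for (level = 0; level < 2; level++)
                        spin_unlock(&per_cpu(kmap_pte_locks, cpu)[level]);
}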

This is total overkill, since we only really care about excluding kmap
and unpin for a given pagetable, which suggests that the locks should
be part of the mm rather than global.  Unfortunately k(un)map_atomic_pte
don't get the mm of the pagetable they're mapping, and I don't think we
can assume it's the current mm.
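
If the hooks did get the mm, I'd want something along these lines
(purely hypothetical: neither the extra mm argument nor a
pte_kmap_locks field in mm_struct exists today):

static void *xen_kmap_atomic_pte(struct mm_struct *mm,
                                 struct page *page, enum km_type type)
{
        int level = (type == KM_PTE1) ? 1 : 0;

        /*
         * Held until the matching kunmap; this only excludes (un)pin
         * of this particular mm, not every pagetable in the system.
         */
        spin_lock(&mm->pte_kmap_locks[level]);

        return kmap_atomic_prot(page, type,
                                PagePinned(page) ? PAGE_KERNEL_RO
                                                 : PAGE_KERNEL);
}

and (un)pin would only have to take that mm's two locks rather than
every cpu's.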

Another (pretty hideous) approach might be to make unpin check the
state of the KM_PTE[01] slots in each CPU's kmap fixmaps and see if a
mapping exists for a page that it's currently unpinning.  This also has
the problem of being inherently racy; if we unpin the page, there's
going to be a little window after the unpin and before the kmap pte
update (even if they're back-to-back batched hypercalls).
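
The check itself would be something like this, where kmap_pte_for() is
an invented helper returning a given cpu's fixmap pte for one of the
KM_PTE0/KM_PTE1 slots:

static bool page_is_kmapped(struct page *page)
{
        unsigned long pfn = page_to_pfn(page);
        int cpu;

        for_each_online_cpu(cpu) {
                pte_t p0 = *kmap_pte_for(cpu, KM_PTE0);
                pte_t p1 = *kmap_pte_for(cpu, KM_PTE1);

                if ((pte_present(p0) && pte_pfn(p0) == pfn) ||
                    (pte_present(p1) && pte_pfn(p1) == pfn))
                        return true;
        }

        /* ...and even a "false" here can be stale by the time we act on it */
        return false;
}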

I guess we could have a global rw spinlock; kmap/kunmap takes it for
reading, and so can be concurrent with all other kmaps, but pin/unpin
takes it for writing to exclude them.  That would work, but runs the
risk of pin/unpin being livelocked out (I don't think rwspins will
block new readers when there's a pending writer).  Ugly, but it's the
only thing I can think of which actually solves the problem.
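
Concretely, a sketch (illustrative only; the read_lock would have to
be dropped again on the kunmap side):

static DEFINE_RWLOCK(pte_kmap_lock);

static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
{
        /* readers can nest and run concurrently on all cpus */
        read_lock(&pte_kmap_lock);

        return kmap_atomic_prot(page, type,
                                PagePinned(page) ? PAGE_KERNEL_RO
                                                 : PAGE_KERNEL);
}

static void xen_pgd_unpin(struct mm_struct *mm)
{
        write_lock(&pte_kmap_lock);
        __xen_pgd_unpin(mm, mm->pgd);
        write_unlock(&pte_kmap_lock);
}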

Oh, crap, we don't have a kunmap_atomic_pte hook.
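
Adding one would look something like this (hypothetical; nothing like
it exists today):

/* new op next to the existing kmap_atomic_pte in pv_mmu_ops: */
        void (*kunmap_atomic_pte)(void *kvaddr, enum km_type type);

static void xen_kunmap_atomic_pte(void *kvaddr, enum km_type type)
{
        kunmap_atomic(kvaddr, type);
        read_unlock(&pte_kmap_lock);    /* or whichever lock kmap took */
}

though the generic pte_unmap()/kunmap_atomic path would also have to
be taught to call it.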

Thoughts?  Am I overthinking this and missing something obvious?

(PS: avoid CONFIG_HIGHPTE)

    J

> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 1729178..beeb8e8 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1145,9 +1145,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
>  	return 0;		/* never need to flush on unpin */
>  }
>  
> +static DEFINE_SPINLOCK(hack_lock); /* Hack to sync unpin against kmap_atomic_pte */
> +
>  /* Release a pagetables pages back as normal RW */
>  static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
>  {
> +	spin_lock(&hack_lock);
>  	xen_mc_batch();
>  
>  	xen_do_pin(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
> @@ -1173,6 +1176,7 @@ static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
>  	__xen_pgd_walk(mm, pgd, xen_unpin_page, USER_LIMIT);
>  
>  	xen_mc_issue(0);
> +	spin_unlock(&hack_lock);
>  }
>  
>  static void xen_pgd_unpin(struct mm_struct *mm)
> @@ -1521,6 +1525,9 @@ static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  {
>  	pgprot_t prot = PAGE_KERNEL;
> +	void *ret;
> +
> +	spin_lock(&hack_lock);
>  
>  	if (PagePinned(page))
>  		prot = PAGE_KERNEL_RO;
> @@ -1530,7 +1537,11 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  		       page_to_pfn(page), type,
>  		       (unsigned long)pgprot_val(prot) & _PAGE_RW ? "WRITE" : "READ");
>  
> -	return kmap_atomic_prot(page, type, prot);
> +	ret = kmap_atomic_prot(page, type, prot);
> +
> +	spin_unlock(&hack_lock);
> +
> +	return ret;
>  }
>  #endif
>  


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
