On Thu, Jun 11, 2009 at 08:18:15AM -0700, Jeremy Fitzhardinge wrote:
> On 06/11/09 02:02, Ian Campbell wrote:
> >On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
> >
> >>Ian Campbell wrote:
> >>
> >>>I wonder how this interacts with the logic in
> >>>arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> >>>the (deferred) pin multicall to occur? Hmm, no this is about the
> >>>PagePinned flag on the struct page which is out of date WRT the actual
> >>>pinned status as Xen sees it -- we update the PagePinned flag early in
> >>>xen_pin_page() long before Xen sees the pin hypercall, so this window is the
> >>>other way round to what would be needed to trigger this bug.
> >>>
> >>>
> >>Yes, it looks like you could get a bad mapping here. An obvious fix
> >>would be to defer clearing the pinned flag in the page struct until
> >>after the hypercall has issued. That would make the racy
> >>kmap_atomic_pte map RO, which would be fine unless it actually tries to
> >>modify it (but I can't imagine it would do that unlocked).
> >>
> >
> >But would it redo the mapping after taking the lock? It doesn't look
> >like it does (why would it). So we could end up writing to an unpinned
> >pte via a R/O mapping.
> >
>
> Hm, yep. One thing I noticed is that set_pte() is used very rarely, so
> it would be no cost to always use a hypercall in that case. But
> xen_set_pte_at() ends up calling xen_set_pte() as well, and I think
> that's more common. Certainly we need to make sure that we're actually
> taking advantage of late-pin by direct writing unpinned ptes.
>
> I've been thinking of rearranging the set_pte(_at) pvops a little bit
> anyway; it's not obvious we're really getting much benefit from using the
> update_va_mapping hypercall, and if we're not using it, then the
> set_pte_at pvop is taking a lot of unused parameters.
>
> If we switch to just using mmu_update, then we can just pass the address
> and pte value. But we could also pass the struct page * (which makes a
> bit of conceptual sense), so we could easily test directly whether the pte
> is pinned, and either use a direct write or hypercall accordingly.
>
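The dispatch proposed above (test the pinned state of the page, then either write the pte directly or go through a hypercall) can be sketched roughly as follows. This is a standalone userspace model, not kernel code: `struct fake_page`, `fake_mmu_update()`, `sketch_set_pte()` and `hypercall_count` are hypothetical names made up for illustration, and the "hypercall" is simulated with a plain store. It also assumes the pinned flag is accurate at the time of the call, which is exactly what the race discussed in this thread puts in doubt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real kernel types. */
typedef uint64_t pteval_t;

struct fake_page {
    bool pinned;            /* models the PagePinned flag on struct page */
};

/* Counts how many simulated hypercalls were made. */
static int hypercall_count;

/* Simulated mmu_update: in a real guest the hypervisor would do the write. */
static void fake_mmu_update(pteval_t *ptep, pteval_t val)
{
    hypercall_count++;
    *ptep = val;
}

/*
 * Sketch of a set_pte variant that takes the page owning the pte:
 * pinned pagetable pages are read-only to the guest, so they need the
 * hypercall path; unpinned pages can simply be written directly.
 */
static void sketch_set_pte(struct fake_page *page, pteval_t *ptep, pteval_t val)
{
    assert(page != NULL && ptep != NULL);

    if (page->pinned)
        fake_mmu_update(ptep, val);   /* pinned: must ask the hypervisor */
    else
        *ptep = val;                  /* unpinned: direct write is enough */
}
```

With this shape the caller passes only the page, the pte address and the new value, so the parameters that only matter for update_va_mapping are not needed. It does not by itself close the window where the flag and the state Xen sees disagree.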
> >As an experiment I tried the simple approach of flushing the multicalls
> >explicitly in xen_unpin_page and then clearing the Pinned bit and it all
> >goes a bit wrong. eip is "ptep->pte_low = 0" so I think the unpinned but
> >R/O theory holds...
> >
>
> Yes, I think the theory is sound. But I'm curious why Pasi seems to be
> able to hit the race easily, but we have not...
>
Yeah, I've been thinking about that too..
My hardware is ~5 years old, but it has been running stable with multiple
distributions and kernel versions, on various types of loads. I think the
hardware should be all fine.
Atm I've been running Fedora 10 and Fedora 11 on it, both seem stable with
the distro-provided kernels.
i.e. I'm only seeing the problem on the pv_ops dom0 kernel.
My installation is pretty basic/standard.. root-fs on LVM-volume. Can't
really think of anything special..
And the problem seems to be _always_ reproducible with a simple
"make clean && make bzImage && make modules" command on dom0 ..
Anyway, I'll continue testing. Hopefully we get this hunted down :)
-- Pasi
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel