On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
> Ian Campbell wrote:
> > I wonder how this interacts with the logic in
> > arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> > the (deferred) pin multicall to occur? Hmm, no, this is about the
> > PagePinned flag on the struct page, which is out of date WRT the actual
> > pinned status as Xen sees it -- we update the PagePinned flag early in
> > xen_pin_page(), long before the pin hypercall is issued to Xen, so this
> > window is the other way round to what would be needed to trigger this bug.
> >
>
> Yes, it looks like you could get a bad mapping here. An obvious fix
> would be to defer clearing the pinned flag in the page struct until
> after the hypercall has been issued. That would make the racy
> kmap_atomic_pte map RO, which would be fine unless it actually tries to
> modify it (but I can't imagine it would do that unlocked).
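
For reference, that deferred clear would presumably look something like the
sketch below -- untested, and it assumes a hypothetical xen_clear_pinned()
helper queued with xen_mc_callback() so the flag is only dropped once the
unpin batch has actually been flushed to Xen:

/* Hypothetical callback: runs only after the batched hypercalls complete. */
static void xen_clear_pinned(void *v)
{
	ClearPagePinned((struct page *)v);
}

static int xen_unpin_page(struct mm_struct *mm, struct page *page,
			  enum pt_level level)
{
	if (PagePinned(page) && !PageHighMem(page)) {
		/* ... take the ptl and queue the update_va_mapping
		 * multicall as today ... */

		/* Defer clearing the flag until the batch completes, so a
		 * racing kmap_atomic_pte still sees the page as pinned and
		 * maps it RO rather than RW. */
		xen_mc_callback(xen_clear_pinned, page);
	}
	return 0;	/* never need to flush on unpin */
}
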
But would it redo the mapping after taking the lock? It doesn't look
like it does (why would it?). So we could end up writing to an unpinned
pte via an R/O mapping.
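
That's because the generic highpte path establishes the kernel mapping and
then takes the lock, and never remaps afterwards -- roughly this, going from
memory of the 2.6.30 code (details may differ slightly):

/* include/linux/mm.h, approximately: map first, then lock; the mapping
 * is not redone once the ptl is held. */
#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
({							\
	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
	pte_t *__pte = pte_offset_map(pmd, address);	\
	*ptlp = __ptl;					\
	spin_lock(__ptl);				\
	__pte;						\
})

/* With CONFIG_HIGHPTE, pte_offset_map() goes through kmap_atomic_pte(),
 * which is where the R/O vs R/W decision is made from PagePinned. */
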
As an experiment I tried the simple approach of flushing the multicalls
explicitly in xen_unpin_page and then clearing the Pinned bit, and it all
goes a bit wrong. The faulting EIP is at "ptep->pte_low = 0", so I think
the unpinned-but-R/O theory holds...

BUG: unable to handle kernel paging request at f57ab240
IP: [<c0486f8b>] unmap_vmas+0x32e/0x5bb
*pdpt = 00000001002d6001
Oops: 0003 [#1] SMP
last sysfs file:
Modules linked in:
Pid: 719, comm: init Not tainted (2.6.30-rc6-x86_32p-highpte-tip #15)
EIP: 0061:[<c0486f8b>] EFLAGS: 00010202 CPU: 0
EIP is at unmap_vmas+0x32e/0x5bb
EAX: 1dcfb025 EBX: 00000001 ECX: f57ab240 EDX: 00000001
ESI: 00000001 EDI: 08048000 EBP: e18d8d54 ESP: e18d8cd4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process init (pid: 719, ti=e18d8000 task=e24d8cc0 task.ti=e18d8000)
Stack:
00000002 e18ce000 e3204a50 00000000 00000000 e18ad058 e18d8d70 003ff000
08048000 00000001 00000000 00000001 e25d3e00 08050000 e18ce000 e25d3e00
f57ab240 00000000 00000000 c17f03ec 00000000 00000000 008d8d4c e18bf200
Call Trace:
[<c048a752>] ? exit_mmap+0x74/0xba
[<c0436b92>] ? mmput+0x37/0x81
[<c04a0e69>] ? flush_old_exec+0x3bc/0x635
[<c04a0274>] ? kernel_read+0x34/0x46
[<c04c7594>] ? load_elf_binary+0x329/0x1189
[<c049c2da>] ? fsnotify_access+0x4f/0x5a
kmap_atomic_pte doesn't get passed the mm, so there is no way to get at the
ptl that we would need in order to do something like clearing the pinned
flag under the lock in xen_unpin_page and taking the lock in
xen_kmap_atomic_pte. (I don't know whether that would be valid under the
locking scheme anyway.)
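
For reference, the hook looks roughly like this in 2.6.30 (a sketch from
memory, details may differ slightly) -- note that neither the pv_mmu_ops
signature nor Xen's implementation ever sees the mm:

/* pv_mmu_ops hook: no mm_struct argument, so the pte lock belonging to
 * the mapping's mm is out of reach here. */
void *(*kmap_atomic_pte)(struct page *page, enum km_type type);

/* Xen keys the kernel-mapping protection off PagePinned alone: */
static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
{
	pgprot_t prot = PAGE_KERNEL;

	if (PagePinned(page))
		prot = PAGE_KERNEL_RO;

	return kmap_atomic_prot(page, type, prot);
}
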
The experimental patch:
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 1729178..e997813 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1108,9 +1108,7 @@ static void __init xen_mark_init_mm_pinned(void)
static int xen_unpin_page(struct mm_struct *mm, struct page *page,
enum pt_level level)
{
- unsigned pgfl = TestClearPagePinned(page);
-
- if (pgfl && !PageHighMem(page)) {
+ if (PagePinned(page) && !PageHighMem(page)) {
void *pt = lowmem_page_address(page);
unsigned long pfn = page_to_pfn(page);
spinlock_t *ptl = NULL;
@@ -1136,10 +1134,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
pfn_pte(pfn, PAGE_KERNEL),
level == PT_PGD ? UVMF_TLB_FLUSH : 0);
- if (ptl) {
- /* unlock when batch completed */
- xen_mc_callback(xen_pte_unlock, ptl);
- }
+ xen_mc_flush();
+
+ ClearPagePinned(page);
+
+ if (ptl)
+ xen_pte_unlock(ptl);
}
return 0; /* never need to flush on unpin */
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel