On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> Olaf,
> > On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> >
> >> After a bit of thinking, things are far more complicated. I don't think
> >> this is a "race." If the pager removed a page that later gets scheduled
> >> by
> >> the guest OS for IO, qemu will want to foreign-map that. With the
> >> hypervisor returning ENOENT, the foreign map will fail, and there goes
> >> qemu.
> >
> > The tools are supposed to catch ENOENT and try again.
> > linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> > appears to do that as well. What code path uses qemu that leads to a
> > crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?
I'm running SLES11 as dom0. Now thats really odd that there is no ENOENT
handling in mainline, I will go and check the code.
> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.
A while ago I fixed the grant status handling, perhaps that change was
never forwarded to pvops, at least I didnt do it at that time.
> I'm using 24066:54a5e994a241. I start windows 7, make xenpaging try to
> evict 90% of the RAM, qemu lasts for about two seconds. Linux fights
> harder, but qemu also dies. No pv drivers. I haven't been able to trace
> back the qemu crash (segfault on a NULL ide_if field for a dma callback)
> to the exact paging action yet, but no crashes without paging.
If the kernel is pvops it may need some audit to check the ENOENT
handling.
Olaf
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|