Olaf,
> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
>
>> After a bit of thinking, things are far more complicated. I don't think
>> this is a "race." If the pager removed a page that later gets scheduled
>> by
>> the guest OS for IO, qemu will want to foreign-map that. With the
>> hypervisor returning ENOENT, the foreign map will fail, and there goes
>> qemu.
>
> The tools are supposed to catch ENOENT and try again.
> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> appears to do that as well. Which code path in qemu leads to the
> crash?
The tools retry only as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported,
which it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you
using? And for backend drivers implemented in the kernel (netback, etc.),
there is no retrying at all.
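
For reference, here is a rough, untested sketch of the retry pattern I
have in mind on the tools side. It is not the actual libxc code (the real
retry lives inside linux_privcmd_map_foreign_bulk() and only works when
the V2 ioctl is available); it just illustrates looping on the per-page
-ENOENT results from xc_map_foreign_bulk():

/* Illustrative only: remap the whole batch until no page comes back
 * -ENOENT, i.e. until the pager has repopulated everything that
 * p2m_mem_paging_populate() asked for. */
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <xenctrl.h>

static void *map_foreign_retry(xc_interface *xch, uint32_t domid,
                               const xen_pfn_t *gfns, unsigned int num)
{
    int *err = calloc(num, sizeof(*err));
    void *addr = NULL;
    unsigned int i;
    int again;

    if (!err)
        return NULL;

    do {
        again = 0;
        addr = xc_map_foreign_bulk(xch, domid, PROT_READ | PROT_WRITE,
                                   gfns, err, num);
        if (!addr)
            break;

        for (i = 0; i < num; i++) {
            if (err[i] == -ENOENT) {
                /* Paged out: the populate request is now queued for
                 * the pager, so back off briefly and map again. */
                again = 1;
                break;
            }
        }

        if (again) {
            munmap(addr, (size_t)num * XC_PAGE_SIZE);
            addr = NULL;
            usleep(1000);
        }
    } while (again);

    free(err);
    return addr;
}

Remapping the whole batch is wasteful; IIRC the real libxc only re-issues
the ioctl for the slots that came back ENOENT, but this keeps the sketch
short.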
All those ram_paging types and their interactions give me a headache, but
I'll trust you that only one event is put in the ring.
I'm using 24066:54a5e994a241. I start Windows 7 and make xenpaging try to
evict 90% of the RAM; qemu lasts about two seconds. Linux fights harder,
but qemu dies too. No PV drivers. I haven't yet been able to trace the
qemu crash (a segfault on a NULL ide_if field in a DMA callback) back to
the exact paging action, but there are no crashes without paging.
Andres
>
>> I guess qemu/migrate/libxc could retry until the pager is done and the
>> mapping succeeds. It will be delicate. It won't work for pv backends. It
>> will flood the mem_event ring.
>
> There will no flood, only one request is sent per gfn in
> p2m_mem_paging_populate().
>
> Olaf
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel