[Xen-devel] Re: Need help with fixing the Xen waitqueue feature

To:	Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
Subject:	[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
From:	Olaf Hering <olaf@xxxxxxxxx>
Date:	Thu, 10 Nov 2011 11:18:28 +0100
Cc:	keir.xen@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<5d7d38b18271fcc7aa750604eeb52bbd.squirrel@xxxxxxxxxxxxxxxxxxxxxxxx>
References:	<20111108224414.83985CF73A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <3c097da8e49a42af1210e4ffcd39fd48.squirrel@xxxxxxxxxxxxxxxxxxxxxxxx> <20111109070927.GB26154@xxxxxxxxx> <0bb01a4d216a68c4ae8441b037927f61.squirrel@xxxxxxxxxxxxxxxxxxxxxxxx> <20111109221148.GA17166@xxxxxxxxx> <5d7d38b18271fcc7aa750604eeb52bbd.squirrel@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	Mutt/1.5.21.rev5535 (2011-07-01)

On Wed, Nov 09, Andres Lagar-Cavilla wrote:

> Olaf,
> > On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> >
> >> After a bit of thinking, things are far more complicated. I don't think
> >> this is a "race." If the pager removed a page that later gets scheduled by
> >> the guest OS for IO, qemu will want to foreign-map that. With the
> >> hypervisor returning ENOENT, the foreign map will fail, and there goes
> >> qemu.
> >
> > The tools are supposed to catch ENOENT and try again.
> > linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> > appears to do that as well. Which qemu code path leads to the
> > crash?
> 
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?

I'm running SLES11 as dom0. It's really odd that there is no ENOENT
handling in mainline; I will go and check the code.
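
For reference, the retry behaviour under discussion can be sketched roughly
as below. This is only an illustration of the idea (retry the frames that
came back with ENOENT, leave the rest alone), not the actual libxc or
privcmd code. fake_map_batch(), map_batch_with_retry(), the paged_out_for
counters and MAX_RETRIES are all made up for this sketch; the only thing
taken from the thread is that a paged-out frame is reported as ENOENT and
the caller is expected to try again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_RETRIES 8   /* arbitrary bound, chosen only for this sketch */

/*
 * Hypothetical stand-in for one bulk-map attempt. paged_out_for[i] says how
 * many more attempts will still find frame i paged out. Frames that are
 * already mapped (err[i] == 0) are skipped.
 */
static void fake_map_batch(int *paged_out_for, int *err, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (err[i] != -ENOENT)
            continue;               /* mapped on an earlier attempt */
        if (paged_out_for[i] > 0) {
            paged_out_for[i]--;     /* pager has not brought it back yet */
            err[i] = -ENOENT;
        } else {
            err[i] = 0;             /* frame is present, mapping succeeds */
        }
    }
}

/*
 * Retry until no frame reports -ENOENT or the retry budget is used up.
 * Returns 0 if every frame was mapped, -ENOENT otherwise. The per-frame
 * status is left in err[] either way.
 */
static int map_batch_with_retry(int *paged_out_for, int *err, size_t n)
{
    assert(paged_out_for != NULL && err != NULL);

    for (size_t i = 0; i < n; i++)
        err[i] = -ENOENT;           /* nothing mapped yet */

    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        size_t pending = 0;

        fake_map_batch(paged_out_for, err, n);
        for (size_t i = 0; i < n; i++)
            if (err[i] == -ENOENT)
                pending++;
        if (pending == 0)
            return 0;
        /* A real caller would probably sleep briefly here so the pager
         * has a chance to page the frames back in. */
    }
    return -ENOENT;
}
```

A caller without such a loop sees the first -ENOENT as a hard failure, which
is the situation described above for kernels that lack the retry.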

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

A while ago I fixed the grant status handling. Perhaps that change was
never forwarded to pvops; at least I didn't do it at that time.

> I'm using 24066:54a5e994a241. I start windows 7, make xenpaging try to
> evict 90% of the RAM, qemu lasts for about two seconds. Linux fights
> harder, but qemu also dies. No pv drivers. I haven't been able to trace
> back the qemu crash (segfault on a NULL ide_if field for a dma callback)
> to the exact paging action yet, but no crashes without paging.

If the kernel is pvops, its ENOENT handling may need an audit.

Olaf

