[Xen-devel] Re: Need help with fixing the Xen waitqueue feature

To: keir.xen@xxxxxxxxx
Subject: [Xen-devel] Re: Need help with fixing the Xen waitqueue feature
From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>
Date: Tue, 8 Nov 2011 19:52:31 -0800
Cc: olaf@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20111108224414.83985CF73A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <20111108224414.83985CF73A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: andres@xxxxxxxxxxxxxxxx
> Date: Tue, 08 Nov 2011 22:05:41 +0000
> From: Keir Fraser <keir.xen@xxxxxxxxx>
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
> To: Olaf Hering <olaf@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Message-ID: <CADF5835.245E1%keir.xen@xxxxxxxxx>
> Content-Type: text/plain; charset="US-ASCII"
>
> On 08/11/2011 21:20, "Olaf Hering" <olaf@xxxxxxxxx> wrote:
>
>> Another thing is that sometimes the host suddenly reboots without any
>> message. I think the reason for this is that a vcpu whose stack was put
>> aside and that was later resumed may find itself on another physical
>> cpu. And if that happens, wouldn't that invalidate some of the local
>> variables back in the call chain? If some of them point to the old
>> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
>> needed in some places.
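
A minimal sketch of the hazard being described here, with hypothetical
names (scratch_area, page_is_present) -- this is not actual Xen code. A
per-cpu pointer cached before a wait goes stale if the vcpu wakes up on a
different physical cpu:

static struct waitqueue_head wq;  /* initialised elsewhere */

/* The vcpu's stack is saved by the waitqueue code and may be resumed
 * on another pcpu, so anything derived from the current pcpu before
 * the wait is suspect afterwards. */
static void hypercall_path(void)
{
    /* Pointer into (say) pcpu 3's per-cpu data. */
    struct scratch *s = &this_cpu(scratch_area);

    wait_event(wq, page_is_present());  /* may sleep; vcpu can migrate */

    /* BUG: we may now be running on pcpu 7, but 's' still points at
     * pcpu 3's scratch area. It must be re-read via this_cpu() after
     * any point that can sleep. */
    s->count++;
}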
>
> From how many call sites can we end up on a wait queue? I know we were
> going to end up with a small and explicit number (e.g., in __hvm_copy())
> but does this patch make it a more generally-used mechanism? There will
> unavoidably be many constraints on callers who want to be able to yield
> the cpu. We can add Linux-style get_cpu/put_cpu abstractions to catch
> some of them. Actually I don't think it's *that* common that hypercall
> contexts cache things like per-cpu pointers. But every caller will need
> auditing, I expect.
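
For reference, Linux's get_cpu() disables preemption and returns the
current cpu id, and put_cpu() re-enables preemption. A rough sketch of
what a Xen analogue could look like (purely hypothetical; no such API
exists in the tree, and no_wait_region would be a new nesting counter
in struct vcpu):

/* Mark a region in which the vcpu must not go onto a waitqueue, so
 * that cached per-cpu state stays valid. */
static inline unsigned int get_cpu(void)
{
    current->no_wait_region++;
    return smp_processor_id();
}

static inline void put_cpu(void)
{
    ASSERT(current->no_wait_region > 0);
    current->no_wait_region--;
}

wait_event() could then ASSERT(current->no_wait_region == 0) to catch
callers that try to sleep while relying on the current pcpu.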

Tbh, for paging to be effective, we need to be prepared to yield on every
p2m lookup.
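
Concretely, the bottom of the gfn-to-mfn path could loop on a waitqueue
until the pager has brought the page back. A sketch, with hypothetical
names (get_gfn_wait, gfn_is_present and paging_wq in particular do not
exist):

/* Look up a gfn, yielding to the pager instead of failing when the
 * page is paged out. */
static mfn_t get_gfn_wait(struct domain *d, unsigned long gfn,
                          p2m_type_t *t)
{
    mfn_t mfn = gfn_to_mfn(d, gfn, t);

    while ( p2m_is_paged(*t) )
    {
        p2m_mem_paging_populate(d, gfn);  /* kick the user-space pager */
        wait_event(d->paging_wq, gfn_is_present(d, gfn));  /* may sleep */
        mfn = gfn_to_mfn(d, gfn, t);      /* re-walk after waking up */
    }
    return mfn;
}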

Let's compare paging to PoD. They're essentially the same thing: pages
disappear, and get allocated on the fly when you need them. PoD is a
highly optimized in-hypervisor mechanism that needs no user-space helper
-- but the pager could implement PoD easily and remove all that
p2m-pod.c code from the hypervisor.

PoD only introduces visible side-effects when there is no memory left to
allocate pages from. The same cannot be said of paging, to put it
mildly: it returns EINVAL all over the place. Right now, qemu can be
crashed in a blink by paging out the right gfn.

To get paging to where PoD is, all these situations need to be handled
in a manner other than returning EINVAL. That means putting the vcpu on
a waitqueue at every location where p2m_pod_demand_populate is called
today, not just in __hvm_copy -- roughly as sketched below.
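
A before/after picture of one such call site (a sketch; the real sites
and error codes vary, and get_gfn_wait() is the hypothetical helper
from above):

/* Before: a paged-out gfn is a hard failure for the caller. */
mfn = gfn_to_mfn(d, gfn, &t);
if ( p2m_is_paged(t) )
{
    p2m_mem_paging_populate(d, gfn);
    return -EINVAL;   /* qemu et al. just see the hypercall fail */
}

/* After: the vcpu sleeps until the pager has filled the page in. */
mfn = get_gfn_wait(d, gfn, &t);   /* never returns a paged-out entry */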

I don't know whether that's altogether doable. Many of these gfn lookups
happen in atomic contexts, not to mention with cached per-cpu pointers
in play. But at least we should aim for that.

Andres
>
> A sudden reboot is very extreme. No message even on a serial line? That
> most commonly indicates bad page tables. With most other bugs you'd at
> least get a double-fault message.
>
>  -- Keir
>

