Re: [Xen-devel] [PATCH 0/5] x86: properly propagate errors to hy

To:	Jan Beulich <JBeulich@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] [PATCH 0/5] x86: properly propagate errors to hypercall callee
From:	Keir Fraser <keir.xen@xxxxxxxxx>
Date:	Wed, 09 Mar 2011 13:44:11 +0000
Cc:
Delivery-date:	Wed, 09 Mar 2011 05:45:10 -0800
Dkim-signature:	v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:user-agent:date:subject:from:to:message-id :thread-topic:thread-index:in-reply-to:mime-version:content-type :content-transfer-encoding; bh=QjnERJH8Jc40an6ce2mhtmK+ACtdkDw3IHqbl9BVqNw=; b=Vd8eEIMEqF/AL0EepZJnzj1AgpdE7DE4PppHGSS4MQWCBDpqJKbhg1TEdds5tUH99f kl2Wgvo8XdT/cm/w2hbYU7v0DeLb0Ml6iwu/UHP2WJIKBsNjDs2rqEudKvL689Ji6el3 26LTOf5Y5woZ/QLTm5OxaMFTG5Y07HnThrQrg=
Domainkey-signature:	a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=user-agent:date:subject:from:to:message-id:thread-topic :thread-index:in-reply-to:mime-version:content-type :content-transfer-encoding; b=hWotKHJJeOt3gJYRu/w7t1Mp7zQOAe9EmydVns4Q/mnTbanCa0/JWBBlfyVeC20KQh kcbT3qJci1t89n1Nuq4iCPU78JIFrIZHNJxWcN0O0mOeBeHE52XpHGDvhJ6H1M0fA2+N gnIOVjRdGisRQlC/DbExfJEsna5OTNbNhav58=
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<4D7770D802000078000357C9@xxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	AcveYBH5v4kjIIxUvUWShGGPe8eC4g==
Thread-topic:	[Xen-devel] [PATCH 0/5] x86: properly propagate errors to hypercall callee
User-agent:	Microsoft-Entourage/12.28.0.101117

On 09/03/2011 11:21, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:

>>>> On 09.03.11 at 12:07, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
>> On 09/03/2011 10:53, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>> 
>>> This patch set makes it so that not only the offending BUG() gets
>>> eliminated, but also properly propagates the error to the guest,
>>> so that the latter can take action (which will itself require quite
>>> some changes to prevent crashing the guest in that situation,
>>> particularly where utilizing Xen's writeable page table support).
>> 
>> Presumably this is from shattering superpage mappings when per-page cache
>> attributes change in response to a guest mapping a page with, for example,
>> non-WB attributes?
> 
> Correct - observed with the radeon drm driver.
> 
>> It seems unfortunate to propagate this to guests. Perhaps we should be
>> making a memory pool for Xen's 1:1 mappings, big enough to allow a 4kB
>> mapping of every page of RAM in the system, and allocate/free pagetables to
>> that pool? The overhead of this would be no more than 0.2% of system memory,
>> which seems reasonable to avoid an error case that is surely hard for a
>> guest to react to or fix.
> 
> I considered this too, but wasn't convinced that's a good thing to
> do, no matter that the overhead is only a very small percentage.
> After all, one of the two points to make use of superpages is to
> not waste memory needlessly on page tables, the more that on a
> typical server you'd unlikely see many RAM pages get non-WB
> caching attributes set on them.
> 
> That said, I nevertheless agree (and attempted to indicate that way
> in the description) that the kernel side changes may be non-trivial.
> Otoh, keeping the logic simple in Xen may also be beneficial.

The further problem is that this would change the guest interface from
pagetable updates from 'guaranteed to succeed if you have not made a
guest-side mistake' to 'may spuriously fail in rare cases'. That's a big
difference!

I wonder what the scope of the problem really is. Mostly this cacheattr
stuff applies to memory allocated by a graphics driver I suppose, and
probably at boot time in dom0. I wonder how the bug was observed during dom0
boot given that Xen chooses a default dom0 memory allocation that leaves
enough memory free for a decent-sized dom0 SWIOTLB plus some extra slack on
top of that. Any idea how the Xen memory pool happened to be entirely empty
at the time radeon drm driver caused the superpage shattering to occur?

I'm not against turning the host crash into a guest crash (which I think is
typically what is going to happen, although I suppose at least some Linux
driver-related mapping/remapping functions can handle failure) as this might
be an improvement when starting up non-dom0 driver domains for example. But
I think we should consider punting a resource error up to the guest as a
very bad thing and still WARN_ON or otherwise log the situation to Xen's own
console.

 -- Keir

> And, not the least, even if you indeed want to go with the pool
> approach you suggest, the changes in this patch set are likely
> good to have independently - just that they wouldn't need
> backporting to 4.1 and 4.0 if they turn out to be mere cleanup.
> 
> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

WARNING - OLD ARCHIVES

xen-devel

Re: [Xen-devel] [PATCH 0/5] x86: properly propagate errors to hypercall