Re: [Xen-devel] Readonly memory for guest domain

To:	"Ian Campbell" <Ian.Campbell@xxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] Readonly memory for guest domain
From:	"Peter Teoh" <htmldeveloper@xxxxxxxxx>
Date:	Thu, 13 Sep 2007 22:46:58 +0800
Cc:	xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date:	Thu, 13 Sep 2007 07:47:20 -0700
In-reply-to:	<1189669425.3951.31.camel@xxxxxxxxxxxxxxxxxxxxx>
References:	<00ca01c7f4db$69d991f0$9a010a0a@eeyore> <C30D53BF.D73B%Keir.Fraser@xxxxxxxxxxxx> <804dabb00709121859te561d2cjdfeac95876b9778@xxxxxxxxxxxxxx> <6bc632150709122140j7a8e1dddq9e8d72b7dd74a5c6@xxxxxxxxxxxxxx> <804dabb00709122336t13bad588pbec028db12a8f576@xxxxxxxxxxxxxx> <1189669425.3951.31.camel@xxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

Thank you Ian, Pradeep, and Keir for all the answers. Just a few more questions to confirm my understanding:

On 9/13/07, Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx> wrote:

>         > Thank you for the answer.   In the first place, we will not
>         know what is
>         > pagetable or non-pagetable memory.   For example, during
>         dom0/domU
>         > initialisation, the guest OS will query the e820 bios
>         mechanism for physical
>         > memory  availability, and the guest OS (paravirt or HVM)
>         will then assign
>         > different parts of the physical memory for pagetable
>         construction.

I guess this part is wrong, i.e. a PV guest will not have the luxury of owning the entire range of contiguous physical memory. Since the actual pagetable to be used is stored in guest memory, then, to minimize copying, what the guest sees in the pagetable will also be the real value used in MMU operation. Correct?

Then

>         > after all the pagetable is completely constructed, the CR3
>         is loaded, which
>         > started the hardware MMU operation.    So therefore, before
>         the CR3 is
>         > loaded the entire physical memory is marked as readonly, and
>         after the CR3
>         > is loaded, only those memory not involved in pagetable
>         mapping are unmarked
>         > readonly?
>         >
>         > Does not seem right, as guest OS can change the CR3 anytime
>         subsequently as
>         > well.
>
>         Any writes to CR3 'll be trapped to the Xen itself AFAIK. So,
>         yes any
>         guest can change the CR3 anytime but there is always Xen to
>         see what
>         it is writing in the CR3 .Anything beyond the memory assigned
>         to
>         domain is illegal, xen knows the limits of the domains.
>
> This part I fully understand.   But the guest OS, knowing that he owns
> the entire memory range, will attempt to partition the entire blocks
> of memory in any design he wants to - whether it be pagetable memories
> or not.   And so the contents in memory can be anything, there is no
> concept of "invalid frame number" to the guest OS, and will remain as
> what the guest OS has written - no change, ie hypervisor cannot change
> its content.
>
> But the hypervisor will implement a shadow memory (apologies if I am
> wrong, just describing based on the all the materials I have read so
> far) - this construction (done in hypervisor) is triggered immediately
> upon loading of CR3 by the guest.   And the purpose of the shadow
> memory is to rewrite all the pagetable entries in the guest to its
> real/physical values, so that it can be used for pagetable mapping by
> MMU.    This rewriting process is done in hypervisor, based on the
> memory assigned to the guest, and so it has to be ALWAYS valid values.
> It is needed because hypervisor cannot change the content of the guest
> pagetable.   The guest should always be able to write ANYTHING he
> wants to, to his own guest memory.   And the hypervisor will always
> generate the VALID mapping values to put into the shadow memory.
>
> So throughout the entire chain of reasoning, there is no way for the
> guest to corrupt the shadow table in the hypervisor.   The only reason
> I can think of, that pagetable in guest must be made readonly, is so
> that it will trigger the corresponding pagetable update in the shadow
> memory in the hypervisor.   Nothing to do with valid/invalid frames
> numbers here, or "unsafe" values either.   Does it sound logical?
>
> Please correct me if I am wrong.

> You need to make it clear whether you are talking about paravirtualised
> (PV) or fully-virtualised (HVM) mode guests, they are very different in
> this regard.

Apologies for this deep probing again. I don't quite understand why it has to be PV or HVM. As the "load cr3" instruction is a privileged insn, running it at ring1 (PV) will trigger an exception, which can be used to update the hypervisor shadow table, if it is implemented, regardless of whether HVM support (SVM or VMX) is available or not. Similarly for guest readonly pagetable enforcement: no HVM feature is needed here, because the guest is still running at ring1 and is subject to ring0's host control. Please enlighten :-).

Perhaps some other operations subsequent to this make the shadow table implementation for PV infeasible? From the paper[0] quoted below, is it due to the high overhead of a shadow table implementation in the PV scenario?

> What you say is roughly true for HVM guests but not PV guests where
> there is no shadow mode.

This is something new to me, thank you for the info.

> In the HVM case the shadowing code ensures that guest page-table pages
> are marked read-only in the shadowed page tables (the ones actually
> loaded into cr3) in order to trap and propagate updates.
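
To check my understanding of "trap and propagate", here is a minimal Python sketch of the idea. It is only a toy model under my own assumptions, not Xen's shadow code: the class name ShadowedGuest, the method guest_write_pte, and the flat one-level tables are all hypothetical. The guest's write to its own page table is assumed to fault because the page is read-only in the shadow; the handler then stores what the guest wrote and updates the shadow entry with the translated machine frame.

```python
class ShadowedGuest:
    """Toy model of shadow paging for one guest (hypothetical names throughout)."""

    def __init__(self, p2m):
        # Guest frame number -> machine frame number, as assigned by the hypervisor.
        self.p2m = dict(p2m)
        # The guest's own page table: virtual page -> guest frame number.
        self.guest_pt = {}
        # The table the MMU would really walk: virtual page -> machine frame number.
        self.shadow_pt = {}

    def guest_write_pte(self, vpage, gfn):
        """Models the trap taken when the guest writes one of its page-table entries."""
        # The guest always sees exactly the value it wrote.
        self.guest_pt[vpage] = gfn
        mfn = self.p2m.get(gfn)
        if mfn is None:
            # No valid translation: the shadow entry is left not present.
            self.shadow_pt.pop(vpage, None)
        else:
            # Propagate the update with the real machine frame.
            self.shadow_pt[vpage] = mfn


guest = ShadowedGuest({0: 100, 1: 205})
guest.guest_write_pte(0x10, 1)   # shadow now maps 0x10 to machine frame 205
guest.guest_write_pte(0x11, 7)   # frame 7 is not assigned, so no shadow entry
```

In this model the guest can write anything into guest_pt, but shadow_pt only ever holds frames taken from p2m.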

> For PV guests the guest is required to perform the pseudo-physical to
> machine address translation itself. The hypervisor enforces the
> invariant that the guest cannot have a writable mapping to a page table
> page using the algorithm described in the Xen paper[0], section 3.3.3.
> On startup the initial pagetables are marked readonly and the guest has
> to make other pages read-only if it wishes to use them as page tables.
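
As I read section 3.3.3 of the paper, each machine frame carries a type and a reference count, the types are mutually exclusive, and a frame can only change type when its count is zero. Below is a rough Python sketch of that bookkeeping. It is a simplification under my own assumptions: the names FrameType, Frame, Hypervisor, pin_page_table, map_page and put_type are hypothetical and are not Xen's interfaces, and only two of the paper's frame types are modelled.

```python
from dataclasses import dataclass
from enum import Enum


class FrameType(Enum):
    NONE = "none"
    PAGE_TABLE = "page_table"
    WRITABLE = "writable"


@dataclass
class Frame:
    owner: int                       # domain id that owns this machine frame
    ftype: FrameType = FrameType.NONE
    type_count: int = 0              # number of current uses with type ftype


class Hypervisor:
    def __init__(self, frames):
        self.frames = frames         # machine frame number -> Frame

    def _get_type(self, mfn, wanted):
        frame = self.frames[mfn]
        # A frame in use with one type cannot take a different type.
        if frame.type_count > 0 and frame.ftype is not wanted:
            return False
        frame.ftype = wanted
        frame.type_count += 1
        return True

    def put_type(self, mfn):
        """Drop one typed use; the frame may be retyped once the count is zero."""
        frame = self.frames[mfn]
        if frame.type_count > 0:
            frame.type_count -= 1
            if frame.type_count == 0:
                frame.ftype = FrameType.NONE

    def pin_page_table(self, domid, mfn):
        frame = self.frames.get(mfn)
        if frame is None or frame.owner != domid:
            return False
        return self._get_type(mfn, FrameType.PAGE_TABLE)

    def map_page(self, domid, mfn, writable):
        frame = self.frames.get(mfn)
        if frame is None or frame.owner != domid:
            return False             # never map another domain's frame
        if not writable:
            return True              # read-only maps of own frames are always allowed
        return self._get_type(mfn, FrameType.WRITABLE)


hv = Hypervisor({0: Frame(owner=1), 1: Frame(owner=1), 2: Frame(owner=2)})
hv.pin_page_table(1, 0)              # frame 0 becomes a page table
hv.map_page(1, 0, writable=True)     # refused: would be a writable map of a page table
```

So the guest may use any of its own frames as a page table, but it first has to give up every writable mapping of that frame, which matches the last sentence above.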

> [0] http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf
>
> Ian.
