This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] paging mechanism clarification

To: "Petersson, Mats" <Mats.Petersson@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] paging mechanism clarification
From: "Pradeep Singh, TLS-Chennai" <pradeep_s@xxxxxx>
Date: Mon, 12 Mar 2007 18:08:34 +0530
Delivery-date: Mon, 12 Mar 2007 05:41:51 -0700
References: <907625E08839C4409CE5768403633E0B018E1A4A@xxxxxxxxxxxxxxxxx>

-----Original Message-----
From: Petersson, Mats [mailto:Mats.Petersson@xxxxxxx]
Sent: Mon 12-Mar-07 5:35 PM
To: Pradeep Singh, TLS-Chennai; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] paging mechanism clarification

> -----Original Message-----
> From: Pradeep Singh, TLS-Chennai [mailto:pradeep_s@xxxxxx]
> Sent: 12 March 2007 11:34
> To: Petersson, Mats; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] paging mechanism clarification
> -----Original Message-----
> From: Petersson, Mats [mailto:Mats.Petersson@xxxxxxx]
> Sent: Mon 12-Mar-07 4:13 PM
> To: Pradeep Singh, TLS-Chennai; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] paging mechanism clarification
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > Pradeep Singh, TLS-Chennai
> > Sent: 12 March 2007 05:56
> > To: xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: [Xen-devel] paging mechanism clarification
> >
> >
> > Hi All,
> >
> > Xen uses a 2-level paging mechanism to resolve virtual
> > addresses from a domU into frame numbers. The first level is
> > handled by the MMU for the domU, i.e. translation from
> > virtual address to physical address, just like the normal
> > paging mechanism. The second level of translation is done by
> > the Xen hypervisor. It takes the pseudo-physical address
> > received from the domU, treats it as a normal virtual
> > address, and finds the page frame using the regular paging mechanism.
> No, not in the current model.
> Does that include xen-3.0.3 as well?
> I assume it is true for the whole Xen 3 series?

Yes, 3.0.3 is very similar to (if not exactly the same as) 3.0.5 (currently
called "unstable").
> Paging in HVM (fully virtualized) domains is managed by shadow
> paging, which, simplified, works like this:
> When paging is disabled in the guest, we still enable paging in the
> processor and give the processor a CR3 that points to a map of where
> the guest memory is.
> Mats, when paging is disabled in the guest (which I guess is the
> case while a Xen domU is booting), how is this related to
> paging on the processor?
> Even if the guest is booting or has paging disabled, the
> hypervisor should be independent of this non-paging guest
> instance. How does paging in the hypervisor depend on
> paging in the guest?

When the system boots, the processor is normally in "real-mode", and
it's definitely not got paging enabled. So we have to "make the guest OS
believe this is the case". But at the same time, the guest OS is most
likely not loaded at address zero in memory, so we need paging enabled
to remap the GUEST PHYSICAL address to match the machine physical
address. So we have a "linear map" to translate the "address zero" to
the "start of guest memory", and so on for every page of memory in the
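
As an illustration of the "linear map" described above, here is a minimal sketch in C. It is not Xen code: the flat array, the function names and the 4 KB page size are assumptions made for the example. It only shows the idea that guest-physical frame i is mapped to machine frame (start of guest memory + i), so that guest "address zero" lands at the start of the guest's memory.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12                      /* assumption: 4 KB pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define NO_MAPPING UINT64_MAX

/* Hypothetical helper: fill a flat table so that guest frame i maps to
 * machine frame guest_base_mfn + i (the "linear map"). */
void build_linear_map(uint64_t *map, size_t nr_pages, uint64_t guest_base_mfn)
{
    for (size_t i = 0; i < nr_pages; i++)
        map[i] = guest_base_mfn + i;
}

/* Translate a guest-physical address to a machine address via the table.
 * Returns NO_MAPPING if the address is outside the guest's memory. */
uint64_t guest_phys_to_machine(const uint64_t *map, size_t nr_pages,
                               uint64_t gpa)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;      /* guest frame number */

    if (gfn >= nr_pages)
        return NO_MAPPING;
    /* Swap the frame number, keep the offset within the page. */
    return (map[gfn] << PAGE_SHIFT) | (gpa & (PAGE_SIZE - 1));
}
```

With a guest whose memory starts at 256MB (machine frame 0x10000), guest-physical address 0 would translate to machine address 0x10000000 in this sketch. A real hypervisor would build this map as hardware page tables rather than as a flat array.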

This is not hard to do, since the AMD-V/VT feature of the processor
expects the paging-bit to be different between what the guest "thinks"
and the actual case. In the AMD-V, there's even support to run real-mode
with paging enabled, so all the BIOS-code and such will be running in
this mode. VT has to do a bunch of tricky stuff to work around that

OK, fine. Does this argument hold true even for processors without VT or Pacifica?
I doubt it.

> I hope I made myself clear.
> Please enlighten me :-).
> When paging is enabled, we use a shadow page-table, which essentially
> means that the GUEST sees one page-table and the processor another
> (thanks to the fact that the hypervisor intercepts the CR3 read/write
> operations, and when CR3 is read back by the guest, we don't send back
> the value it's ACTUALLY POINTING TO IN THE PROCESSOR, but the value
> that was set by the guest). So there are two page-tables.
> Got this well, thanks Mats :).
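
The CR3 interception described above can be sketched roughly as follows. This is a toy model in C, not the Xen implementation: the struct and function names are made up for illustration, and the "hardware" CR3 is just a field here. The point is only that the value the guest wrote and the value the processor actually uses are kept separately.

```c
#include <stdint.h>

/* Hypothetical per-VCPU state for this sketch. */
struct vcpu_sketch {
    uint64_t guest_cr3;  /* what the guest wrote and expects to read back */
    uint64_t hw_cr3;     /* what the processor is actually pointed at */
};

/* Guest executes "mov %reg, %cr3": remember the guest's value, but load
 * the root of the shadow page-table into the (modelled) processor. */
void intercept_cr3_write(struct vcpu_sketch *v, uint64_t guest_val,
                         uint64_t shadow_root)
{
    v->guest_cr3 = guest_val;
    v->hw_cr3 = shadow_root;
}

/* Guest executes "mov %cr3, %reg": hand back the guest's own value,
 * never the shadow root. */
uint64_t intercept_cr3_read(const struct vcpu_sketch *v)
{
    return v->guest_cr3;
}
```

After a write intercept, a read intercept returns exactly what the guest wrote, while hw_cr3 holds the shadow root, so the guest cannot tell that a second page-table exists.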
> To make the page-table updates by the guest visible to the hypervisor,
> all of the guest-page-tables are made read-only (by scanning the new CR3
> value whenever one is set).
> I didn't quite get this either :(
> Sorry, but do you mean CR3 for the guest or for the
> processor? I assume you mean the guest?

Yes, scan the guest-CR3 to see where it placed the page-tables.

> Whenever a page-fault happens, the hypervisor has "first look", and
> determines if the update is for a page-table or not. If it is a
> page-table update, the guest operation is emulated (in x86_emulate.c),
> and the result is written to the shadow-page-table AND the
> Why do we need emulation? Is there some particular reason for emulating?
> Do you mean that if I am running a 32-bit domU on top of a
> 64-bit processor, the guest operation for updating the page
> table is emulated by the hypervisor? Am I right?

No, it's simply because we need to see the result of the instruction and
write it to two places (with some modification in one of those places).
So if the code is doing, for example, "*pte |= 1;" (set a
page-table entry to "present"), we need to mark both the
guest page-table entry and our shadow entry "present"
(and perhaps do some other work too, but that's the minimum work
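
To make the "write it to two places" step concrete, here is a hedged sketch in C. It is not taken from x86_emulate.c: the function names, the simple base-offset frame lookup and the 4 KB entry layout (flags in the low 12 bits, no flag bits above them) are all assumptions for the example. The guest's entry receives the value as written, and the shadow entry receives the same flags with the frame number translated to machine space.

```c
#include <stdint.h>

#define PAGE_SHIFT  12          /* assumption: 4 KB pages */
#define PTE_PRESENT 0x1ULL
#define PTE_FLAGS   0xfffULL    /* assumption: flags live in bits 0..11 only */

/* Hypothetical stand-in for the guest-frame to machine-frame lookup;
 * here the guest is assumed to be one contiguous chunk of memory. */
uint64_t gfn_to_mfn(uint64_t gfn, uint64_t guest_base_mfn)
{
    return guest_base_mfn + gfn;
}

/* Emulate a guest write of new_val to one of its page-table entries. */
void emulate_pte_write(uint64_t *guest_pte, uint64_t *shadow_pte,
                       uint64_t new_val, uint64_t guest_base_mfn)
{
    /* Place 1: the guest's own table sees exactly what it wrote. */
    *guest_pte = new_val;

    /* Place 2: the shadow table gets a modified copy. */
    if (!(new_val & PTE_PRESENT)) {
        *shadow_pte = 0;        /* nothing to map for the processor */
        return;
    }
    uint64_t gfn = new_val >> PAGE_SHIFT;
    *shadow_pte = (gfn_to_mfn(gfn, guest_base_mfn) << PAGE_SHIFT)
                  | (new_val & PTE_FLAGS);
}
```

For the "*pte |= 1;" case, the emulator would compute the new value (old entry with bit 0 set) and pass it in as new_val; the guest entry then reads back with the present bit set, and the shadow entry is present too but points at the machine frame.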

This brings one more question to mind: why do we use pinning, then?
As I see it, it is to prevent shadow page tables from being swapped out before the page tables they actually point to are swapped. Am I right?

But according to the interface manual, pinning is used to bind a VCPU to a specific CPU in an SMP environment. These two statements look pretty orthogonal to me, which means I may be wrong :(.
Can somebody help me in this regard?

Pointers to actual code will be of great help.

Thanks a lot Mats.
Thank you all.

> Does this mean that on an x86 platform this overkill, or this
> emulation, is skipped altogether?
> Please bear with me as I am an absolute Xen newbie out here :-).

No, it's ALWAYS used for all page-table writes, as far as I understand.

> guest-page-table, but in the shadow-page-table, the value is modified to
> reflect the actual address in machine-space, rather than what the guest
> thinks it should be.
> In future versions of AMD processors (and I believe Intel are working on
> something very similar if not the same), there will be a mode where the
> processor is able to work in "nested paging mode", which means that
> there are two "parallel" page-tables. The first one is the
> "guest-page-table", the second one is the "host-page-table". In this
> case, every lookup in the guest-page-table will be done through the
> host-page-table. So we have a "simple" way to just take the
> guest-page-table and translate it to machine-physical-address, with the
> good thing that the host-page-table needn't change, since the pages that
> the host consists of are pretty much static for the duration of the
> guest.
> Yes, I read about this in an article mentioning how Pacifica
> is better than VT.
> Say, for example, we have a guest that lives at 256-512MB. The
> guest-page-table would contain, for example, a mapping for
> 0x12200000 -> guest-physical 0x100000 (1MB). The host-page-table
> translates this to 0x10100000 because the 1MB entry in guest-address
> is 256+1MB in machine-address.
> Exactly, got this well on spot :).
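
The two-step lookup in the example above can be sketched in C as follows. This is a toy model under stated assumptions: each "page-table" is a flat list of frame-to-frame mappings with made-up names, and 4 KB pages are assumed. Real nested paging walks multi-level tables in hardware, and the accesses to the guest-page-table itself also go through the host-page-table, which this sketch does not model.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT  12                 /* assumption: 4 KB pages */
#define OFFSET_MASK 0xfffULL
#define NO_MAPPING  UINT64_MAX

/* Hypothetical flat "page-table": one entry maps one frame to another. */
struct mapping {
    uint64_t from_frame;
    uint64_t to_frame;
};

/* Look up addr in one table; keep the page offset, swap the frame. */
uint64_t translate(const struct mapping *tbl, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].from_frame == (addr >> PAGE_SHIFT))
            return (tbl[i].to_frame << PAGE_SHIFT) | (addr & OFFSET_MASK);
    return NO_MAPPING;
}

/* Guest-virtual -> guest-physical (guest table), then
 * guest-physical -> machine-physical (host table). */
uint64_t nested_translate(const struct mapping *guest_tbl, size_t guest_n,
                          const struct mapping *host_tbl, size_t host_n,
                          uint64_t guest_virt)
{
    uint64_t guest_phys = translate(guest_tbl, guest_n, guest_virt);

    if (guest_phys == NO_MAPPING)
        return NO_MAPPING;             /* would be a guest page fault */
    return translate(host_tbl, host_n, guest_phys);
}
```

With a guest table holding frame 0x12200 -> 0x100 and a host table holding frame 0x100 -> 0x10100, the guest-virtual address 0x12200000 ends up at machine address 0x10100000, matching the numbers in the example. The host table never needs to change when the guest edits its own table.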
> [In reality, it's very likely that the guest never gets all the space in
> one big chunk, but rather a few pages here and a few pages there. If
> there are big chunks, we could use large pages to map those!].
> Thanks a ton Mats and all.
> --pradeep
> The support for nested paging (called HAP, Hardware Assisted Paging)
> has been in the Unstable version of Xen since a few days back.
> --
> Mats
> >
> > And this whole 2-level paging constitutes Xen's shadow page
> > tables. Right?
> >
> > Is my understanding of Xen's paging mechanism correct? Or am I
> > missing something?
> >
> > Thank you
> >
> > -pradeep
> >
> >

