

[Xen-devel] [PATCH] Under Xen, consider E820 non-RAM and E820 gaps as identity (1-1) mappings in the P2M.

To: linux-kernel@xxxxxxxxxxxxxxx, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, hpa@xxxxxxxxx, Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject: [Xen-devel] [PATCH] Under Xen, consider E820 non-RAM and E820 gaps as identity (1-1) mappings in the P2M.
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Mon, 10 Jan 2011 12:17:32 -0500
Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Jan Beulich <JBeulich@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

Please see attached the patches that augment how the Xen MMU deals with
PFNs that point to physical devices (PCI BARs, and such).

Short summary: No need to troll through code to add VM_IO on mmap paths

Long summary:
Under Xen MMU we would distinguish two different types of PFNs in
the P2M tree: real MFN, INVALID_P2M_ENTRY (missing PFN - used for ballooning).
If there was a device whose PCI BAR was within the P2M, we would look
at the flags and if _PAGE_IOMAP was passed we would just return the PFN without
consulting the P2M. We have a patch (and some auxiliary ones for other subsystems)
that sets this:
 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
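
To make the flag-based behaviour described above concrete, here is a minimal
sketch in C. It is not the kernel code: the flat p2m array, the function name
frame_for_pte, and the FLAG_IOMAP bit position are all assumptions made for
illustration (FLAG_IOMAP merely stands in for _PAGE_IOMAP).

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_ENTRY	UINT64_MAX		/* stands in for INVALID_P2M_ENTRY */
#define FLAG_IOMAP	(UINT64_C(1) << 10)	/* hypothetical bit, stands in for _PAGE_IOMAP */

/*
 * Pick the frame number to place in a page table entry.
 * p2m is a flat PFN -> MFN array here, which is a simplification:
 * the real P2M is a tree.
 */
uint64_t frame_for_pte(uint64_t pfn, uint64_t flags,
		       const uint64_t *p2m, size_t p2m_len)
{
	if (flags & FLAG_IOMAP)
		return pfn;		/* device memory: P2M is not consulted */
	if (pfn >= p2m_len)
		return INVALID_ENTRY;	/* outside the table: treat as missing */
	return p2m[pfn];		/* ordinary RAM: translate via the P2M */
}
```

The weak point of this scheme is that every caller has to pass the flag
correctly, which is why VM_IO had to be added on the mmap paths.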

This patchset proposes a different way of doing this where the patch
above and the other auxiliary ones will not be necessary.

This approach is the one that H. Peter Anvin, Jeremy Fitzhardinge and Ian Campbell
suggested. The mechanism is to treat the E820 non-RAM entries and E820 gaps
in the P2M tree structure as identity (1-1) mappings. Many thanks
to Ian Campbell for looking in detail at the patches and asking quite

In the past we used to think of those regions as "missing" and under the ownership
of the balloon code. But the balloon code only operates on a specific region. This
region is in the last E820 RAM page (basically any region past nr_pages is
considered a balloon type page). [Honesty compels me to say that during run-time
the balloon code could own pages in different regions, but we do not have to worry
about that as it works OK and we only have to worry about the bootup case]

Gaps in the E820 (which are usually considered to be PCI BAR space) would end up
with the void entries and point to the "missing" pages.

This patchset finds the ranges of non-RAM E820 entries and gaps and
marks them as "identity". So for example, for this E820:

                    1GB                                           2GB
 /-------------------+---------\               /----------\    /---+-----\
 | System RAM        | Sys RAM |               | reserved |    | Sys RAM |
 \-------------------+---------/               \----------/    \---+-----/
                               ^- 1029MB                       ^- 2001MB

The identity range would be from 1029MB to 2001MB.
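
As a rough illustration of how such a range can be derived, the sketch below
walks a simplified memory map and collects everything that is not RAM. It is
not the code from the patches: struct region, find_identity_ranges and the MB
macro are hypothetical names, and the map is assumed to be sorted and
non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

#define MB (UINT64_C(1) << 20)

enum region_type { REGION_RAM, REGION_NON_RAM };

struct region {
	uint64_t start;		/* first byte of the region */
	uint64_t end;		/* one past the last byte */
	enum region_type type;
};

/*
 * Collect every span that is not RAM (non-RAM entries plus the holes
 * between entries) from a sorted, non-overlapping map. Returns the
 * number of ranges stored in out[], at most out_max.
 */
size_t find_identity_ranges(const struct region *map, size_t n,
			    struct region *out, size_t out_max)
{
	uint64_t ram_end = 0;	/* end of the last RAM region seen so far */
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i].type != REGION_RAM)
			continue;	/* non-RAM is part of the open span */
		if (map[i].start > ram_end && count < out_max) {
			out[count].start = ram_end;
			out[count].end = map[i].start;
			out[count].type = REGION_NON_RAM;
			count++;
		}
		ram_end = map[i].end;
	}

	/* Non-RAM entries that trail the last RAM region. */
	if (n > 0 && map[n - 1].end > ram_end && count < out_max) {
		out[count].start = ram_end;
		out[count].end = map[n - 1].end;
		out[count].type = REGION_NON_RAM;
		count++;
	}
	return count;
}
```

Fed a map with RAM up to 1029MB, a reserved entry somewhere in the hole and
RAM again from 2001MB (the exact position of the reserved entry does not
matter), this yields the single range 1029MB to 2001MB, because the reserved
entry and the gaps on both sides of it merge into one span.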

Since the E820 gaps could cross P2M level boundaries (keep in mind that the
P2M structure is a 3-level tree, first level covers 1GB, next down 4MB,
and then each page) we might have to allocate extra pages to handle those
violators. For large regions (1GB) we create a
page which holds pointers to a shared "p2m_identity" page. For smaller regions,
if necessary, we create pages wherein we can mark PFNs as 1-1 mappings.

The two attached diagrams crudely explain how we are doing this. "P2M story"
is how the P2M is constructed and set up with balloon pages. The "P2M with 1-1.."
is how we insert the identity mappings in the P2M tree.

Also, the first patch "xen/mmu: Add the notion of identity (1-1) mapping."
has an exhaustive explanation.
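
For readers without the diagrams at hand, the idea can be modelled in a few
lines of C. This is only a sketch under stated assumptions, not the code from
the patches: the fan-out of 512 entries per page, the IDENTITY_BIT position
and all function and variable names are made up for illustration (only the
notion of a shared identity page comes from the description above), and the
coverage of each level in the real tree depends on page and pointer size.

```c
#include <stdint.h>
#include <stdlib.h>

#define PER_PAGE	512u			/* assumed entries per page */
#define MAX_PFN		((uint64_t)PER_PAGE * PER_PAGE * PER_PAGE)
#define INVALID_ENTRY	UINT64_MAX		/* "missing" PFN */
#define IDENTITY_BIT	(UINT64_C(1) << 62)	/* hypothetical 1-1 marker */

static uint64_t missing_leaf[PER_PAGE];		/* shared, every entry missing */
static uint64_t identity_leaf[PER_PAGE];	/* shared, recognised by address */
static uint64_t *missing_mid[PER_PAGE];		/* shared, points at missing_leaf */
static uint64_t **top[PER_PAGE];

/* Must run first: initially every PFN is "missing". */
void p2m_init(void)
{
	for (unsigned int i = 0; i < PER_PAGE; i++) {
		missing_leaf[i] = INVALID_ENTRY;
		missing_mid[i] = missing_leaf;
		top[i] = missing_mid;
	}
}

/* Translate a PFN; identity PFNs translate to themselves. */
uint64_t p2m_lookup(uint64_t pfn)
{
	uint64_t *leaf, entry;

	if (pfn >= MAX_PFN)
		return INVALID_ENTRY;
	leaf = top[pfn / (PER_PAGE * PER_PAGE)][(pfn / PER_PAGE) % PER_PAGE];
	if (leaf == identity_leaf)
		return pfn;
	entry = leaf[pfn % PER_PAGE];
	if (entry != INVALID_ENTRY && (entry & IDENTITY_BIT))
		return pfn;
	return entry;
}

/* Mark PFNs in [start, end) as identity; returns 0, or -1 on failure. */
int p2m_set_identity(uint64_t start, uint64_t end)
{
	if (end > MAX_PFN)
		return -1;
	for (uint64_t pfn = start; pfn < end; ) {
		unsigned int t = pfn / (PER_PAGE * PER_PAGE);
		unsigned int m = (pfn / PER_PAGE) % PER_PAGE;

		if (top[t] == missing_mid) {	/* need a private mid page */
			uint64_t **mid = malloc(sizeof(missing_mid));
			if (!mid)
				return -1;
			for (unsigned int i = 0; i < PER_PAGE; i++)
				mid[i] = missing_leaf;
			top[t] = mid;
		}
		if (pfn % PER_PAGE == 0 && end - pfn >= PER_PAGE) {
			/* Whole leaf: point at the shared identity page
			 * (a private leaf already there is leaked here). */
			top[t][m] = identity_leaf;
			pfn += PER_PAGE;
			continue;
		}
		if (top[t][m] == missing_leaf) {	/* need a private leaf */
			uint64_t *leaf = malloc(sizeof(missing_leaf));
			if (!leaf)
				return -1;
			for (unsigned int i = 0; i < PER_PAGE; i++)
				leaf[i] = INVALID_ENTRY;
			top[t][m] = leaf;
		}
		if (top[t][m] != identity_leaf)
			top[t][m][pfn % PER_PAGE] = pfn | IDENTITY_BIT;
		pfn++;
	}
	return 0;
}
```

The point of the shared pages is that a large, aligned identity range costs
no memory beyond one mid-level page of pointers; private leaf pages are only
allocated where a range starts or ends in the middle of a leaf.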

For the balloon pages, the setting of the "missing" pages is mostly already 
The initial case of carving the last E820 region for balloon ownership is 
to set those PFNs to missing and we also change the balloon code to be more

This patchset is also available under git:

Further work (once ACPI S3 suspend works):
Also filter out _PAGE_IOMAP on entries that are System RAM (happens after ACPI
S3 suspend with radeon/nouveau drivers). Right now we just WARN_ON on them if
CONFIG_XEN_DEBUG is set.

Changelog: [since v3, not posted]
 - Made the passing of identity PFNs much simpler and cleaner.
 - Expanded the commit description.

[since v2 https://lkml.org/lkml/2010/12/30/163]
 - Added Reviewed-by.
 - Squashed some patches together.
 - Replaced p2m_mid_identity with using reserved_brk to allocate top
   identity entries. This protects us from non 1GB boundary conditions.
 - Expanded the commit descriptions.

[since v1 https://lkml.org/lkml/2010/12/21/255]:
 - Diagrams of P2M included.
 - More conservative approach used (memory that is not populated or 
   identity is considered "missing", instead of as identity).
 - Added IDENTITY_PAGE_FRAME bit to uniquely identify 1-1 mappings.
 - Optional debugfs file (xen/mmu/p2m) to print out the level and types in
   the P2M tree.
 - Lots of comments - if I missed any please prompt me.

Along with the devel/ttm.pci-api.v3, I've been able to boot Dom0 on a variety
of PCIe type graphics hardware with X working (G31M, ATI ES1000, GeForce 6150SE,
HD 4350 Radeon, HD 3200 Radeon, GeForce 8600 GT). That test branch is located
at devel/fix-amd-bootup if you are curious.
