On Wed, 2011-05-04 at 15:17 +0100, Konrad Rzeszutek Wilk wrote:
> # HG changeset patch
> # User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> # Date 1304518568 14400
> # Node ID ba218fa1a48ed682651fd90dd8eeb68eeab68e7a
> # Parent b6af9b428bb16c4c5364ace0617923ffa44ad887
> libxl: Add support for passing in the machine's E820 for PCI passthrough
>
> The code that populates the E820 is triggered when the guest
> configuration has "pci=['<BDF>,..']", the guest is PV, and
> b_info->u.pv.machine_e820 is set.
>
> do_domain_create calls libxl__e820_alloc when it notices that the
> guest is PV, has at least one PCI device, and has the machine_e820
> flag set.
>
> libxl__e820_alloc calls xc_get_machine_memory_map to retrieve the
> system's E820. The E820 is then sanitized to weed out entries below
> 16MB, as well as to remove any E820_RAM or E820_UNUSED regions, as
> the guest does not need to know about them. The guest only needs the
> E820_ACPI, E820_NVS, and E820_RESERVED entries to get an idea of
> where the PCI I/O space is. Mostly: the Linux kernel assumes that any
> gap in the E820 is PCI I/O space, which means that if we give the
> guest 2GB and the E820_ACPI (and its friends) start at 3GB, the gap
> between 2GB and 3GB will be considered PCI I/O space. To guard
> against that we also create an E820_UNUSABLE entry covering the
> region from 'target_kb' (called ram_end in the code) up to the first
> E820_[ACPI,NVS,RESERVED] region. Lastly, xc_domain_set_memory_map is
> called to install the new E820.
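As a rough illustration of that transformation (a hypothetical layout,
not taken from any real machine), for a guest given 2GB on a host whose
reserved regions start at 3GB:

    machine E820 (input):
      [ 0GB     - 3GB     ]  E820_RAM       -> dropped
      [ 3GB     - 3GB+1MB ]  E820_ACPI      -> kept
      [ 3GB+1MB - 4GB     ]  E820_RESERVED  -> kept

    guest E820 (output, target_kb = 2GB):
      [ 0GB     - 2GB     ]  E820_RAM       (the guest's own memory)
      [ 2GB     - 3GB     ]  E820_UNUSABLE  (plugs the gap so Linux
                                             does not treat it as PCI
                                             I/O space)
      [ 3GB     - 3GB+1MB ]  E820_ACPI
      [ 3GB+1MB - 4GB     ]  E820_RESERVED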
Do we need to document the requirements this places on a PV guest
somewhere more prominent?
Phrases like "up to the first E820...." make me worry that this will
only work on "sensible" machines, but I suppose we can cross those
bridges when we come to them...
> When tested with another PV guest (NetBSD 5.1) the modified E820 gave
> it no trouble. The code has also been tested with older "classic" Xen Linux
> and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
> Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).
>
> Memory that is slack or for the balloon (so 'maxmem' in the guest
> configuration) is placed after the machine E820, which in most cases
> means above 4GB.
>
> The reason for fetching the E820 via the hypercall in the toolstack
> (instead of the guest doing it) is that if a guest were to make the
> 'XENMEM_machine_memory_map' hypercall itself, it would retrieve an
> E820 with I/O range caps added in, meaning that the region from 4GB
> up to the end of possible memory would be marked as unusable and the
> kernel would not have any space in which to allocate a balloon
> region.
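For reference, the flow described above amounts to something like the
following sketch (the actual libxl__e820_alloc body is not shown in the
quoted diff; the balloon_kb calculation here is my guess from the
commit message):

    /* Sketch of libxl__e820_alloc: fetch, sanitize, install. */
    static int libxl__e820_alloc(libxl_ctx *ctx, uint32_t domid,
                                 libxl_domain_config *d_config)
    {
        int rc;
        uint32_t nr;
        struct e820entry map[E820MAX];

        /* Fetch the host's E820 in the toolstack, avoiding the I/O
         * range caps that XENMEM_machine_memory_map adds for guests. */
        rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
        if (rc < 0)
            return ERROR_FAIL;
        nr = rc;

        /* Weed out <16MB, RAM and UNUSED entries; plug the gap between
         * target_kb and the first reserved region. */
        rc = e820_sanitize(ctx, map, &nr,
                           d_config->b_info.target_memkb,
                           d_config->b_info.max_memkb -
                           d_config->b_info.target_memkb);
        if (rc)
            return rc;

        /* Install the sanitized map as the guest's E820. */
        return xc_domain_set_memory_map(ctx->xch, domid, map, nr);
    }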
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>
> diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl.idl
> --- a/tools/libxl/libxl.idl Wed May 04 10:16:08 2011 -0400
> +++ b/tools/libxl/libxl.idl Wed May 04 10:16:08 2011 -0400
> @@ -180,6 +180,7 @@ libxl_domain_build_info = Struct("domain
> ("cmdline", string),
> ("ramdisk", libxl_file_reference),
> ("features", string, True),
> + ("machine_e820", bool, False, "Use
> machine's E820 for PCI passthrough."),
> ])),
> ])),
> ],
> diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_create.c
> --- a/tools/libxl/libxl_create.c Wed May 04 10:16:08 2011 -0400
> +++ b/tools/libxl/libxl_create.c Wed May 04 10:16:08 2011 -0400
> @@ -519,6 +519,14 @@ static int do_domain_create(libxl__gc *g
> for (i = 0; i < d_config->num_pcidevs; i++)
> libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
>
> + if (!d_config->c_info.hvm && d_config->b_info.u.pv.machine_e820) {
> + int rc = 0;
> + rc = libxl__e820_alloc(ctx, domid, d_config);
rc is pretty comprehensively initialised ;-)
> + if (rc)
> + LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> + "Failed while collecting E820 with: %d (errno:%d)\n",
> + rc, errno);
LIBXL__LOG_ERRNO takes care of logging errno for you.
> + }
> if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader )) {
> if ( (*cb)(ctx, domid, priv) )
> goto error_out;
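Applying both comments above, the hunk might read (a sketch, untested):

    if (!d_config->c_info.hvm && d_config->b_info.u.pv.machine_e820) {
        /* No separate initialisation needed; rc is assigned at once. */
        int rc = libxl__e820_alloc(ctx, domid, d_config);
        if (rc)
            /* LIBXL__LOG_ERRNO appends errno to the message itself. */
            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_WARNING,
                             "Failed while collecting E820 with: %d", rc);
    }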
> diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_pci.c
> --- a/tools/libxl/libxl_pci.c Wed May 04 10:16:08 2011 -0400
> +++ b/tools/libxl/libxl_pci.c Wed May 04 10:16:08 2011 -0400
> @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx
> free(pcidevs);
> return 0;
> }
> +
> +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
> + uint32_t *nr_entries,
> + unsigned long map_limitkb,
> + unsigned long balloon_kb)
> +{
> + uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end;
> + uint32_t i, idx = 0, nr;
> + struct e820entry e820[E820MAX];
> +
> + if (!src || !map_limitkb || !balloon_kb || !nr_entries)
> + return ERROR_INVAL;
> +
> + nr = *nr_entries;
> + if (!nr)
> + return ERROR_INVAL;
> +
> + if (nr > E820MAX)
> + return ERROR_NOMEM;
> +
> + /* Weed out anything under 16MB */
> + for (i = 0; i < nr; i++) {
> + if (src[i].addr > 0x100000)
0x100000 is 1MB not 16MB. I'm not sure if the code or the comment is
correct.
> + continue;
> +
> + src[i].type = 0;
> + src[i].size = 0;
> + src[i].addr = -1ULL;
> + }
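If the comment ("under 16MB") reflects the intent, the test would
presumably use the 16MB constant instead:

    /* 16MB is 0x1000000; the posted code tests 0x100000 (1MB). */
    if (src[i].addr > 0x1000000)
        continue;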
> +
> + /* Find the lowest and highest entry in E820, skipping over
> + * undersired entries. */
undesired ?
> + start = -1ULL;
> + last = 0;
> + for (i = 0; i < nr; i++) {
> + if ((src[i].type == E820_RAM) ||
> + (src[i].type == E820_UNUSABLE) ||
> + (src[i].type == 0))
[...]
> +    nr = idx;
> +
> + for (i = 0; i < nr; i++) {
> + LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]",
> + e820[i].type == E820_RAM ? "RAM " :
> + (e820[i].type == E820_RESERVED ? "RSV " :
> + e820[i].type == E820_ACPI ? "ACPI" :
> + (e820[i].type == E820_NVS ? "NVS " :
> + (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))),
> + e820[i].addr >> 12,
> + (e820[i].addr + e820[i].size) >> 12);
This screams out for an e820_type_as_string style function or a lookup
table, or just about anything else really ;-) (a sketch follows after
the quoted code).
Also you can use "%-4s" instead of manually aligning all the strings.
> + }
> +
> + /* Done: copy the sanitized version. */
> + *nr_entries = nr;
> + memcpy(src, e820, nr * sizeof(struct e820entry));
> + return 0;
> +}
> +
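A minimal sketch of the helper Ian suggests above (name hypothetical),
which would also let the log call use "%-4s":

    static const char *e820_type_to_string(uint32_t type)
    {
        switch (type) {
        case E820_RAM:      return "RAM";
        case E820_RESERVED: return "RSV";
        case E820_ACPI:     return "ACPI";
        case E820_NVS:      return "NVS";
        case E820_UNUSABLE: return "UNU";
        default:            return "----";
        }
    }

    /* ... and the debug log then becomes: */
    LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%-4s\t[%lx -> %lx]",
               e820_type_to_string(e820[i].type),
               e820[i].addr >> 12,
               (e820[i].addr + e820[i].size) >> 12);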
Ian.