WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: [PATCH 2 of 3] libxl: Add support for passing in the mac

To: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Tue, 10 May 2011 10:56:33 -0400
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
Delivery-date: Tue, 10 May 2011 07:57:27 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1305016190.26692.254.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <patchbomb.1304518670@xxxxxxxxxxxxxxxxxxxxxxx> <ba218fa1a48ed682651f.1304518672@xxxxxxxxxxxxxxxxxxxxxxx> <1305016190.26692.254.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, May 10, 2011 at 09:29:50AM +0100, Ian Campbell wrote:
> On Wed, 2011-05-04 at 15:17 +0100, Konrad Rzeszutek Wilk wrote:
> > # HG changeset patch
> > # User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > # Date 1304518568 14400
> > # Node ID ba218fa1a48ed682651fd90dd8eeb68eeab68e7a
> > # Parent  b6af9b428bb16c4c5364ace0617923ffa44ad887
> > libxl: Add support for passing in the machine's E820 for PCI passthrough
> > 
> > The code that populates E820 is unconditionally triggered by the guest
> > configuration having "pci=['<BDF>,..']", being a PV guest, and if
> > b_info->u.pv.machine_e820 is set.
> > 
> > The code do_domain_create calls the libxl__e820_alloc when
> > it notices that the guest is PV, has at least one PCI devices, and has
> > the machine_e820 flag set.
> > 
> > libxl__e820_alloc calls the xc_get_machine_memory_map to retrieve the 
> > systems
> > E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and as
> > well remove any E820_RAM or E820_UNUSED regions as the guest does not need 
> > to
> > know about them. The guest only needs the E820_ACPI, E820_NVS, 
> > E820_RESERVED to
> > get an idea of where the PCI I/O space is. Mostly.. The Linux kernel 
> > assumes that any
> > gap in the E820 is considered PCI I/O space which means that if we pass
> > in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the
> > gap between 2GB and 3GB will be considered as PCI I/O space. To guard 
> > against
> > that we also create an E820_UNUSABLE between the region of 'target_kb'
> > (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] 
> > region.
> > Lastly, the xc_domain_set_memory_map is called to install the new E820.
> 
> Do we need to document the requirements this places on a PV guest
> somewhere more prominent?
> 
> Phrases like "up to the first E820...." make me worry that this will
> only work on "sensible" machines, but I suppose we can cross those
> bridges when we come to them...

I did test it with non-sensible machines that have a E820 peppered with
E820_RAM amongs the E820_RESERVED. The last patches makes that work.


> 
> > When tested with another PV guest (NetBSD 5.1) the modified E820 gave
> > it no trouble. The code has also been tested with older "classic" Xen Linux
> > and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
> > Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).
> > 
> > Memory that is slack or for balloon (so 'maxmem' in guest configuration)
> > is put behind the machine E820. Which in most cases is after the 4GB.
> > 
> > The reason for doing the fetching of the E820 using the hypercall in
> > the toolstack (instead of the guest doing it) is that when a guest
> > would do a hypercall to 'XENMEM_machine_memory_map' it would
> > retrieve an E820 with I/O range caps added in. Meaning that the
> > region after 4GB up to end of possible memory would be marked as unusable
> > and the kernel would not have any space to allocate a balloon
> > region.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > 
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl.idl
> > --- a/tools/libxl/libxl.idl Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl.idl Wed May 04 10:16:08 2011 -0400
> > @@ -180,6 +180,7 @@ libxl_domain_build_info = Struct("domain
> >                                          ("cmdline", string),
> >                                          ("ramdisk", libxl_file_reference),
> >                                          ("features", string, True),
> > +                                        ("machine_e820", bool, False, "Use 
> > machine's E820 for PCI passthrough."),
> >                                          ])),
> >                   ])),
> >      ],
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_create.c
> > --- a/tools/libxl/libxl_create.c    Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl_create.c    Wed May 04 10:16:08 2011 -0400
> > @@ -519,6 +519,14 @@ static int do_domain_create(libxl__gc *g
> >      for (i = 0; i < d_config->num_pcidevs; i++)
> >          libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
> >  
> > +    if (!d_config->c_info.hvm && d_config->b_info.u.pv.machine_e820) {
> > +        int rc = 0;
> > +        rc = libxl__e820_alloc(ctx, domid, d_config);
> 
> rc is pretty comprehensively initialised ;-0

Duh!
> 
> > +        if (rc)
> > +            LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> > +                      "Failed while collecting E820 with: %d (errno:%d)\n",
> > +                      rc, errno);
> 
> LIBXL__LOG_ERRNO takes care of logging errno for you.

OK.
> 
> > +    }
> >      if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader 
> > )) {
> >          if ( (*cb)(ctx, domid, priv) )
> >              goto error_out;
> 
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_pci.c
> > --- a/tools/libxl/libxl_pci.c       Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl_pci.c       Wed May 04 10:16:08 2011 -0400
> > @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx 
> >      free(pcidevs);
> >      return 0;
> >  }
> > +
> > +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
> > +                         uint32_t *nr_entries,
> > +                         unsigned long map_limitkb,
> > +                         unsigned long balloon_kb)
> > +{
> > +    uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end;
> > +    uint32_t i, idx = 0, nr;
> > +    struct e820entry e820[E820MAX];
> > +
> > +    if (!src || !map_limitkb || !balloon_kb || !nr_entries)
> > +        return ERROR_INVAL;
> > +
> > +    nr = *nr_entries;
> > +    if (!nr)
> > +        return ERROR_INVAL;
> > +
> > +    if (nr > E820MAX)
> > +        return ERROR_NOMEM;
> > +
> > +    /* Weed out anything under 16MB */
> > +    for (i = 0; i < nr; i++) {
> > +      if (src[i].addr > 0x100000)
> 
> 0x100000 is 1MB not 16MB. I'm not sure if the code or the comment is
> correct.

Comment should have said 1MB.
> 
> > +        continue;
> > +
> > +      src[i].type = 0;
> > +      src[i].size = 0;
> > +      src[i].addr = -1ULL;
> > +    }
> > +
> > +    /* Find the lowest and highest entry in E820, skipping over
> > +     * undersired entries. */
> 
>           undesired ?
> 
> > +    start = -1ULL;
> > +    last = 0;
> > +    for (i = 0; i < nr; i++) {
> > +        if ((src[i].type == E820_RAM) ||
> > +            (src[i].type == E820_UNUSABLE) ||
> > +            (src[i].type == 0)) 
> [...]
> >     nr = idx;
> > +
> > +    for (i = 0; i < nr; i++) {
> > +      LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]",
> > +   e820[i].type == E820_RAM ? "RAM " :
> > +   (e820[i].type == E820_RESERVED ? "RSV " :
> > +    e820[i].type == E820_ACPI ? "ACPI" :
> > +         (e820[i].type == E820_NVS ? "NVS " :
> > +          (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))),
> > +          e820[i].addr >> 12,
> > +         (e820[i].addr + e820[i].size) >> 12);
> 
> This screams out for a e820_type_as_string style function or a lookup
> table, or just about anything else really ;-).

<nods> I had it at some point in the patches, not sure why it got lost.
Let me add it back in.
> Also you can use "%-4s" instead of manually aligning all the strings.

/me nods.
> 
> > +    }
> > +
> > +    /* Done: copy the sanitized version. */
> > +    *nr_entries = nr;
> > +    memcpy(src, e820, nr * sizeof(struct e820entry));
> > +    return 0;
> > +}
> > +
> 
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>