WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: [PATCH 6 of 6] xl: implement multifunction device PCI pa

To: "Gianni Tedesco (3P)" <gianni.tedesco@xxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH 6 of 6] xl: implement multifunction device PCI pass-through
From: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
Date: Wed, 4 Aug 2010 16:47:05 +0100
Cc: Xen Devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
Delivery-date: Wed, 04 Aug 2010 08:49:11 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1280934462.18490.204.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <patchbomb.1280761122@xxxxxxxxxxxxxxxxxxxxxx> <6379343447e58cc7b83e.1280761128@xxxxxxxxxxxxxxxxxxxxxx> <alpine.DEB.2.00.1008041543320.19809@kaball-desktop> <1280934462.18490.204.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)
On Wed, 4 Aug 2010, Gianni Tedesco (3P) wrote:
> On Wed, 2010-08-04 at 15:52 +0100, Stefano Stabellini wrote:
> > On Mon, 2 Aug 2010, Gianni Tedesco (3P) wrote:
> > >  tools/libxl/libxl.h     |    1 +
> > >  tools/libxl/libxl_pci.c |  118 
> > > ++++++++++++++++++++++++++++++++++++++++++++---
> > >  2 files changed, 111 insertions(+), 8 deletions(-)
> > > 
> > > 
> > > # HG changeset patch
> > > # User Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
> > > # Date 1280760698 -3600
> > > # Branch pci-patches-v3
> > > # Node ID 6379343447e58cc7b83e27beba9c35e62c232d9d
> > > # Parent  eb8ee481bc063b18b39feaa76d9b757331ed3a9f
> > > xl: implement multifunction device PCI pass-through
> > > 
> > > Implement PCI pass-through for multi-function devices. The supported BDF
> > > notation is: BB:DD.* - therefore passing-through a subset of functions or
> > > remapping the function numbers is not supported. Largely because I don't 
> > > see
> > > how that can ever be safe or sane (perhaps for SR-IOV cards?)
> > > 
> > > Letting qemu automatically select the virtual slot is not supported since 
> > > I
> > > don't believe that qemu-dm can actually handle that case.
> > > 
> > > Signed-off-by: Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
> > > 
> > > diff -r eb8ee481bc06 -r 6379343447e5 tools/libxl/libxl.h
> > > --- a/tools/libxl/libxl.h Mon Aug 02 15:47:38 2010 +0100
> > > +++ b/tools/libxl/libxl.h Mon Aug 02 15:51:38 2010 +0100
> > > @@ -308,6 +308,7 @@ typedef struct {
> > >      unsigned int vdevfn;
> > >      bool msitranslate;
> > >      bool power_mgmt;
> > > +    bool multifunction;
> > >  } libxl_device_pci;
> > >  
> > >  enum {
> > > diff -r eb8ee481bc06 -r 6379343447e5 tools/libxl/libxl_pci.c
> > > --- a/tools/libxl/libxl_pci.c     Mon Aug 02 15:47:38 2010 +0100
> > > +++ b/tools/libxl/libxl_pci.c     Mon Aug 02 15:51:38 2010 +0100
> > > @@ -136,8 +136,13 @@ int libxl_device_pci_parse_bdf(libxl_ctx
> > >                      break;
> > >                  }
> > >                  *ptr = '\0';
> > > -                if ( hex_convert(tok, &func, 0x7) )
> > > -                    goto parse_error;
> > > +                if ( !strcmp(tok, "*") ) {
> > > +                    pcidev->multifunction = 1;
> > > +                }else{
> > > +                    if ( hex_convert(tok, &func, 0x7) )
> > > +                        goto parse_error;
> > > +                    pcidev->multifunction = 0;
> > > +                }
> > >                  tok = ptr + 1;
> > >              }
> > >              break;
> > 
> > Couldn't you add the func_mask somewhere in libxl_device_pci, instead of
> > calling pci_multifunction_check multilple times?
> > Would be possible to make the normal passthrough case just a subset of
> > the general multifunction passthrough case to simplify the code?
> 
> Not sure what you mean by this, the parsing parts don't (and oughtn't)
> frob about with sysfs nodes etc. The pci_multifunction_check function is
> called once per device add and once per device remove iff the parser had
> seen XX:YY.* notation indicating multifunction devices.
> 

I mean we could have a mask instead of an interger in libxl_device_pci
to specify which functions should be assigned and in the single function
case the mask will have only one bit set.
At this point single function passthrough become just a case of
multifunction passthrough.
I am not entirely sure that this approach would produce better code
though.


> > 
> > > @@ -531,6 +536,50 @@ int libxl_device_pci_list_assignable(lib
> > >      return 0;
> > >  }
> > >  
> > > +static int pci_multifunction_check(libxl_ctx *ctx, libxl_device_pci 
> > > *pcidev, unsigned int *func_mask)
> > > +{
> > 
> > a comment on top of this function to explain what it is actually checking 
> > would be nice 
> 
> This function checks that all functions of a device are bound to pciback
> driver. It also initialises a bit-mask of which function numbers are
> present on that device.
> 
> I shall add the comment in next rev.
> 
> > > +    struct dirent *de;
> > > +    DIR *dir;
> > > +
> > > +    *func_mask = 0;
> > > +
> > > +    dir = opendir(SYSFS_PCI_DEV);
> > > +    if ( NULL == dir ) {
> > > +        XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "Couldn't open %s", 
> > > SYSFS_PCI_DEV);
> > > +        return -1;
> > > +    }
> > > +
> > > +    while( (de = readdir(dir)) ) {
> > > +        unsigned dom, bus, dev, func;
> > > +        struct stat st;
> > > +        char *path;
> > > +
> > > +        if ( sscanf(de->d_name, PCI_BDF, &dom, &bus, &dev, &func) != 4 )
> > > +            continue;
> > > +        if ( pcidev->domain != dom )
> > > +            continue;
> > > +        if ( pcidev->bus != bus )
> > > +            continue;
> > > +        if ( pcidev->dev != dev )
> > > +            continue;
> > > +
> > > +        path = libxl_sprintf(ctx, "%s/" PCI_BDF, SYSFS_PCIBACK_DRIVER, 
> > > dom, bus, dev, func);
> > > +        if ( lstat(path, &st) ) {
> > > +            if ( errno == ENOENT )
> > > +                XL_LOG(ctx, XL_LOG_ERROR, PCI_BDF " is not assigned to 
> > > pciback driver",
> > > +                       dom, bus, dev, func);
> > > +            else
> > > +                XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "Couldn't lstat %s", 
> > > path);
> > > +            closedir(dir);
> > > +            return -1;
> > > +        }
> > > +        (*func_mask) |= (1 << func);
> > > +    }
> > > +
> > > +    closedir(dir);
> > > +    return 0;
> > > +}
> > > +
> > >  static int pci_ins_check(libxl_ctx *ctx, uint32_t domid, const char 
> > > *state, void *priv)
> > >  {
> > >      char *orig_state = priv;
> > > @@ -679,10 +728,32 @@ int libxl_device_pci_add(libxl_ctx *ctx,
> > >              return rc;
> > >      }
> > >  
> > > +    if ( pcidev->multifunction ) {
> > > +        unsigned int i, orig_vdev, func_mask, rc = 0;
> > > +        orig_vdev = pcidev->vdevfn & ~7U;
> > > +
> > > +        if ( pci_multifunction_check(ctx, pcidev, &func_mask) )
> > > +            return ERROR_FAIL;
> > > +        if ( !(pcidev->vdevfn >> 3) ) {
> > > +            XL_LOG(ctx, XL_LOG_ERROR, "Must specify a v-slot for 
> > > multi-function devices");
> > > +            return ERROR_FAIL;
> > > +        }
> > > +
> > > +        for(i = 7; i < 8; --i) {
> > 
> > shouldn't this be for(i = 7; i >= 0; --i)?
> 
> That will generate a compiler warning since unsigned int's are always >=
> 0. The code relies on an underflow condition. However, I could change it
> to int if you think that makes it clearer.
> 

yeah please do that

> > > +            if ( (1 << i) & func_mask ) {
> > > +                pcidev->func = i;
> > > +                pcidev->vdevfn = orig_vdev | i;
> > > +                if ( do_pci_add(ctx, domid, pcidev) )
> > > +                    rc = ERROR_FAIL;
> > > +            }
> > > +        }
> > > +        return rc;
> > > +    }
> > > +
> > >      return do_pci_add(ctx, domid, pcidev);
> > >  }
> > >  
> > > -int libxl_device_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > > libxl_device_pci *pcidev)
> > > +static int do_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > > libxl_device_pci *pcidev)
> > >  {
> > >      libxl_device_pci *assigned;
> > >      char *path;
> > > @@ -711,10 +782,15 @@ int libxl_device_pci_remove(libxl_ctx *c
> > >          libxl_xs_write(ctx, XBT_NULL, path, PCI_BDF, pcidev->domain,
> > >                         pcidev->bus, pcidev->dev, pcidev->func);
> > >          path = libxl_sprintf(ctx, 
> > > "/local/domain/0/device-model/%d/command", domid);
> > > -        xs_write(ctx->xsh, XBT_NULL, path, "pci-rem", strlen("pci-rem"));
> > > -        if (libxl_wait_for_device_model(ctx, domid, "pci-removed", NULL, 
> > > NULL) < 0) {
> > > -            XL_LOG(ctx, XL_LOG_ERROR, "Device Model didn't respond in 
> > > time");
> > > -            return ERROR_FAIL;
> > > +
> > > +        /* Remove all functions at once atomically by only signalling
> > > +         * device-model for function 0 */
> > > +        if ( !pcidev->multifunction || pcidev->func == 0 ) {
> > > +            xs_write(ctx->xsh, XBT_NULL, path, "pci-rem", 
> > > strlen("pci-rem"));
> > > +            if (libxl_wait_for_device_model(ctx, domid, "pci-removed", 
> > > NULL, NULL) < 0) {
> > > +                XL_LOG(ctx, XL_LOG_ERROR, "Device Model didn't respond 
> > > in time");
> > > +                return ERROR_FAIL;
> > > +            }
> > >          }
> > >          path = libxl_sprintf(ctx, 
> > > "/local/domain/0/device-model/%d/state", domid);
> > >          xs_write(ctx->xsh, XBT_NULL, path, state, strlen(state));
> > > @@ -769,7 +845,10 @@ skip1:
> > >          fclose(f);
> > >      }
> > >  out:
> > > -    libxl_device_pci_reset(ctx, pcidev->domain, pcidev->bus, 
> > > pcidev->dev, pcidev->func);
> > > +    /* don't do multiple resets while some functions are still passed 
> > > through */
> > > +    if ( !pcidev->multifunction || pcidev->func == 0 ) {
> > > +        libxl_device_pci_reset(ctx, pcidev->domain, pcidev->bus, 
> > > pcidev->dev, pcidev->func);
> > > +    }
> > >  
> > >      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> > >          rc = xc_deassign_device(ctx->xch, domid, pcidev->value);
> > > @@ -786,6 +865,29 @@ out:
> > >      return 0;
> > >  }
> > >  
> > > +int libxl_device_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > > libxl_device_pci *pcidev)
> > > +{
> > > +    if ( pcidev->multifunction ) {
> > > +        unsigned int i, orig_vdev, func_mask, rc;
> > > +        orig_vdev = pcidev->vdevfn & ~7U;
> > > +
> > > +        if ( pci_multifunction_check(ctx, pcidev, &func_mask) )
> > > +            return ERROR_FAIL;
> > > +
> > > +        for(i = 7; i < 8; --i) {
> > 
> > same here
> > 
> > > +            if ( (1 << i) & func_mask ) {
> > > +                pcidev->func = i;
> > > +                pcidev->vdevfn = orig_vdev | i;
> > > +                if ( do_pci_remove(ctx, domid, pcidev) )
> > > +                    rc = ERROR_FAIL;
> > > +            }
> > > +        }
> > > +        return rc;
> > > +    }
> > > +
> > > +    return do_pci_remove(ctx, domid, pcidev);
> > > +}
> > > +
> > >  int libxl_device_pci_list_assigned(libxl_ctx *ctx, libxl_device_pci 
> > > **list, uint32_t domid, int *num)
> > >  {
> > >      char *be_path, *num_devs, *xsdev, *xsvdevfn, *xsopts;
> > > 
> > 
> > 
> > applied all the other patches in the series
> 
> OK thanks.
> 
> I will also need to handle cleanup in the error path of multifunction
> device addition as discussed. This can actually happen quite easily if
> function 0 alone is assigned to another domain because
> pci_multifunction_check() doesn't look in xenstore and also because of
> missing safety checks. Specifically, when adding a non-multifunction
> device we need a check to make sure that the physical device really has
> only one function.
> 
> I am thinking that we probably ought to scrap the XX:YY.* notation and
> enforce that function is ALWAYS set to 0. Then when adding/removing the
> device we should check for additional functions and handle them
> transparently to the user. As I said in the commit message I am unaware
> of any valid uses for separating functions but I need to look at how
> SR-IOV works.
> 
> 

With SR-IOV devices you can easily have valid BDFs like 0000:05:1c.2, so
enforcing that function is always 0 is not a good idea.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel