WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH V3 07/10] Introduce Xen PCI Passthrough, qdevice

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH V3 07/10] Introduce Xen PCI Passthrough, qdevice (1/3)
From: Anthony PERARD <anthony.perard@xxxxxxxxxx>
Date: Fri, 11 Nov 2011 16:27:55 +0000
Cc: Guy Zana <guy@xxxxxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Allen Kay <allen.m.kay@xxxxxxxxx>, QEMU-devel <qemu-devel@xxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
Delivery-date: Fri, 11 Nov 2011 08:30:21 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20111110212840.GA23643@xxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1319814456-8158-1-git-send-email-anthony.perard@xxxxxxxxxx> <1319814456-8158-8-git-send-email-anthony.perard@xxxxxxxxxx> <20111110212840.GA23643@xxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)
On Thu, 10 Nov 2011, Konrad Rzeszutek Wilk wrote:

> On Fri, Oct 28, 2011 at 04:07:33PM +0100, Anthony PERARD wrote:
> > From: Allen Kay <allen.m.kay@xxxxxxxxx>
> >
>
> This is going to be a bit lame review..
>

[...]

> > +        return;
> > +    }
> > +
> > +    /* find register group entry */
> > +    reg_grp_entry = pt_find_reg_grp(s, address);
> > +    if (reg_grp_entry) {
> > +        /* check 0 Hardwired register group */
> > +        if (reg_grp_entry->reg_grp->grp_type == GRP_TYPE_HARDWIRED) {
> > +            /* ignore silently */
> > +            PT_LOG("Warning: Access to 0 Hardwired register. "
> > +                   "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
> > +                   pci_bus_num(d->bus), PCI_SLOT(d->devfn), 
> > PCI_FUNC(d->devfn),
> > +                   address, len);
> > +            return;
> > +        }
> > +    }
> > +
> > +    /* read I/O device register value */
> > +    rc = host_pci_get_block(s->real_device, address,
> > +                             (uint8_t *)&read_val, len);
> > +    if (!rc) {
> > +        PT_LOG("Error: pci_read_block failed. return value[%d].\n", rc);
>
> There isn't a PT_ERR? Hm, looking at the code there is only PT_LOG. Perhaps
> declearing PT_ERR and PT_WARN might be a good idea? In case in the future
> one wants different levels of this? Or do we really not care much about that?

I will add this two macros.

> > +        memset(&read_val, 0xff, len);
> > +    }
> > +
> > +    /* pass directly to libpci for passthrough type register group */
>
> Um, is the libpci requirement a certain thing?

:(, it's just an old comment. libpci is not used anymore and have been
replaced by host-pci-device. I will replace libpci in the comment by
"the real device".


[...]

> > +            switch (reg->size) {
> > +            case 1:
> > +                if (reg->u.b.write) {
> > +                    rc = reg->u.b.write(s, reg_entry, ptr_val,
> > +                                        read_val >> ((real_offset & 3) << 
> > 3),
> > +                                        valid_mask);
> > +                }
> > +                break;
> > +            case 2:
> > +                if (reg->u.w.write) {
> > +                    rc = reg->u.w.write(s, reg_entry, (uint16_t *)ptr_val,
> > +                                        (read_val >> ((real_offset & 3) << 
> > 3)),
> > +                                        valid_mask);
> > +                }
> > +                break;
> > +            case 4:
> > +                if (reg->u.dw.write) {
> > +                    rc = reg->u.dw.write(s, reg_entry, (uint32_t *)ptr_val,
> > +                                         (read_val >> ((real_offset & 3) 
> > << 3)),
> > +                                         valid_mask);
> > +                }
> > +                break;
> > +            }
> > +
> > +            if (rc < 0) {
> > +                hw_error("Internal error: Invalid write emulation "
> > +                         "return value[%d]. I/O emulator exit.\n", rc);
>
> Oh. I hadn't realized this, but you are using hw_error. Which is
> calling 'abort'! Yikes. Is there no way to recover from this? Say return 
> 0xfffff?

In qemu-xen-traditionnal, it was an exit(1). I do not know the
consequence of a bad write, and I can not return anythings. So I suppose
that the guest would know that somethings wrong only on the next read.

Instead of abort();, I can just do nothing and return. Or we could unplug
the device from QEMU.

Any preference?

> > +            }
> > +
> > +            /* calculate next address to find */
> > +            emul_len -= reg->size;
> > +            if (emul_len > 0) {
> > +                find_addr = real_offset + reg->size;
> > +            }
> > +        } else {
> > +            /* nothing to do with passthrough type register,
> > +             * continue to find next byte */
> > +            emul_len--;
> > +            find_addr++;
> > +        }
> > +    }
> > +
> > +    /* need to shift back before passing them to libpci */
> > +    val >>= (address & 3) << 3;
> > +
> > +out:
> > +    if (!(reg && reg->no_wb)) {
> > +        /* unknown regs are passed through */
> > +        rc = host_pci_set_block(s->real_device, address, (uint8_t *)&val, 
> > len);
> > +
> > +        if (!rc) {
> > +            PT_LOG("Error: pci_write_block failed. return value[%d].\n", 
> > rc);
> > +        }
> > +    }
> > +
> > +    if (s->pm_state != NULL && s->pm_state->flags & PT_FLAG_TRANSITING) {
> > +        qemu_mod_timer(s->pm_state->pm_timer,
> > +                       qemu_get_clock_ms(rt_clock) + 
> > s->pm_state->pm_delay);
> > +    }
> > +}
> > +
> > +/* ioport/iomem space*/
> > +static void pt_iomem_map(XenPCIPassthroughState *s, int i,
> > +                         pcibus_t e_phys, pcibus_t e_size, int type)
> > +{
> > +    uint32_t old_ebase = s->bases[i].e_physbase;
> > +    bool first_map = s->bases[i].e_size == 0;
> > +    int ret = 0;
> > +
> > +    s->bases[i].e_physbase = e_phys;
> > +    s->bases[i].e_size = e_size;
> > +
> > +    PT_LOG("e_phys=%#"PRIx64" maddr=%#"PRIx64" type=%%d"
> > +           " len=%#"PRIx64" index=%d first_map=%d\n",
> > +           e_phys, s->bases[i].access.maddr, /*type,*/
> > +           e_size, i, first_map);
> > +
> > +    if (e_size == 0) {
> > +        return;
> > +    }
> > +
> > +    if (!first_map && old_ebase != -1) {
>
> old_ebase != PCI_BAR_UNMAPPED ?

:(, no. Because old_ebase is a uint32_t and PCI_BAR_UNMAPPED is
pcibus_t (uint64_t in Xen case).

I'm not sure that a good idee to change the type of old_ebase as
xc_domain_memory_mapping bellow takes only uint32_t.

But, if I can replace a -1 by PCI_BAR_UNMAPPED, I will.

> > +        /* Remove old mapping */
> > +        ret = xc_domain_memory_mapping(xen_xc, xen_domid,
> > +                               old_ebase >> XC_PAGE_SHIFT,
> > +                               s->bases[i].access.maddr >> XC_PAGE_SHIFT,
> > +                               (e_size + XC_PAGE_SIZE - 1) >> 
> > XC_PAGE_SHIFT,
> > +                               DPCI_REMOVE_MAPPING);
> > +        if (ret != 0) {
> > +            PT_LOG("Error: remove old mapping failed!\n");
> > +            return;
> > +        }
> > +    }
> > +
> > +    /* map only valid guest address */
> > +    if (e_phys != -1) {
>
> PCI_BAR_UNMAPPED
>
> > +        /* Create new mapping */
> > +        ret = xc_domain_memory_mapping(xen_xc, xen_domid,
> > +                                   s->bases[i].e_physbase >> XC_PAGE_SHIFT,
> > +                                   s->bases[i].access.maddr >> 
> > XC_PAGE_SHIFT,
> > +                                   (e_size+XC_PAGE_SIZE-1) >> 
> > XC_PAGE_SHIFT,
> > +                                   DPCI_ADD_MAPPING);
> > +
> > +        if (ret != 0) {
> > +            PT_LOG("Error: create new mapping failed!\n");
> > +        }
> > +    }
> > +}

[...]

> > +void pt_bar_mapping(XenPCIPassthroughState *s, int io_enable, int 
> > mem_enable)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > +        pt_bar_mapping_one(s, i, io_enable, mem_enable);
> > +    }
> > +}
> > +
> > +/* register regions */
> > +static int pt_register_regions(XenPCIPassthroughState *s)
> > +{
> > +    int i = 0;
> > +    uint32_t bar_data = 0;
> > +    HostPCIDevice *d = s->real_device;
> > +
> > +    /* Register PIO/MMIO BARs */
> > +    for (i = 0; i < PCI_BAR_ENTRIES; i++) {
> > +        HostPCIIORegion *r = &d->io_regions[i];
> > +
> > +        if (r->base_addr) {
>
> So should you check for PCI_BAR_UNMAPPED or is that not really
> required here as the pci_register_bar would do it?

Actually, this value come from the real device (the value in
sysfs/resource). So, I think it's just 0 if it's not mapped.

Here, it's probably better to check for the size instead, to know if
there is actually a BAR.

> > +            s->bases[i].e_physbase = r->base_addr;
> > +            s->bases[i].access.u = r->base_addr;
> > +
> > +            /* Register current region */
> > +            if (r->flags & IORESOURCE_IO) {
> > +                memory_region_init_io(&s->bar[i], NULL, NULL,
> > +                                      "xen-pci-pt-bar", r->size);
>
> You can make the "xen_pci-pt-bar" be a #define somewhere and reuse that.
>
> > +                pci_register_bar(&s->dev, i, PCI_BASE_ADDRESS_SPACE_IO,
> > +                                 &s->bar[i]);
> > +            } else if (r->flags & IORESOURCE_PREFETCH) {
> > +                memory_region_init_io(&s->bar[i], NULL, NULL,
> > +                                      "xen-pci-pt-bar", r->size);
> > +                pci_register_bar(&s->dev, i, PCI_BASE_ADDRESS_MEM_PREFETCH,
> > +                                 &s->bar[i]);
> > +            } else {
> > +                memory_region_init_io(&s->bar[i], NULL, NULL,
> > +                                      "xen-pci-pt-bar", r->size);
> > +                pci_register_bar(&s->dev, i, PCI_BASE_ADDRESS_SPACE_MEMORY,
> > +                                 &s->bar[i]);
> > +            }
> > +
> > +            PT_LOG("IO region registered (size=0x%08"PRIx64
> > +                   " base_addr=0x%08"PRIx64")\n",
> > +                   r->size, r->base_addr);
> > +        }
> > +    }
> > +
> > +    /* Register expansion ROM address */
> > +    if (d->rom.base_addr && d->rom.size) {
> > +        /* Re-set BAR reported by OS, otherwise ROM can't be read. */
> > +        bar_data = host_pci_get_long(d, PCI_ROM_ADDRESS);
> > +        if ((bar_data & PCI_ROM_ADDRESS_MASK) == 0) {
> > +            bar_data |= d->rom.base_addr & PCI_ROM_ADDRESS_MASK;
> > +            host_pci_set_long(d, PCI_ROM_ADDRESS, bar_data);
> > +        }
> > +
> > +        s->bases[PCI_ROM_SLOT].e_physbase = d->rom.base_addr;
> > +        s->bases[PCI_ROM_SLOT].access.maddr = d->rom.base_addr;
> > +
> > +        memory_region_init_rom_device(&s->rom, NULL, NULL, &s->dev.qdev,
> > +                                      "xen-pci-pt-rom", d->rom.size);
> > +        pci_register_bar(&s->dev, PCI_ROM_SLOT, 
> > PCI_BASE_ADDRESS_MEM_PREFETCH,
> > +                         &s->rom);
> > +
> > +        PT_LOG("Expansion ROM registered (size=0x%08"PRIx64
> > +               " base_addr=0x%08"PRIx64")\n",
> > +               d->rom.size, d->rom.base_addr);
> > +    }
> > +
> > +    return 0;
> > +}
> > +

[...]

> > +static int pt_initfn(PCIDevice *pcidev)
> > +{
> > +    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, 
> > pcidev);
> > +    int dom, bus;
> > +    unsigned slot, func;
> > +    int rc = 0;
> > +    uint32_t machine_irq;
> > +    int pirq = -1;
> > +
> > +    if (pci_parse_devaddr(s->hostaddr, &dom, &bus, &slot, &func) < 0) {
> > +        fprintf(stderr, "error parse bdf: %s\n", s->hostaddr);
> > +        return -1;
> > +    }
> > +
> > +    /* register real device */
> > +    PT_LOG("Assigning real physical device %02x:%02x.%x to devfn %i ...\n",
> > +           bus, slot, func, s->dev.devfn);
> > +
> > +    s->real_device = host_pci_device_get(bus, slot, func);
> > +    if (!s->real_device) {
> > +        return -1;
> > +    }
> > +
> > +    s->is_virtfn = s->real_device->is_virtfn;
> > +    if (s->is_virtfn) {
> > +        PT_LOG("%04x:%02x:%02x.%x is a SR-IOV Virtual Function\n",
> > +               s->real_device->domain, bus, slot, func);
> > +    }
> > +
> > +    /* Initialize virtualized PCI configuration (Extended 256 Bytes) */
> > +    if (host_pci_get_block(s->real_device, 0, pcidev->config,
> > +                           PCI_CONFIG_SPACE_SIZE) == -1) {
> > +        return -1;
> > +    }
> > +
> > +    /* Handle real device's MMIO/PIO BARs */
> > +    pt_register_regions(s);
> > +
> > +    /* reinitialize each config register to be emulated */
> > +    pt_config_init(s);
> > +
> > +    /* Bind interrupt */
> > +    if (!s->dev.config[PCI_INTERRUPT_PIN]) {
> > +        PT_LOG("no pin interrupt\n");
>
> Perhaps include some details of which device failed?

There is already detailed about the device at the beginning of the
function. Is it not enough?

> > +        goto out;
> > +    }
> > +
> > +    machine_irq = host_pci_get_byte(s->real_device, PCI_INTERRUPT_LINE);
> > +    rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
> > +
> > +    if (rc) {
> > +        PT_LOG("Error: Mapping irq failed, rc = %d\n", rc);
>
> Can you also include the IRQ it tried to map (both machine and pirq).

Yep.

> > +
> > +        /* Disable PCI intx assertion (turn on bit10 of devctl) */
> > +        host_pci_set_word(s->real_device,
> > +                          PCI_COMMAND,
> > +                          pci_get_word(s->dev.config + PCI_COMMAND)
> > +                          | PCI_COMMAND_INTX_DISABLE);
> > +        machine_irq = 0;
> > +        s->machine_irq = 0;
> > +    } else {
> > +        machine_irq = pirq;
> > +        s->machine_irq = pirq;
> > +        mapped_machine_irq[machine_irq]++;
> > +    }
> > +
> > +    /* bind machine_irq to device */
> > +    if (rc < 0 && machine_irq != 0) {
> > +        uint8_t e_device = PCI_SLOT(s->dev.devfn);
> > +        uint8_t e_intx = pci_intx(s);
> > +
> > +        rc = xc_domain_bind_pt_pci_irq(xen_xc, xen_domid, machine_irq, 0,
> > +                                       e_device, e_intx);
> > +        if (rc < 0) {
> > +            PT_LOG("Error: Binding of interrupt failed! rc=%d\n", rc);
>
> A bit details - name of the device, the IRQ,..
>
> > +
> > +            /* Disable PCI intx assertion (turn on bit10 of devctl) */
> > +            host_pci_set_word(s->real_device, PCI_COMMAND,
> > +                              *(uint16_t *)(&s->dev.config[PCI_COMMAND])
> > +                              | PCI_COMMAND_INTX_DISABLE);
> > +            mapped_machine_irq[machine_irq]--;
> > +
> > +            if (mapped_machine_irq[machine_irq] == 0) {
> > +                if (xc_physdev_unmap_pirq(xen_xc, xen_domid, machine_irq)) 
> > {
> > +                    PT_LOG("Error: Unmapping of interrupt failed! rc=%d\n",
> > +                           rc);
>
> And here too. It would be beneficial to have on the error paths lots of
> nice details so that in the field it will be easier to find out what
> went wrong (and match up PIRQ with the GSI).

Yes, I will try to improve the messages.

It's also probably good to always print the errors.


Thanks,

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel