Re: [Xen-devel] [VTD][patch 1/5] HVM device assignment using vt-

Hi there,

Some comments and questions inline. In general I like it and hope we
can converge to a common IOMMU support layer (for Intel, AMD and IBM
Calgary/CalIOC2) soon.

On Wed, May 30, 2007 at 12:07:15PM -0700, Kay, Allen M wrote:
> vtd1.patch:
>     - vt-d specific code 
>     - low risk changes in common code
> 
> Signed-off-by: Allen Kay <allen.m.kay@xxxxxxxxx>
> Signed-off-by: Xiaohui Xin <xiaohui.xin@xxxxxxxxx>
>
> diff -r 089696e0c603 buildconfigs/linux-defconfig_xen_x86_64
> --- a/buildconfigs/linux-defconfig_xen_x86_64 Thu May 17 11:42:46 2007 +0100
> +++ b/buildconfigs/linux-defconfig_xen_x86_64 Wed May 30 11:24:05 2007 -0400

The .config changes do not appear necessary?

> diff -r 089696e0c603 tools/libxc/xc_domain.c
> --- a/tools/libxc/xc_domain.c Thu May 17 11:42:46 2007 +0100
> +++ b/tools/libxc/xc_domain.c Wed May 30 11:24:05 2007 -0400
> @@ -654,6 +654,78 @@ int xc_domain_send_trigger(int xc_handle
>      domctl.domain = domid;
>      domctl.u.sendtrigger.trigger = trigger;
>      domctl.u.sendtrigger.vcpu = vcpu;
> +
> +    return do_domctl(xc_handle, &domctl);
> +}
> +
> +int xc_assign_device(int xc_handle,
> +                     uint32_t domid,
> +                     uint32_t machine_bdf)
> +{
> +    DECLARE_DOMCTL;
> +
> +    domctl.cmd = XEN_DOMCTL_assign_device;
> +    domctl.domain = domid;
> +    domctl.u.assign_device.machine_bdf = machine_bdf;
> + 
> +    return do_domctl(xc_handle, &domctl);
> +}

What are the intended semantics of xc_assign_device? I am assuming it
leads to the creation of an IO address space. What about devices which
share an IO address space or IOMMUs with limited IO addressability?
How about an interface where we:

- create an IO address space of size 'size' for BDF 'bdf'
- destroy an IO address space

Also, I'm not sure whether these should be domctls or platform ops.

> diff -r 089696e0c603 xen/arch/x86/domain.c
> --- a/xen/arch/x86/domain.c   Thu May 17 11:42:46 2007 +0100
> +++ b/xen/arch/x86/domain.c   Wed May 30 11:24:31 2007 -0400
> @@ -42,6 +42,7 @@
>  #include <asm/hvm/hvm.h>
>  #include <asm/hvm/support.h>
>  #include <asm/msr.h>
> +#include <asm/iommu.h>
>  #ifdef CONFIG_COMPAT
>  #include <compat/vcpu.h>
>  #endif
> @@ -454,6 +455,9 @@ int arch_domain_create(struct domain *d)
>          share_xen_page_with_guest(
>              virt_to_page(d->shared_info), d, XENSHARE_writable);
>      }
> +
> +    if (iommu_found())
> +        iommu_domain_init(d);

This is where the fun starts :-)

Is it Xen's or dom0's job to find the IOMMU? on one hand, dom0 will
have all of that code through the Linux VT-d support patch, on the
other hand, splitting the detection and initialization of the platform
hardware between Xen and dom0 leads to artifical interfaces. I guess
the question is whether it's acceptable to add a whole lot of
infrastructure to Xen that duplicates stuff that Linux has or will
have soon for detecting the presence of the IOMMU?

If I understand iommu_domain_init(), shouldn't it be called
conditionally on whether there's a device assigned to the domain, and
(with your current patchset) whether the domain is a VMX domain?

>      if ( is_hvm_domain(d) )
>      {
> diff -r 089696e0c603 xen/arch/x86/domctl.c
> --- a/xen/arch/x86/domctl.c   Thu May 17 11:42:46 2007 +0100
> +++ b/xen/arch/x86/domctl.c   Wed May 30 11:24:05 2007 -0400
> @@ -25,6 +25,8 @@
>  #include <asm/hvm/support.h>
>  #include <asm/processor.h>
>  #include <public/hvm/e820.h>
> +#include <xen/list.h>
> +#include <asm/iommu.h>
>  
>  long arch_do_domctl(
>      struct xen_domctl *domctl,
> @@ -397,6 +399,7 @@ long arch_do_domctl(
>              ret = switch_native(d);
>              break;
>  #endif
> +

superflous whitespace

>          default:
>              ret = (domctl->u.address_size.size == BITS_PER_LONG) ? 0 : 
> -EINVAL;
>              break;
> @@ -421,6 +424,155 @@ long arch_do_domctl(
>  
>          if ( copy_to_guest(u_domctl, domctl, 1) )
>              ret = -EFAULT;
> +    }
> +    break;
> +
> +    case XEN_DOMCTL_assign_device:
> +    {
> +        struct domain *d;
> +        u8 bus, devfn;
> +
> +        ret = -EINVAL;
> +        d = get_domain_by_id(domctl->domain);

get_domain_by_id() can fail

> +        bus = (domctl->u.assign_device.machine_bdf >> 16) & 0xff;
> +        devfn = (domctl->u.assign_device.machine_bdf >> 8) & 0xff;

High-level comment - there should be an IOMMU multiplexing layer for
different IOMMU implementations that calls the right IOMMU
implementation of assign_device, etc.

> +        ret = assign_device(d, bus, devfn);
> +        gdprintk(XENLOG_INFO, "XEN_DOMCTL_assign_device: bdf = %x:%x:%x\n",
> +            bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +        put_domain(d);
> +    }
> +    break;
> +
> +    case XEN_DOMCTL_irq_mapping:
> +    {
> +        struct domain *d;
> +        uint32_t machine_gsi, guest_gsi;
> +        uint32_t device, intx;
> +
> +        ret = -EINVAL;
> +
> +        d = get_domain_by_id(domctl->domain);

get_domain_by_id() can fail

> +        machine_gsi = domctl->u.irq_mapping.machine_irq;
> +        device = domctl->u.irq_mapping.device;
> +        intx = domctl->u.irq_mapping.intx;
> +        guest_gsi = hvm_pci_intx_gsi(device, intx);
> +
> +        d->arch.hvm_domain.irq.mirq[machine_gsi].valid = 1;
> +        d->arch.hvm_domain.irq.mirq[machine_gsi].device = device;
> +        d->arch.hvm_domain.irq.mirq[machine_gsi].intx = intx;
> +        d->arch.hvm_domain.irq.mirq[machine_gsi].guest_gsi = guest_gsi;
> +
> +        d->arch.hvm_domain.irq.girq[guest_gsi].valid = 1;
> +        d->arch.hvm_domain.irq.girq[guest_gsi].device = device;
> +        d->arch.hvm_domain.irq.girq[guest_gsi].intx = intx;
> +        d->arch.hvm_domain.irq.girq[guest_gsi].machine_gsi = d->machine_gsi;

Should we check here that d is an hvm domain?

> +
> +        pirq_guest_bind(d->vcpu[0], machine_gsi, BIND_PIRQ__WILL_SHARE);

pirq_guest_bind() may fail

> +        gdprintk(XENLOG_INFO,
> +            "XEN_DOMCTL_irq_mapping: m_irq = %x device = %x intx = %x\n",
> +            machine_gsi, domctl->u.irq_mapping.device,
> +            domctl->u.irq_mapping.intx);
> +        ret = 0;
> +        put_domain(d);
> +    }
> +    break;
> +
> +    case XEN_DOMCTL_memory_mapping:
> +    {
> +        struct domain *d;
> +        unsigned long gfn = domctl->u.memory_mapping.first_gfn;
> +        unsigned long mfn = domctl->u.memory_mapping.first_mfn;
> +        unsigned long nr_mfns = domctl->u.memory_mapping.nr_mfns;
> +        int i;
> +
> +        ret = -EINVAL;
> +        if ( (mfn + nr_mfns - 1) < mfn ) /* wrap? */
> +            break;
> +        ret = -ESRCH;
> +        d = get_domain_by_id(domctl->domain);
> +        if ( d == NULL )
> +            break;
> +        if ( iomem_access_permitted(d, mfn, mfn) ) {

Shouldn't the second mfn be 'mfn + nr_mfns - 1'?

> +            if ( domctl->u.memory_mapping.add_mapping ) {
> +                gdprintk(XENLOG_INFO,
> +                    "DOMCTL_memory_map:add: gfn= %lx mfn= %lx nr_mfns = 
> %lx\n",
> +                    gfn, mfn, nr_mfns);
> +                for ( i = 0; i < nr_mfns ; i++ )
> +                    set_p2m_entry(d, gfn+i, _mfn(mfn+i));
> +            }
> +            else {
> +                gdprintk(XENLOG_INFO,
> +                    "DOMCTL_memory_map:remove:gfn=%lx mfn=%lx nr_mfns=%lx\n",
> +                     gfn, mfn, nr_mfns);
> +                for( i = 0; i <= nr_mfns; i++ )
> +                    set_p2m_entry(d, gfn+i, _mfn(INVALID_MFN));
> +            }

I would prefer a different hypercall to remove a mapping, rather than
a magic toggle...

> +            ret = iomem_permit_access(d, gfn, gfn + nr_mfns - 1);
> +            ret = 0;

Err... surely one of the above two lines is wrong :-)

> +        }
> +        else
> +            ret = 1;
> +        put_domain(d);
> +    }
> +    break;
> +
> +    case XEN_DOMCTL_ioport_mapping:
> +    {
> +#define MAX_IOPORTS    0x10000
> +        struct domain *d;
> +        struct hvm_domain *hd;
> +        unsigned int fgp = domctl->u.ioport_mapping.first_gport;
> +        unsigned int fmp = domctl->u.ioport_mapping.first_mport;
> +        unsigned int np = domctl->u.ioport_mapping.nr_ports;
> +        struct g2m_ioport *g2m_ioport;
> +        int found = 0;
> +
> +        gdprintk(XENLOG_ERR,
> +            "XEN_DOMCTL_ioport_map: f_gport = %x f_mport = %x np = %x\n",
> +            fgp, fmp, np);
> +
> +        ret = -EINVAL;
> +        if ((fgp > MAX_IOPORTS) || (fmp > MAX_IOPORTS) ||
> +            ((fgp + np) > MAX_IOPORTS) || ((fmp + np) > MAX_IOPORTS))
> +        {
> +            gdprintk(XENLOG_INFO,
> +                "XEN_DOMCTL_ioport_map:invalid:gport=%x mport=%x 
> nr_ports=%x\n",
> +                fgp, fmp, np);
> +            break;
> +        }

Something is strange with this check... why check both fxp and fxp +
np against MAX_IOPORTS? did you mean to check for wrap-around 
(fxp + np < fxp)?

> +        if ( np == 0 )
> +            ret = 0;

If np == 0, shouldn't we break here?

> diff -r 089696e0c603 xen/arch/x86/setup.c
> --- a/xen/arch/x86/setup.c    Thu May 17 11:42:46 2007 +0100
> +++ b/xen/arch/x86/setup.c    Wed May 30 11:26:48 2007 -0400
> @@ -902,6 +902,9 @@ void __init __start_xen(multiboot_info_t
>          _initrd_len   = mod[initrdidx].mod_end - mod[initrdidx].mod_start;
>      }
>  
> +    if (iommu_setup() != 0)
> +        panic("iommu_setup() failed\n");
> +

panic() seems heavy-handed here... surely we can limp along even if we
failed to setup the IOMMU?

> diff -r 089696e0c603 xen/include/asm-x86/hvm/domain.h
> --- a/xen/include/asm-x86/hvm/domain.h        Thu May 17 11:42:46 2007 +0100
> +++ b/xen/include/asm-x86/hvm/domain.h        Wed May 30 11:24:05 2007 -0400
> @@ -21,6 +21,7 @@
>  #ifndef __ASM_X86_HVM_DOMAIN_H__
>  #define __ASM_X86_HVM_DOMAIN_H__
>  
> +#include <asm/iommu.h>
>  #include <asm/hvm/irq.h>
>  #include <asm/hvm/vpt.h>
>  #include <asm/hvm/vlapic.h>
> @@ -55,6 +56,14 @@ struct hvm_domain {
>      spinlock_t             pbuf_lock;
>  
>      uint64_t               params[HVM_NR_PARAMS];
> +
> +    /* iommu support */
> +    spinlock_t iommu_list_lock;    /* protect iommu specific lists */
> +    struct list_head pdev_list;    /* direct accessed pci devices */
> +    struct list_head g2m_ioport_list;  /* guest to machine ioport mapping */
> +    struct dma_pte *pgd;           /* io page directory root */
> +    spinlock_t mapping_lock;       /* io page table lock */
> +    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
>  };

Why not make this a seperate struct, specific to vt-d, with a void*
iommu pointer in struct hvm_domain (or better yet, struct domain)? 
That way we do not bloat struct hvm_domain even when no IOMMU is
present, as well as make it possible to support different IOMMUs, and
IOMMUs where the IOMMU data is shared between multiple domains.

> diff -r 089696e0c603 xen/include/public/domctl.h
> --- a/xen/include/public/domctl.h     Thu May 17 11:42:46 2007 +0100
> +++ b/xen/include/public/domctl.h     Wed May 30 11:24:05 2007 -0400
> @@ -429,7 +429,63 @@ typedef struct xen_domctl_sendtrigger xe
>  typedef struct xen_domctl_sendtrigger xen_domctl_sendtrigger_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_sendtrigger_t);
>  
> - 
> +
> +#define XEN_DOMCTL_assign_device        37
> +struct xen_domctl_assign_device {
> +    domid_t   domain_id;
> +    uint16_t  pad;
> +    uint32_t  machine_bdf;
> +};

The padding will not be necessary is we rearrange the struct
members. Also, see comment about wrt create/destroy_io_space.

> +typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
> +
> +#define DPCI_ADD_MAPPING         1
> +#define DPCI_REMOVE_MAPPING      0 

See comment above wrt making these separate interfaces rather than a
single multiplexed one with magic toggles.

> +#define XEN_DOMCTL_irq_mapping        38
> +#define DEVICE_METHOD       1
> +#define MSI_METHOD          2
> +struct xen_domctl_irq_mapping {
> +    uint32_t machine_irq;     /* machine irq to be mapped to guest */
> +    uint32_t method;          /* msi or device routine method */
> +    uint32_t add_mapping;     /* add or remove mapping */
> +    union {
> +        struct {
> +            uint32_t device;
> +            uint32_t intx;
> +        };
> +        struct {
> +            uint32_t vector;
> +            uint32_t pad;
> +        };
> +    };

ISTR some gcc versions having trouble with anonymous unions.

> diff -r 089696e0c603 xen/include/asm-x86/iommu.h
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/xen/include/asm-x86/iommu.h     Wed May 30 11:24:05 2007 -0400
> @@ -0,0 +1,94 @@
> +/*
> + * Copyright (c) 2006, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 
> Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + * Copyright (C) Allen Kay <allen.m.kay@xxxxxxxxx>
> + */
> +
> +#ifndef _IOMMU_H_
> +#define _IOMMU_H_
> +
> +#include <xen/init.h>
> +#include <xen/bitmap.h>
> +#include <xen/irq.h>
> +#include <xen/spinlock.h>
> +#include <xen/mm.h>
> +#include <xen/xmalloc.h>
> +#include <asm/hvm/vmx/intel-iommu.h>
> +#include <public/hvm/ioreq.h>
> +
> +#define iommu_found()     (!list_empty(&acpi_drhd_units))
> +#define dev_assigned(d)  (!list_empty(&d->arch.hvm_domain.pdev_list))

This is specific to VT-d and should not be in a common header
(iommu.h)

> +
> +#ifdef __x86_64__
> +typedef u64 dma_addr_t;
> +#else
> +typedef u32 dma_addr_t;
> +#endif

Why is this needed?

> +
> +#define MAX_IOMMUS 32

Hmm? specific to VT-d, I assume?

> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48

Likewise?

Cool stuff, more to come later... but first the million dollar
question: when will VT-d hardware be publicly available?

Cheers,
Muli


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
WARNING - OLD ARCHIVES

xen-devel

Re: [Xen-devel] [VTD][patch 1/5] HVM device assignment using vt-d