Jiang, Yunhong wrote:
Jeremy, attached is basic MSI support to PV_dom0. Please have a look on it.
Thanks
Yunhong Jiang
The pci_wrapper.patch add some hook to pci_bus_type, so that Xen will be
notified when a PCI device is added to system.
Makefile | 2 -
pci_wrap.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+), 1 deletion(-)
The enable_msi.patch add the MSI support to dom0, basically it just allocate
irq from Xen irq name space.
arch/x86/include/asm/xen/pci.h | 17 +++++++
arch/x86/kernel/apic/io_apic.c | 22 +++++++--
arch/x86/xen/apic.c | 1
drivers/xen/events.c | 91 +++++++++++++++++++++++++++++++++++++++-
include/xen/interface/physdev.h | 59 +++++++++++++++++++++++++
5 files changed, 182 insertions(+), 8 deletions(-)
The reenable_msi.patch just remove original function that disable MSI in xen
environment.
arch/x86/xen/apic.c | 2 --
drivers/pci/pci.h | 2 ++
include/linux/pci.h | 6 ------
3 files changed, 2 insertions(+), 8 deletions(-)
Notice:
a) Currently the PCI save/restore is broken. The reason is because pci_restore_msi_state()
in "drivers/pci/msi.c" will try to restore the MSI config depends on device's msi
msg information (i.e. content of msi_desc->msg). However, that information is wrong in
Xen environment because Xen HV owns MSI. I'm still trying to find a method to achieve the
save/restore without touch the common msi.c code, any suggestion is welcome.
You mean because it tries to directly write to pci config space, or that
it writes the wrong thing there?
What will this affect? Dom0 S3 suspend/resume? More?
b) I notice pci frontend is in a branch. But to support pci frontend without
touch common PCI function is difficult, I'm still considering it.
OK.
c) I'm not sure if xen's irq space should be same as pirq. Basically I think
irq is dom0 internal structure, while PIRQ is interface between Xen HV/dom0.
But seems current implementation think these two items are same. I didn't try
to change it, but that may need improvement.
What do you mean by "Xen's irq space"? Does it have an irq space that's
distinct from pirqs? Or do you mean "Linux's irq space"? At the
moment, a Linux irq == the gsi, which is a convention I kept for Xen
interrupts, though there's no very strong tie (identity_mapped_irq()
determines where we bother with the identity mapping; only the legacy
ISA interrupts are essential). Obviously this has no meaning with MSI
interrupts.
Or have I misunderstood you?
(It would be easier to review the patches if they were inline, one per mail)
commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d Author: Jiang, Yunhong
<yunhong.jiang@xxxxxxxxx> Date: Fri May 8 00:50:34 2009 +0800 Add MSI
support to Xen Dom0 It add some hook to arch specific MSI function, so
that it will a) allocate irq from Xen's irq space b) allocate vector
through Xen's map_irq hypercall. Currently Xen's irq space function
has no chip_data function, I assume this should be ok since we Xen's
IRQ is different chip with native IOAPIC. Signed-off-by: Jiang,
Yunhong <yunhong.jiang@xxxxxxxxx> diff --git
a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
index 0563fc6..a5d9027 100644 --- a/arch/x86/include/asm/xen/pci.h +++
b/arch/x86/include/asm/xen/pci.h @@ -3,11 +3,28 @@ #ifdef
CONFIG_XEN_DOM0_PCI int xen_register_gsi(u32 gsi, int triggering, int
polarity); +int xen_create_msi_irq(struct pci_dev *dev, + unsigned int
want, + struct msi_desc *msidesc, + int type); +int
xen_destroy_irq(int irq); #else static inline int xen_register_gsi(u32
gsi, int triggering, int polarity) { return -1; } + +int
xen_create_msi_irq(struct pci_dev *dev,
static inline
+ unsigned int want, + struct msi_desc *msidesc, + int type) +{ +
return -1; +} +int xen_destroy_irq(int irq)
static inline
+{ + return -1; +} #endif #endif /* _ASM_X86_XEN_PCI_H */ diff --git
a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b562550..d1e3408 100644 --- a/arch/x86/kernel/apic/io_apic.c +++
b/arch/x86/kernel/apic/io_apic.c @@ -66,6 +66,7 @@ #include
<asm/xen/hypervisor.h> #include <asm/apic.h> +#include <asm/xen/pci.h>
#define __apicdebuginit(type) static type __init @@ -3472,6 +3473,10
@@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc
*msidesc, int irq) return ret; set_irq_msi(irq, msidesc); + + if
(xen_domain()) + return 0;
I think it would be cleaner to have a xen_setup_msi_irq() which just does the
ret = msi_compose_msg(dev, irq, &msg);
if (ret < 0)
return ret;
set_irq_msi(irq, msidesc);
parts then returns, and put the if (xen) logic at the callsite (either with an
explicit
test, or add an ops vec of some kind.
+ write_msi_msg(irq, &msg); if (irq_remapped(irq)) { @@ -3505,9
+3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int
type) irq_want = nr_irqs_gsi; sub_handle = 0;
list_for_each_entry(msidesc, &dev->msi_list, list) { - irq =
create_irq_nr(irq_want); - if (irq == 0) - return -1; + if ((irq =
xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0) + { +
dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq); + } else {
+ irq = create_irq_nr(irq_want); + if (irq == 0) + return -1; + }
Hoist this out into a create_msi_irq() which does the native vs Xen thing
appropriately.
irq_want = irq + 1; if (!intr_remapping_enabled) goto no_ir; @@
-3544,13 +3554,15 @@ no_ir: return 0; error: - destroy_irq(irq); + if
(xen_destroy_irq(irq)) + destroy_irq(irq);
Hoist this out too. (destroy_msi_irq()?)
return ret; }
Do we (or will we ever) support the interrupt remapping stuff in the dom0
kernel,
or is that something that might happen within Xen? (I don't know anything about
it.)
If not, then it looks like there isn't very much common code between the native
and Xen versions of arch_setup_msi_irqs(). I think it might be overall cleaner
just to explicitly have two versions, with arch_setup_msi_irqs() being a
wrapper to
choose which one to use. Even if we do support interrupt remapping, it wouldn't
result in much duplicated code at all, as far as I can see.
void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); +
if (xen_destroy_irq(irq)) + destroy_irq(irq);
Yes, this little pattern should be put into a single place.
} #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP) diff --git
a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c index 496f07d..2f35a2d
100644 --- a/arch/x86/xen/apic.c +++ b/arch/x86/xen/apic.c @@ -1,7
+1,6 @@ #include <linux/kernel.h> #include <linux/threads.h> #include
<linux/bitmap.h> -#include <linux/pci.h> #include <asm/io_apic.h>
#include <asm/acpi.h> diff --git a/drivers/xen/events.c
b/drivers/xen/events.c index af2aad4..65e4c7a 100644 ---
a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -28,6 +28,9 @@
#include <linux/string.h> #include <linux/bootmem.h> #include
<linux/irqnr.h> +#include <linux/pci_regs.h> +#include <linux/pci.h>
+#include <linux/msi.h> #include <asm/ptrace.h> #include <asm/irq.h>
@@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name)
if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
dynamic_irq_cleanup(irq); irq = -ENOSPC; + goto out; + } + +
irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); +out: +
spin_unlock(&irq_mapping_update_lock); + return irq; +} + +int
xen_destroy_irq(int irq) +{ + struct physdev_unmap_pirq unmap_irq; +
int rc; + + if (!xen_domain()) + return -1; + + unmap_irq.pirq = irq;
+ unmap_irq.domid = DOMID_SELF; + if ((rc =
HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq))) + { +
printk(KERN_WARNING "unmap irq failed %x\n", rc); goto out; } -
irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); + irq_info[irq] =
mk_unbound_info(); + + dynamic_irq_cleanup(irq); + return 0; out: -
spin_unlock(&irq_mapping_update_lock); + return -1; +} + +int
xen_create_msi_irq(struct pci_dev *dev, + unsigned int want,
Do we need this param? Looks unused.
+ struct msi_desc *msidesc, + int type) +{ + int irq = 0; + struct
physdev_map_pirq map_irq; + int rc; + domid_t domid = DOMID_SELF; +
int pos; + u32 table_offset, bir; + if (!xen_domain()) + return -1; +
+ memset(&map_irq, 0, sizeof(map_irq)); + map_irq.domid = domid; +
map_irq.type = MAP_PIRQ_TYPE_MSI; + map_irq.index = -1; + map_irq.bus
= dev->bus->number; + map_irq.devfn = dev->devfn; + + if (type ==
PCI_CAP_ID_MSIX) + { + pos = pci_find_capability(dev,
PCI_CAP_ID_MSIX); + +#define msix_table_offset_reg(base) (base + 0x04)
Isn't this defined somewhere else? If not, is there a better place to define
it?
+ pci_read_config_dword(dev, msix_table_offset_reg(pos),
&table_offset); + bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK); +
+ map_irq.table_base = pci_resource_start(dev, bir); +
map_irq.entry_nr = msidesc->msi_attrib.entry_nr; + } + +
spin_lock(&irq_mapping_update_lock); + + irq = find_unbound_irq(); + +
if (irq == -1) + goto out; + + map_irq.pirq = irq; + + if ((rc =
HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))) + { +
printk(KERN_WARNING "xen map irq failed %x\n", rc); +
dynamic_irq_cleanup(irq); + irq = -1; + goto out; + } + +
irq_info[irq] = mk_pirq_info(0, -1, map_irq.index); +
set_irq_chip_and_handler_name(irq, &xen_pirq_chip, + handle_level_irq,
+ (type == PCI_CAP_ID_MSIX) ? "msi-x":"msi"); + +out: +
spin_unlock(&irq_mapping_update_lock); return irq; } diff --git
a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index cd69391..47ab7f7 100644 --- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h @@ -121,6 +121,65 @@ struct
physdev_op { } u; }; + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define
MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define
PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; + /*
IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int
pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int
entry_nr; + /* IN */ + uint64_t table_base; +}; + +#define
PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t domid;
+ /* IN */ + int pirq; +}; + +#define PHYSDEVOP_manage_pci_add 15
+#define PHYSDEVOP_manage_pci_remove 16 +struct physdev_manage_pci { +
/* IN */ + uint8_t bus; + uint8_t devfn; +}; + +#define
PHYSDEVOP_restore_msi 19 +struct physdev_restore_msi { + /* IN */ +
uint8_t bus; + uint8_t devfn; +}; + +#define
PHYSDEVOP_manage_pci_add_ext 20 +struct physdev_manage_pci_ext { + /*
IN */ + uint8_t bus; + uint8_t devfn; + unsigned is_extfn; + unsigned
is_virtfn; + struct { + uint8_t bus; + uint8_t devfn; + } physfn; +};
+ /* * Notify that some PIRQ-bound event channels have been unmasked.
* ** This command is obsolete since interface version 0x00030202 and
is **
commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang, Yunhong
<yunhong.jiang@xxxxxxxxx> Date: Thu May 7 21:22:26 2009 +0800 Add the
pci wrap function, to notify Xen of all PCI information. Currently Xen
depends on Dom0 to notify all PCI information. When Xen try to setup
MSI for a pci device, it will depends on this information. This method
need more discussion, since it add some ugly hook to pci_bus_type.
Yes, this is a bit unpleasant. I can't see any immediately satisfying
solution, partly because
I don't fully understand what needs to be done in these hooks. Why does it
need to do this at probe
time rather than when setting up the interrupts?
Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> diff --git
a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f0d1a89..372ad9e
100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@
-13,4 +13,4 @@ obj-$(CONFIG_SMP) += smp.o spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o obj-$(CONFIG_XEN_DOM0) +=
vga.o apic.o obj-$(CONFIG_PCI_XEN) += pci-swiotlb.o
-obj-$(CONFIG_XEN_DOM0_PCI) += pci.o \ No newline at end of file
+obj-$(CONFIG_XEN_DOM0_PCI) += pci.o pci_wrap.o diff --git
a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c new file mode
100644 index 0000000..4ab6966 --- /dev/null +++
b/arch/x86/xen/pci_wrap.c @@ -0,0 +1,90 @@ +/* + *
vim:shiftwidth=8:noexpandtab + */ + +#include <linux/kernel.h>
+#include <linux/init.h> +#include <linux/pci.h> +#include
<xen/interface/physdev.h> +#include <asm/xen/hypercall.h> +#include
<asm/xen/hypervisor.h> + +static int (*pci_bus_probe)(struct device
*dev); +static int (*pci_bus_remove)(struct device *dev);
I'd give these more distinctive names: orig_pci_bus_probe, or something,
and call them with (*orig_pci_bus_probe)(...) to make it obvious they're
function callers. I overlooked the calls the first couple of times because
they didn't stand out.
+static int pci_ari_enabled(struct pci_bus *bus) +{ + return bus->self
&& bus->self->ari_enabled; +}
Is this a generally useful predicate?
+ +static int pci_bus_probe_wrapper(struct device *dev) +{ + int r; +
struct pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci
manage_pci; + struct physdev_manage_pci_ext manage_pci_ext; + +#ifdef
CONFIG_PCI_IOV + if (pci_dev->is_virtfn) { + memset(&manage_pci_ext,
0, sizeof(manage_pci_ext)); + manage_pci_ext.bus =
pci_dev->bus->number; + manage_pci_ext.devfn = pci_dev->devfn; +
manage_pci_ext.is_virtfn = 1; + manage_pci_ext.physfn.bus =
pci_dev->physfn->bus->number; + manage_pci_ext.physfn.devfn =
pci_dev->physfn->devfn; + r =
HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, +
&manage_pci_ext); + } else +#endif + if (pci_ari_enabled(pci_dev->bus)
&& PCI_SLOT(pci_dev->devfn)) { + memset(&manage_pci_ext, 0,
sizeof(manage_pci_ext)); + manage_pci_ext.bus = pci_dev->bus->number;
+ manage_pci_ext.devfn = pci_dev->devfn; + manage_pci_ext.is_extfn =
1; + r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, +
&manage_pci_ext); + } else { + manage_pci.bus = pci_dev->bus->number;
+ manage_pci.devfn = pci_dev->devfn; + r =
HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, + &manage_pci); + } +
if (r && r != -ENOSYS) + return r;
Why is ENOSYS OK? Doesn't it mean that Xen is missing some aspect of MSI
support?
+ + r = pci_bus_probe(dev); + return r; +} + +static int
pci_bus_remove_wrapper(struct device *dev) +{ + int r; + struct
pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci
manage_pci; + manage_pci.bus = pci_dev->bus->number; +
manage_pci.devfn = pci_dev->devfn; + + r = pci_bus_remove(dev); + /*
dev and pci_dev are no longer valid!! */ + +
WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, +
&manage_pci));
I prefer:
if (... != 0)
WARN_ON(1);
to make it obvious that we're doing an error check, rather than some
internal consistency assertion.
+ return r; +} + +static int __init hook_pci_bus(void) +{ + + if
(!xen_domain()) + return 0; + + pci_bus_probe = pci_bus_type.probe; +
pci_bus_type.probe = pci_bus_probe_wrapper; + + pci_bus_remove =
pci_bus_type.remove; + pci_bus_type.remove = pci_bus_remove_wrapper; +
+ return 0; +} + +core_initcall(hook_pci_bus);
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|