xen-devel

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
From: Ward Vandewege <ward@xxxxxxx>
Date: Fri, 28 Jan 2011 13:58:09 -0500
Delivery-date: Fri, 28 Jan 2011 10:59:50 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.20 (2009-06-14)

Hi list,

I'm having some problems trying to pass through a Mellanox ConnectX HCA
to a domU.

This is on Xen 4.0.1, with the latest Debian Testing packages:

  ii  xen-hypervisor-4.0-amd64                4.0.1-2  
  ii  linux-image-2.6.32-5-xen-amd64          2.6.32-30

The hardware is Supermicro H8DGT-HIBQF, BIOS revision 1.0c (date 10/29/10).
It has two AMD Opteron 6128 CPUs, for a total of 16 cores. The machine has
32GiB of RAM. The Mellanox adapter looks like this in the dom0:

  02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    Subsystem: Super Micro Computer Inc Device 0048
    Flags: fast devsel, IRQ 19
    Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
    Memory at fc800000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: pciback

I've attached the output of xm dmesg (xm.dmesg.txt).

I have the following in the domU config files:

  pci = ['0000:02:00.0'] 

I've attached the boot log from trying to boot the same kernel as an HVM
guest (testsqueezehvm.bootlog.txt). Doing so generates these four lines of
output in xm dmesg:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c0c0
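
For reference, the "device id" in these messages appears to be the 16-bit PCI
requester ID (bus in the top 8 bits, device in the next 5, function in the low
3). Under that assumption, 0x200 corresponds to 02:00.0, the HCA being passed
through. A minimal sketch of the decoding; the function name decode_device_id
is only illustrative:

```python
def decode_device_id(device_id: int) -> str:
    """Split a 16-bit PCI requester ID into bus:device.function."""
    bus = (device_id >> 8) & 0xFF   # top 8 bits: bus number
    dev = (device_id >> 3) & 0x1F   # next 5 bits: device (slot) number
    fn = device_id & 0x07           # low 3 bits: function number
    return f"{bus:02x}:{dev:02x}.{fn:x}"


print(decode_device_id(0x200))
```

This prints the address in the same bus:device.function form that lspci uses,
so it can be compared directly with the adapter listing above.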

The mlx4_core driver in the domU is not happy:

[    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[    0.411879] mlx4_core: Initializing 0000:00:00.0
[    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    1.417527] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.

When trying to boot a PV domU with kernel options iommu=soft and
swiotlb=force, the output is slightly different. The full bootlog is attached
(testsqueeze.bootlog.txt). Here's the relevant excerpt:

[    0.441684] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2 (August 4, 2010)
[    0.441696] mlx4_core: Initializing 0000:00:00.0
[    0.442044] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.442741] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    2.752125] mlx4_core 0000:00:00.0: NOP command failed to generate MSI-X interrupt IRQ 54).
[    2.752158] mlx4_core 0000:00:00.0: Trying again without MSI-X.
[    2.884105] mlx4_core 0000:00:00.0: NOP command failed to generate interrupt (IRQ 54), aborting.
[    2.884138] mlx4_core 0000:00:00.0: BIOS or ACPI interrupt routing problem?
[    2.916920] mlx4_core: probe of 0000:00:00.0 failed with error -16
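
As a side note, the "error -16" in the last line is a negated errno value. A
small sketch for looking up its symbolic name follows; it assumes the script
runs on a Linux host, so that the errno numbering matches the guest kernel's,
and describe_kernel_error is a made-up helper name:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel return value such as -16 to its errno name."""
    num = abs(code)
    # errno.errorcode maps numeric values to names on the host platform.
    name = errno.errorcode.get(num, "unknown")
    return f"{name} ({os.strerror(num)})"


print(describe_kernel_error(-16))
```

On Linux, 16 is EBUSY.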

And xm dmesg quickly fills up with many, many lines like this:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43020
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43060
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430a0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430c0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430e0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43100
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43120
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43140
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43160
...
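
Since the log is long, it can help to summarize the faults rather than read
them line by line. The sketch below parses "xm dmesg"-style text (tolerating
the mail client's line wrap between "fault" and "address") and reports the
count, the first and last address, and the distinct strides between
consecutive faults. The helper names and the sample text are illustrative
only; the sample is the first three lines of the excerpt above.

```python
import re

# "\s+" lets the pattern match whether or not the line was wrapped
# between "fault" and "address".
FAULT_RE = re.compile(
    r"AMD_IOV: IO_PAGE_FALT: domain:(\d+), device id:(0x[0-9a-fA-F]+), "
    r"fault\s+address:(0x[0-9a-fA-F]+)"
)


def parse_faults(text: str):
    """Return (domain, device_id, address) tuples found in the log text."""
    return [(int(dom), int(dev, 16), int(addr, 16))
            for dom, dev, addr in FAULT_RE.findall(text)]


def summarize(faults):
    """Summarize fault addresses: count, range, and consecutive strides."""
    addrs = [addr for _, _, addr in faults]
    strides = {b - a for a, b in zip(addrs, addrs[1:])}
    return {
        "count": len(addrs),
        "first": hex(addrs[0]),
        "last": hex(addrs[-1]),
        "strides": sorted(hex(s) for s in strides),
    }


sample = """\
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault
address:0x70a4309170a43000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault
address:0x70a4309170a43020
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault
address:0x70a4309170a43040
"""

print(summarize(parse_faults(sample)))
```

For the PV excerpt above, every consecutive pair of faults is 0x20 bytes
apart, which suggests the device is walking a contiguous region at an address
the IOMMU has no mapping for.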

Booting a PV domU with only the swiotlb=force option makes the output much
more like the HVM output.

Any thoughts on what could be going on here?

Thanks,
Ward.


Attachment: xm.dmesg.txt
Description: Text document

Attachment: testsqueeze.bootlog.txt
Description: Text document

Attachment: testsqueezehvm.bootlog.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel