WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and pending bit array

To: Jan Beulich <JBeulich@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and pending bit array
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Thu, 26 Aug 2010 14:24:57 +0800
Accept-language: en-US
Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Delivery-date: Wed, 25 Aug 2010 23:26:30 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C6566B2020000780000FBD9@xxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4C6566B2020000780000FBD9@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acs67Jaj2VVrArvyTFifEtcl9vRutgJ8W6eQ
Thread-topic: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and pending bit array

>-----Original Message-----
>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>Sent: Friday, August 13, 2010 9:37 PM
>To: xen-devel@xxxxxxxxxxxxxxxxxxx
>Cc: Konrad Rzeszutek Wilk
>Subject: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and pending bit array
>
>Below/attached is an untested and possibly not yet complete patch
>attempting to address the problem originally described on this thread
>(can't really test this myself as I don't have, with one exception,
>any MSI-X capable devices around, and the exceptional one doesn't have
>a driver making use of it). I had sent it once before, it only got
>refreshed since then to build with xen-unstable as of c/s 21952, and
>I'm re-sending it mainly because there was no feedback so far, despite
>the original problem representing a security issue.

I have set up an MSI-X capable NIC and can test the patch once it is finalized.

>
>It tracks MMIO MFNs required to only have read-only guest access
>(determined when the first MSI-X interrupt gets enabled on a device)
>in a global range set, and enforces the write protection as
>translations get established.
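To make sure I understand the mechanism: the tracking above boils down to recording the MSI-X table/PBA MFN ranges once, and then answering "is this MFN read-only?" on every translation. A toy, self-contained model of that idea (this is NOT the real Xen rangeset implementation, just a flat table to illustrate the two operations the patch relies on):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for Xen's mmio_ro_ranges: a small table of inclusive
 * MFN ranges, supporting the two operations used by the patch. */
#define RO_MAX 16

struct ro_range { uint64_t first, last; };

static struct ro_range ro_ranges[RO_MAX];
static size_t ro_count;

/* Record [first, last] as read-only MMIO; returns 0 on success. */
static int ro_add_range(uint64_t first, uint64_t last)
{
    if (ro_count >= RO_MAX || first > last)
        return -1;
    ro_ranges[ro_count].first = first;
    ro_ranges[ro_count].last = last;
    ro_count++;
    return 0;
}

/* Analogue of rangeset_contains_singleton(): must this MFN lose
 * _PAGE_RW when a translation is established? */
static int ro_contains(uint64_t mfn)
{
    for (size_t i = 0; i < ro_count; i++)
        if (mfn >= ro_ranges[i].first && mfn <= ro_ranges[i].last)
            return 1;
    return 0;
}
```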
>
>The changes are made under the assumption that p2m_mmio_direct will
>only ever be used for order 0 pages.
>
>An open question is whether dealing with pv guests (including the
>IOMMU-less case) is necessary, as handling mappings a domain may
>already have in place at the time the first interrupt gets set up
>would require scanning all of the guest's page table pages.


>
>An alternative would be to determine and insert the address ranges
>earlier into mmio_ro_ranges, but that would require a hook in the
>PCI config space writes, which is particularly problematic in case
>MMCONFIG accesses are being used.

I noticed you stated in your previous mail that this should be done in the hypervisor, not in the tools. Is that because the tools are not trusted by the Xen hypervisor? If the tools can be trusted, would it be possible to achieve this in the tools instead: the tools tell the Xen hypervisor which MMIO ranges are read-only for a guest after the guest is created, but before the domain is unpaused?


>
>A second alternative would be to require Dom0 to report all devices
>(or at least all MSI-X capable ones) regardless of whether they would
>be used by that domain, and do so after resources got determined/
>assigned for them (i.e. a second notification later than the one
>currently happening from the PCI bus scan would be needed).

Currently, Xen already knows about each PCI device's resource assignment at boot time, since Xen has the PCI information. The only situations where Xen may have no idea are: a) Dom0 re-assigns the device's resources, perhaps for resource rebalancing; b) VF devices for SR-IOV.

For situation a, IIRC, the Xen hypervisor can't handle it anyway, because that would mean all shadows need to be rebuilt, the MSI information needs to be updated, etc. For situation b, we can do it when the VF device is assigned to the guest.

Still, I think the tools could achieve this more easily if they are trusted.

>
>(Attached is also a trivial prerequisite patch.)
>
>Jan
>

......

>--- 2010-08-12.orig/xen/arch/x86/mm/shadow/multi.c     2010-08-12 17:36:43.000000000 +0200
>+++ 2010-08-12/xen/arch/x86/mm/shadow/multi.c  2010-08-12 17:16:32.000000000 +0200
>@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v,
>     }
>
>     /* Read-only memory */
>-    if ( p2m_is_readonly(p2mt) )
>+    if ( p2m_is_readonly(p2mt) ||
>+         (p2mt == p2m_mmio_direct &&
>+          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
>         sflags &= ~_PAGE_RW;

Would there be a performance impact if the MMIO rangeset grows large, given that we have to search it for each L1 entry update? Or do you assume this range will not be updated frequently?
I'm not sure whether we could instead add one more p2m type like p2m_mmio_ro, and expand p2m_is_readonly() to cover it as well. PAE Xen may have trouble with that, but at least it would work for x86_64, and some wrapper functions with #ifdef X86_64 could handle the difference.
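To illustrate what I mean, a hypothetical sketch (these are not the real Xen p2m definitions, just the shape of the suggestion): with a dedicated type, the read-only decision becomes a type check instead of a rangeset lookup on every L1 update.

```c
#include <assert.h>

/* Hypothetical sketch of the suggestion: a dedicated p2m type for
 * read-only direct MMIO (MSI-X table/PBA pages).  The type names
 * mirror Xen's p2m_type_t but this is not the real definition. */
typedef enum {
    p2m_ram_rw,
    p2m_ram_ro,
    p2m_mmio_direct,
    p2m_mmio_ro,      /* proposed: direct MMIO that must stay read-only */
} p2m_type_t;

/* Extended predicate: p2m_mmio_ro pages lose _PAGE_RW like the other
 * read-only types, so _sh_propagate() would need no rangeset search. */
static int p2m_is_readonly(p2m_type_t t)
{
    return t == p2m_ram_ro || t == p2m_mmio_ro;
}
```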

BTW, I remember it was discussed long ago that PAE Xen would be removed, but it seems it still exists. Is there any special reason for keeping PAE Xen?

>
>     // protect guest page tables
>@@ -1204,15 +1206,19 @@ static int shadow_set_l1e(struct vcpu *v
>         /* About to install a new reference */
>         if ( shadow_mode_refcounts(d) ) {
>             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
>-            if ( shadow_get_page_from_l1e(new_sl1e, d, new_type) == 0 )
>+            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
>             {
>+            case 0:
>                 /* Doesn't look like a pagetable. */
>                 flags |= SHADOW_SET_ERROR;
>                 new_sl1e = shadow_l1e_empty();
>-            }
>-            else
>-            {
>+                break;
>+            case -1:
>+                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
>+                /* fall through */
>+            default:
>                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
>+                break;
>             }
>         }
>     }
>--- 2010-08-12.orig/xen/arch/x86/msi.c 2010-08-12 17:36:43.000000000 +0200
>+++ 2010-08-12/xen/arch/x86/msi.c      2010-08-12 18:09:43.000000000 +0200
>@@ -16,12 +16,14 @@
> #include <xen/errno.h>
> #include <xen/pci.h>
> #include <xen/pci_regs.h>
>+#include <xen/iocap.h>
> #include <xen/keyhandler.h>
> #include <asm/io.h>
> #include <asm/smp.h>
> #include <asm/desc.h>
> #include <asm/msi.h>
> #include <asm/fixmap.h>
>+#include <asm/p2m.h>
> #include <mach_apic.h>
> #include <io_ports.h>
> #include <public/physdev.h>
>@@ -520,6 +522,43 @@ static int msi_capability_init(struct pc
>     return 0;
> }
>
>+static u64 read_pci_mem_bar(u8 bus, u8 slot, u8 func, u8 bir)
>+{
>+    u8 limit;
>+    u32 addr;
>+
>+    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
>+    {
>+    case PCI_HEADER_TYPE_NORMAL:
>+        limit = 6;
>+        break;
>+    case PCI_HEADER_TYPE_BRIDGE:
>+        limit = 2;
>+        break;
>+    case PCI_HEADER_TYPE_CARDBUS:
>+        limit = 1;
>+        break;
>+    default:
>+        return 0;
>+    }
>+
>+    if ( bir >= limit )
>+        return 0;
>+    addr = pci_conf_read32(bus, slot, func, PCI_BASE_ADDRESS_0 + bir * 4);
>+    if ( (addr & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>+        return 0;
>+    if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>+         PCI_BASE_ADDRESS_MEM_TYPE_64 )
>+    {
>+        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>+        if ( ++bir >= limit )
>+            return 0;
>+        return addr |
>+               ((u64)pci_conf_read32(bus, slot, func,
>+                                     PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>+    }
>+    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>+}
>+
> /**
>  * msix_capability_init - configure device's MSI-X capability
>  * @dev: pointer to the pci_dev data structure of MSI-X device function
>@@ -532,7 +571,8 @@ static int msi_capability_init(struct pc
>  **/
> static int msix_capability_init(struct pci_dev *dev,
>                                 struct msi_info *msi,
>-                                struct msi_desc **desc)
>+                                struct msi_desc **desc,
>+                                unsigned int nr_entries)
> {
>     struct msi_desc *entry;
>     int pos;
>@@ -587,6 +627,69 @@ static int msix_capability_init(struct p
>
>     list_add_tail(&entry->list, &dev->msi_list);
>
>+    if ( !dev->msix_nr_entries )
>+    {
>+        u64 pba_paddr;
>+        u32 pba_offset;
>+
>+        ASSERT(!dev->msix_used_entries);
>+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
>+
>+        dev->msix_nr_entries = nr_entries;
>+        dev->msix_table.first = PFN_DOWN(table_paddr);
>+        dev->msix_table.last = PFN_DOWN(table_paddr +
>+                                        nr_entries * PCI_MSIX_ENTRY_SIZE - 1);
>+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_table.first,
>+                                        dev->msix_table.last));
>+
>+        pba_offset = pci_conf_read32(bus, slot, func,
>+                                     msix_pba_offset_reg(pos));
>+        bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
>+        pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
>+        WARN_ON(!pba_paddr);
>+        pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
>+
>+        dev->msix_pba.first = PFN_DOWN(pba_paddr);
>+        dev->msix_pba.last = PFN_DOWN(pba_paddr +
>+                                      BITS_TO_LONGS(nr_entries) - 1);
>+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
>+                                        dev->msix_pba.last));
>+
>+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
>+                                dev->msix_table.last) )
>+            WARN();
>+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
>+                                dev->msix_pba.last) )
>+            WARN();
>+printk("MSIX%02x:%02x.%x: table@(%lx,%lx), pba@(%lx,%lx)\n", bus, slot, func,
>+       dev->msix_table.first, dev->msix_table.last,
>+       dev->msix_pba.first, dev->msix_pba.last);//temp
>+
>+        if ( dev->domain )
>+            p2m_change_entry_type_global(p2m_get_hostp2m(dev->domain),
>+                                         p2m_mmio_direct, p2m_mmio_direct);
>+        if ( !dev->domain || !paging_mode_translate(dev->domain) )
>+        {
>+            struct domain *d = dev->domain;
>+
>+            if ( !d )
>+                for_each_domain(d)
>+                    if ( !paging_mode_translate(d) &&
>+                         (iomem_access_permitted(d, dev->msix_table.first,
>+                                                 dev->msix_table.last) ||
>+                          iomem_access_permitted(d, dev->msix_pba.first,
>+                                                 dev->msix_pba.last)) )
>+                        break;
>+            if ( d )
>+            {
>+                /* XXX How to deal with existing mappings? */
>+            }
>+        }
>+    }
>+    WARN_ON(dev->msix_nr_entries != nr_entries);
>+    WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT));
>+    ++dev->msix_used_entries;
>+
>     /* Mask interrupt here */
>     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
>
>@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
>         return 0;
>     }
>
>-    status = msix_capability_init(pdev, msi, desc);
>+    status = msix_capability_init(pdev, msi, desc, nr_entries);
>     return status;
> }
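To double-check my reading of the table range computation in msix_capability_init() above, here is a standalone arithmetic sketch. The constants are assumptions on my side: PAGE_SHIFT and PFN_DOWN() as on x86 Xen, and a 16-byte MSI-X table entry per the PCI spec.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          12
#define PFN_DOWN(x)         ((x) >> PAGE_SHIFT)
#define PCI_MSIX_ENTRY_SIZE 16   /* bytes per MSI-X table entry */

struct pfn_range { uint64_t first, last; };

/* Inclusive PFN range covered by an nr_entries-sized MSI-X table
 * starting at physical address table_paddr, mirroring the
 * dev->msix_table.first/.last computation in the patch. */
static struct pfn_range msix_table_pfns(uint64_t table_paddr,
                                        unsigned int nr_entries)
{
    struct pfn_range r;
    r.first = PFN_DOWN(table_paddr);
    r.last  = PFN_DOWN(table_paddr + nr_entries * PCI_MSIX_ENTRY_SIZE - 1);
    return r;
}
```

For example, a maximal 2048-entry table occupies 32 KiB, i.e. eight 4 KiB pages.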

As stated above, would it be possible to achieve this in the tools?
Also, would it be possible to make mmio_ro_ranges a per-domain structure?


Thanks
--jyh


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel