RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI


>-----Original Message-----
>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>Sent: Thursday, August 26, 2010 3:07 PM
>To: Jiang, Yunhong
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Konrad Rzeszutek Wilk
>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X 
>table and
>pending bit array
>
>>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>>>An alternative would be to determine and insert the address ranges
>>>earlier into mmio_ro_ranges, but that would require a hook in the
>>>PCI config space writes, which is particularly problematic in case
>>>MMCONFIG accesses are being used.
>>
>> I noticed you stated in your previous mail that this should be done in
>> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor?
>> If tools can be trusted, is it possible to achieve this in tools: Tools tell
>> xen hypervisor the MMIO range that is read-only to this guest after the guest
>> is created, but before the domain is unpaused.
>
>Doing this from the tools would have the unfortunate side effect that
>Dom0 (or the associated stubdom) would continue to need special
>casing, which really it shouldn't for this purpose (and all the special
>casing in the patch is really just because qemu wants to map writably
>the ranges in question, and this can only be removed after the
>hypervisor handling changed).

Agree that doing this from tools can't remove dom0's write access.
Maybe we can combine this together. Tools for normal guest, while your change 
to msix_capability_init() is for dom0, the flow is followed:
1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address. 
Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed to 
track which MMIO range to which device).
2) When a MSI-x interrupt is started, we check the corresponding BAR to see if 
the range is changed by dom0. If yes, update dom0's mmio_ro_range.  We can also 
check if the assigned domain's mmio_ro_range cover this.
3) When tools create domain, tools will remove the mmio range for the guest.

A bit over-complicated?

>
>Another aspect is that of necessity of denying write access to these
>ranges: The guest must be prevented writing to them only once at

Although preventing writing access is only needed when MSI-x interrupt enabled, 
but as your patch stated, we need consider how to handle mapping setup already 
before the MSI-x enabled (In fact, we are working on removing setup-already 
mapping for MCA purpose, although not sure if it is accepetable by upstream). 
Removing the access from beginning will avoid such clean-already-mapped 
requirement.

>least one MSI-X interrupt got enabled. Plus (while not the case with
>the current Linux implementation) a (non-Linux or future Linux)
>version may choose to (re-)assign device resources only when the
>device gets enabled, which would be after the guest was already
>launched.

I'm a bit confused. Are you talking about guest re-assign device resources or 
dom0?
If you are talking about guest, I think that's easy to handle , and we should 
anyway do that. Especially it should not impact physical resource.
If you are talking aboud dom0, I can't think out while guest enable device will 
cause re-assignment in dom0.

>
>>>A second alternative would be to require Dom0 to report all devices
>>>(or at least all MSI-X capable ones) regardless of whether they would
>>>be used by that domain, and do so after resources got determined/
>>>assigned for them (i.e. a second notification later than the one
>>>currently happening from the PCI bus scan would be needed).
>>
>> Currently Xen knows about the PCI device's resource assignment already when
>> system boot, since Xen have PCI information. The only situations that Xen may
>> have no idea includes: a) Dom0 re-assign the device resource, may because of
>> resource balance; b) VF device for SR-IOV.
>>
>> I think for situation a, IIRC, xen hypervisor can't handle it, because that
>> means all shadow need be rebuild, the MSI information need be updated etc.
>
>Shadows and p2m table need updating in this case, but MSI information
>doesn't afaict (it gets proagated only when the first interrupt is being
>set up).

Maybe I didn't state my point clearly. What I mean is, since xen hypervisor 
knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot. The 
only concerns for this method is situation a and situation b, which will update 
the PCI device's resource assignment, while xen hypervisor have no idea and 
can't update mmio_ro_range.

>>>@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v,
>>>     }
>>>
>>>     /* Read-only memory */
>>>-    if ( p2m_is_readonly(p2mt) )
>>>+    if ( p2m_is_readonly(p2mt) ||
>>>+         (p2mt == p2m_mmio_direct &&
>>>+          rangeset_contains_singleton(mmio_ro_ranges,
>mfn_x(target_mfn))) )
>>>         sflags &= ~_PAGE_RW;
>>
>> Would it have performance impact if too much mmio rangeset and we need
>> search it for each l1 entry update? Or you assume this range will not be
>> updated so frequently?
>
>Yes, performance wise this isn't nice.
>
>> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand
>> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but
>> at least it works for x86_64, and some wrap function with #ifdef X86_64 can
>> handle the difference.
>
>That would be a possible alternative, but I'm afraid I wouldn't dare to
>make such a change.

Which change? Would following code works without much issue?

 if (p2m_is_readonly(p2mt) || p2m_is_ro_mmio(p2mt, mfn_t(target_mfn)))
        sflags &= ~_PAGE_RW;

#ifdef __x86_64
#define P2M_RO_TYPES  (p2m_to_mask(p2m_ram_logdirty) 
|....|p2m_to_mask(p2m_ram_shared)|p2m_to_mask(p2m_mmio_ro)
#define p2m_is_ro_mmio() 0
#else
#define P2M_RO_TYPES  (p2m_to_mask(p2m_ram_logdirty) 
|....|p2m_to_mask(p2m_ram_shared)
#define p2m_is_ro_mmio(_t, mfn)  (p2mt == p2m_mmio_direct  && 
(rangeset_contains_singleton(mmio_ro_ranges,>mfn_x(target_mfn)))
#endif

#define p2m_is_readonly(_t)  (p2m_to_mask(_t) & P2M_RO_TYPE))


>
>>>@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
>>>         return 0;
>>>     }
>>>
>>>-    status = msix_capability_init(pdev, msi, desc);
>>>+    status = msix_capability_init(pdev, msi, desc, nr_entries);
>>>     return status;
>>> }
>>
>> As stated above, would it be possible to achive this in tools?
>> Also if it is possible to place the mmio_ro_ranges to be per domain
>> structure?
>
>As said above, doing it in the tools has down sides, but if done
>there and if excluding/special-casing Dom0 (or the servicing
>stubdom) it should be possible to make the ranges per-domain.

I agree that dom0's writing access should also be avoided . The concern for 
global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card 
populated, each support several VF. But I have no data how bit impact would it 
be.

Thanks
--jyh

>
>Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
WARNING - OLD ARCHIVES

xen-devel

RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table