>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>Sent: Thursday, August 26, 2010 3:07 PM
>To: Jiang, Yunhong
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Konrad Rzeszutek Wilk
>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X
>pending bit array
>>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>>>An alternative would be to determine and insert the address ranges
>>>earlier into mmio_ro_ranges, but that would require a hook in the
>>>PCI config space writes, which is particularly problematic in case
>>>MMCONFIG accesses are being used.
>> I noticed you stated in your previous mail that this should be done in
>> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor?
>> If tools can be trusted, is it possible to achieve this in tools: Tools tell
>> xen hypervisor the MMIO range that is read-only to this guest after the guest
>> is created, but before the domain is unpaused.
>Doing this from the tools would have the unfortunate side effect that
>Dom0 (or the associated stubdom) would continue to need special
>casing, which really it shouldn't for this purpose (and all the special
>casing in the patch is really just because qemu wants to map writably
>the ranges in question, and this can only be removed after the
>hypervisor handling changed).
Agree that doing this from tools can't remove dom0's write access.
Maybe we can combine this together. Tools for normal guest, while your change
to msix_capability_init() is for dom0, the flow is followed:
1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address.
Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed to
track which MMIO range to which device).
2) When a MSI-x interrupt is started, we check the corresponding BAR to see if
the range is changed by dom0. If yes, update dom0's mmio_ro_range. We can also
check if the assigned domain's mmio_ro_range cover this.
3) When tools create domain, tools will remove the mmio range for the guest.
A bit over-complicated?
>Another aspect is that of necessity of denying write access to these
>ranges: The guest must be prevented writing to them only once at
Although preventing writing access is only needed when MSI-x interrupt enabled,
but as your patch stated, we need consider how to handle mapping setup already
before the MSI-x enabled (In fact, we are working on removing setup-already
mapping for MCA purpose, although not sure if it is accepetable by upstream).
Removing the access from beginning will avoid such clean-already-mapped
>least one MSI-X interrupt got enabled. Plus (while not the case with
>the current Linux implementation) a (non-Linux or future Linux)
>version may choose to (re-)assign device resources only when the
>device gets enabled, which would be after the guest was already
I'm a bit confused. Are you talking about guest re-assign device resources or
If you are talking about guest, I think that's easy to handle , and we should
anyway do that. Especially it should not impact physical resource.
If you are talking aboud dom0, I can't think out while guest enable device will
cause re-assignment in dom0.
>>>A second alternative would be to require Dom0 to report all devices
>>>(or at least all MSI-X capable ones) regardless of whether they would
>>>be used by that domain, and do so after resources got determined/
>>>assigned for them (i.e. a second notification later than the one
>>>currently happening from the PCI bus scan would be needed).
>> Currently Xen knows about the PCI device's resource assignment already when
>> system boot, since Xen have PCI information. The only situations that Xen may
>> have no idea includes: a) Dom0 re-assign the device resource, may because of
>> resource balance; b) VF device for SR-IOV.
>> I think for situation a, IIRC, xen hypervisor can't handle it, because that
>> means all shadow need be rebuild, the MSI information need be updated etc.
>Shadows and p2m table need updating in this case, but MSI information
>doesn't afaict (it gets proagated only when the first interrupt is being
Maybe I didn't state my point clearly. What I mean is, since xen hypervisor
knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot. The
only concerns for this method is situation a and situation b, which will update
the PCI device's resource assignment, while xen hypervisor have no idea and
can't update mmio_ro_range.
>>>@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v,
>>> /* Read-only memory */
>>>- if ( p2m_is_readonly(p2mt) )
>>>+ if ( p2m_is_readonly(p2mt) ||
>>>+ (p2mt == p2m_mmio_direct &&
>>> sflags &= ~_PAGE_RW;
>> Would it have performance impact if too much mmio rangeset and we need
>> search it for each l1 entry update? Or you assume this range will not be
>> updated so frequently?
>Yes, performance wise this isn't nice.
>> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand
>> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but
>> at least it works for x86_64, and some wrap function with #ifdef X86_64 can
>> handle the difference.
>That would be a possible alternative, but I'm afraid I wouldn't dare to
>make such a change.
Which change? Would following code works without much issue?
if (p2m_is_readonly(p2mt) || p2m_is_ro_mmio(p2mt, mfn_t(target_mfn)))
sflags &= ~_PAGE_RW;
#define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)
#define p2m_is_ro_mmio() 0
#define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)
#define p2m_is_ro_mmio(_t, mfn) (p2mt == p2m_mmio_direct &&
#define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPE))
>>>@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
>>> return 0;
>>>- status = msix_capability_init(pdev, msi, desc);
>>>+ status = msix_capability_init(pdev, msi, desc, nr_entries);
>>> return status;
>> As stated above, would it be possible to achive this in tools?
>> Also if it is possible to place the mmio_ro_ranges to be per domain
>As said above, doing it in the tools has down sides, but if done
>there and if excluding/special-casing Dom0 (or the servicing
>stubdom) it should be possible to make the ranges per-domain.
I agree that dom0's writing access should also be avoided . The concern for
global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card
populated, each support several VF. But I have no data how bit impact would it
Xen-devel mailing list