>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>Sent: Thursday, August 26, 2010 7:23 PM
>To: Jiang, Yunhong
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; KonradRzeszutek Wilk
>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X
>pending bit array
>>>> On 26.08.10 at 10:41, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>>>>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>>>>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>>>>>An alternative would be to determine and insert the address ranges
>>>>>earlier into mmio_ro_ranges, but that would require a hook in the
>>>>>PCI config space writes, which is particularly problematic in case
>>>>>MMCONFIG accesses are being used.
>>>> I noticed you stated in your previous mail that this should be done in
>>>> hypervisor, not tools. Is it because tools is not trusted by xen
>>>> If tools can be trusted, is it possible to achieve this in tools: Tools
>>>> xen hypervisor the MMIO range that is read-only to this guest after the
>>>> is created, but before the domain is unpaused.
>>>Doing this from the tools would have the unfortunate side effect that
>>>Dom0 (or the associated stubdom) would continue to need special
>>>casing, which really it shouldn't for this purpose (and all the special
>>>casing in the patch is really just because qemu wants to map writably
>>>the ranges in question, and this can only be removed after the
>>>hypervisor handling changed).
>> Agree that doing this from tools can't remove dom0's write access.
>> Maybe we can combine this together. Tools for normal guest, while your
>> change to msix_capability_init() is for dom0, the flow is followed:
>> 1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address.
>> Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed
>> to track which MMIO range to which device).
>That won't work for any devices that BIOS didn't assign resources to.
>> 2) When a MSI-x interrupt is started, we check the corresponding BAR to see
>> if the range is changed by dom0. If yes, update dom0's mmio_ro_range. We can
>> also check if the assigned domain's mmio_ro_range cover this.
>> 3) When tools create domain, tools will remove the mmio range for the guest.
>This (and therefore 2 above) won't work: You must not disallow the
>guest access to this space altogether - it may validly want to read the
>PBA. Remember that the code to remove pv guests' access to these
>two ranges altogether got reverted due to causing problems with
>real world devices/drivers.
Oops, clearly I wanted to say "add the mmio_ro_range".
>Also, if the check would be done only when the interrupt is being
>started, we would still have the problem of potentially needing to
>change existing mappings.
>>>least one MSI-X interrupt got enabled. Plus (while not the case with
>>>the current Linux implementation) a (non-Linux or future Linux)
>>>version may choose to (re-)assign device resources only when the
>>>device gets enabled, which would be after the guest was already
>> I'm a bit confused. Are you talking about guest re-assign device resources or
>> If you are talking about guest, I think that's easy to handle , and we
>> should anyway do that. Especially it should not impact physical resource.
>> If you are talking aboud dom0, I can't think out while guest enable device
>> will cause re-assignment in dom0.
>Because pciback does the actual enabling on behalf of the guest.
>Any resource adjustments done when memory decoding gets
>enabled won't be known at the time the guest starts.
Does this resource adjustment work in current pciback/hypervisor? At least it
means we need remove the old iomem permission/mapping and add the new iomem
permission, although not sure if any other impact. And if we want to add this
support to pciback/hypervisor, we can of course cover the MSI-x read-only
>>>Shadows and p2m table need updating in this case, but MSI information
>>>doesn't afaict (it gets proagated only when the first interrupt is being
>> Maybe I didn't state my point clearly. What I mean is, since xen hypervisor
>> knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot.
>> The only concerns for this method is situation a and situation b, which will
>> update the PCI device's resource assignment, while xen hypervisor have no
>> idea and can't update mmio_ro_range.
>Again - knowing about the PCI hierarchy doesn't mean knowing about
>all resources. One option clearly is to require a (new) callout from Dom0
>when any resources get adjusted. What I don't like about this is that
>all existing Dom0 kernels would need fixing, i.e. I'd prefer a solution
>that is contained to hypervisor+tools.
If we trust dom0, we don't need care dom0's writing access.
If we don't trust dom0, and if hypervisor does not protect pci configuration
space access, this new callout, comparing with your patch, seems make no
Anyway, your patch do make great improvement, these issues (global list,
guest's existing mapping and checking in _sh_propgate) is not important, or can
be enhanced later, I'm glad to verify your patch on my system. Do you have any
updated patch, or I can simply use the one you sent out on Aug, 13?
>>>> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand
>>>> the p2m_is_readonly to cover this also? PAE xen may have trouble for it,
>>>> at least it works for x86_64, and some wrap function with #ifdef X86_64 can
>>>> handle the difference.
>>>That would be a possible alternative, but I'm afraid I wouldn't dare to
>>>make such a change.
>> Which change? Would following code works without much issue?
>I don't know. I would just be afraid that there are other places in
>the code checking explicitly (or, worse, implicitly) for p2m_mmio
>would need updating. And not knowing well both shadow and p2m
>code, that's not something I would want to do on my own.
A bit surprise to me. I thought you knows well every bit on xen.
>> I agree that dom0's writing access should also be avoided . The concern for
>> global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card
>> populated, each support several VF. But I have no data how bit impact would
>> it be.
>A potential later optimization for this would be to make Dom0 try
>co-locate all these regions.
Xen-devel mailing list