|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 12:07, <roger.pau@xxxxxxxxxx> wrote:
> > Several people have reported hardware issues (malfunctioning USB
> > controllers) due to iommu page faults on Intel hardware. Those faults
> > are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
> > be worked around on VTd hardware by manually adding RMRR entries on
> > the command line, this is however limited to Intel hardware and quite
> > cumbersome to do.
> >
> > In order to solve those issues add a new dom0-iommu=reserved option
> > that identity maps all regions marked as reserved in the memory map.
>
> Considering that report we've had (yesterday? earlier today?), don't
> we need to go further and use the union of RMRRs and reserved
> regions? Iirc they had a case where an RMRR was not in a reserved
> region ...
AFAICT (and I could be reading the code wrong) RMRR regions not on
reserved regions are still correctly mapped to the guest.
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses
> > to that port.
> > >> Enable IOMMU debugging code (implies `verbose`).
> >
> > ### dom0-iommu
> > -> `= List of [ none | strict | relaxed | inclusive ]`
> > +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
> >
> > * `none`: disables DMA remapping for Dom0.
> >
> > @@ -1233,6 +1233,15 @@ meaning:
> > option is only applicable to a PV Dom0 and is enabled by default on Intel
> > hardware.
> >
> > +* `reserved`: sets up DMA remapping for all the reserved regions in the
> > memory
> > + map for Dom0. Use this to work around firmware issues providing incorrect
> > + RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
> > + for Dom0, all memory regions marked as reserved in the memory map that
> > don't
> > + overlap with any MMIO region from emulated devices will be identity
> > mapped.
> > + This option maps a subset of the memory that would be mapped when using
> > the
> > + `inclusive` option. This option is available to a PVH Dom0 and is
> > enabled by
> > + default on Intel hardware.
>
> The sub-options so far were quite clear in their meanings, but
> "dom0-iommu=reserved" might mean all sorts of things to me, but quite
> certainly not "map all reserved regions". "map-reserved" perhaps?
Then we should also have 'map-inclusive' for symmetry IMO.
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const
> > struct domain *d,
> > return NULL;
> > }
> >
> > +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr)
> > +{
> > + return vpci_mmcfg_find(d, addr);
> > +}
>
> I think the function name should have an "is_" somewhere, to make
> clear address is a noun here and not a verb.
>
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct
> > domain *d)
> > /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
> > iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
> > :
> > iommu_dom0_inclusive;
> > + /* Reserved IOMMU mappings are disabled by default on AMD hardware. */
> > + iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false
> > + : iommu_dom0_reserved;
>
> Especially seeing these two together now, I think an if() each would
> be easier to read (not the least because of being shorter). To me,
> the main purpose of the conditional operator is to allow shortening
> simple if/else pairs, rather than lengthening simple if()-s.
>
> > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
> > {
> > }
> >
> > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned
> > long pfn,
> > + unsigned long max_pfn)
> > +{
> > + unsigned int i;
> > +
> > + /*
> > + * Ignore any address below 1MB, that's already identity mapped by the
> > + * domain builder for HVM.
> > + */
> > + if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
>
> Careful again here with the distinction between Dom0 and hwdom:
> The domain builder you refer to is - aiui - the in-Xen one, i.e. the
> one _only_ dealing with Dom0.
So this should instead be:
if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
> > + /*
> > + * If dom0-strict mode is enabled or the guest type is PVH/HVM then
> > exclude
> > + * conventional RAM and let the common code map dom0's pages.
> > + */
> > + if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> > + (iommu_dom0_strict || is_hvm_domain(d)) )
> > + return false;
> > + if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> > + !iommu_dom0_reserved && !iommu_dom0_inclusive )
> > + return false;
> > + if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
> > + !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> > + !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> > + (!iommu_dom0_inclusive || pfn > max_pfn) )
> > + return false;
>
> As page_is_ram_type() is, especially on systems with many E820
> entries, not really cheap, I think at least a minimum amount of
> optimization is on order here - after all you do this for every
> single page of the system. Hence minimally the result of the first
> CONVENTIONAL and RESERVED queries should be latched and
> re-used (or "else" be used suitably). Ideally an approach would
> be used which involved just a single iteration through the E820
> map, but I realize this may be more than is feasible here.
This would be quite better if I could just fetch the type, then I
could add a switch (type) { ... and it would be better IMO.
> Furthermore I'm unconvinced the !page_is_ram_type() uses
> are really what you want: The function returning false means
> "don't know", not "no". Perhaps the function needs to be made
> return a tristate (yes, no, or part of it).
Right, that's why I have three different !page_is_ram_type checks in
the last branch of the if, so that I can make sure it's not one of the
previous types and also account for holes.
> > + /* Check that it doesn't overlap with the LAPIC */
> > + if ( has_vlapic(d) )
> > + {
> > + const struct vcpu *v;
> > +
> > + for_each_vcpu(d, v)
> > + if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> > + return false;
> > + }
>
> I don't, btw, recall any code adjusting the IOMMU mappings in
> case the domain relocates its LAPIC. If you do the check above,
> wouldn't that other side too need taking care of?
Yes. I can add something later but this is already an issue if the
guest for example relocates the LAPIC over a RAM region.
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |