[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole



On 4/29/26 11:29, Roger Pau Monné wrote:
> On Fri, Mar 13, 2026 at 04:35:04PM +0000, Thierry Escande wrote:
>> The actual MMCONFIG size depends on the number of PCI buses available
>> which should be covered by ECAM. Possible options are 64MB, 128MB and
>> 256MB.
> 
> Are such values inherited from the real q35 impleemntation?
> 
> AFAICT the ACPI MCFG spec notes:
> 
> "The size of the memory mapped configuration region is indicated by
> the start and end bus number fields in the Memory mapped Enhanced
> configuration space base address allocation structure as shown in
> Table 4-3. 0-255 is the range of allowed bus numbers supported for a
> given PCI Segment Group."
> 
> So it's in principle possible to specify a MCFG that covers a single
> bus, and then it would have a size of 256 * 4K = 1M.  Which avoids
> wasting 63M of MMIO space in the low MMIO hole that's already fairly
> tight on space.
> 
> Is this limitation possibly inherited from the way the ECAM region
> position and size must be notified to the chipset?
> 
> And further seeing the code below - I found the answer myself, it's
> because the chipset only supports negotiation those ECAM sizes.  We
> could possibly expose a smaller region in MCFG, but doesn't seem like
> a good move.

Yes indeed, it's a waste of space in MMIO hole. Maybe it's ok to only
allocate the 1MB needed for 1 bus since there would be no reason for a
guest to access passed the 1st MB. I can give it a try but that's
possibly risky...

> 
>> As Xen is limited to the bus 0 currently, the lowest possible
>> setting is used (64MB), defined via PCI_MAX_MCFG_BUSES in
>> hvmloader/config.h. When multiple PCI buses support for Xen will be
>> implemented, PCI_MAX_MCFG_BUSES may be replaced by a calculation of the
>> number of buses according to PCI devices enumeration.
>>
>> The MMCONFIG entry is inserted into bars array in the same manner like
>> for any other BARs. In this case, the devfn field will point to MCH PCI
>> device and bar_reg will contain PCIEXBAR register offset. It will be
>> assigned a slot in the MMIO hole later in a very same way like for plain
>> PCI BARs, with respect to its size and alignment. At this point, the
>> actual base address and size of the ECAM space are passed to Xen using
>> the HVMOP_set_ecam_space hypercall.
>>
>> Signed-off-by: Alexey Gerasimenko <x1917x@xxxxxxxxx>
>> Signed-off-by: Thierry Escande <thierry.escande@xxxxxxxxxx>
>> ---
>>  tools/firmware/hvmloader/config.h   |  4 +++
>>  tools/firmware/hvmloader/pci.c      | 55 +++++++++++++++++++++++++++++
>>  tools/firmware/hvmloader/pci_regs.h |  7 ++++
>>  3 files changed, 66 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/config.h 
>> b/tools/firmware/hvmloader/config.h
>> index baaed91c7f..aa3158bca5 100644
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -55,6 +55,10 @@ extern uint32_t *cpu_to_apicid;
>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
>> +
>> +/* possible values are: 64, 128, 256 */
>> +#define PCI_MAX_MCFG_BUSES  64
>>  
>>  #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
>>  
>> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
>> index 6e6720adae..54c23ffdd8 100644
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -413,6 +413,58 @@ void pci_setup(void)
>>          pci_devfn_decode_type[devfn] |= PCI_COMMAND_MASTER;
>>      }
>>  
>> +    /*
>> +     *  Calculate MMCONFIG area size and squeeze it into the bars array
>> +     *  for assigning a slot in the MMIO hole
>> +     */
>> +    if ( is_running_on_q35 )
>> +    {
>> +        /* disable PCIEXBAR decoding for now */
>> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
>> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);
>> +
>> +        switch ( PCI_MAX_MCFG_BUSES )
>> +        {
>> +        case 64:
>> +            bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
>> +            bar_sz = MB(64);
>> +            break;
>> +
>> +        case 128:
>> +            bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
>> +            bar_sz = MB(128);
>> +            break;
>> +
>> +        case 256:
>> +            bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
>> +            bar_sz = MB(256);
>> +            break;
>> +
>> +        default:
>> +            /* unsupported number of buses specified */
>> +            BUG();
>> +        }
>> +
>> +        addr_mask = ~(bar_sz - 1);
>> +
>> +        for ( i = 0; i < nr_bars; i++ )
>> +            if ( bars[i].bar_sz < bar_sz )
>> +                break;
>> +
>> +        if ( i != nr_bars )
>> +            memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>> +
>> +        bars[i].is_mem    = 1;
>> +        bars[i].devfn     = PCI_MCH_DEVFN;
>> +        bars[i].bar_reg   = PCI_MCH_PCIEXBAR;
>> +        bars[i].bar_sz    = bar_sz;
>> +        bars[i].addr_mask = addr_mask;
>> +        bars[i].bar_data  = bar_data;
>> +
>> +        mmio_total += bar_sz;
>> +        nr_bars++;
>> +    }
> 
> I think it might be best if the ECAM fake BAR is the first element in
> the bars array, so we ensure it's the first item to consume memory
> from the low MMIO hole.  Not sure how that will work with the current
> sorting of the resources based on their size, but it's imperative for
> hvmloader to attempt to position ECAM ahead of the other device
> resources IMO.

With a size of 64MB it's always placed first from what I can tell. I
don't get why it is imperative. Would it be to make sure that it is
actually allocated in the first 4GB?

> 
>> +
>>      if ( mmio_hole_size )
>>      {
>>          uint64_t max_ram_below_4g = GB(4) - mmio_hole_size;
>> @@ -592,6 +644,9 @@ void pci_setup(void)
>>              }
>>          }
>>  
>> +        if ( bar_reg == PCI_MCH_PCIEXBAR )
>> +            hvm_set_ecam_space(base, bar_sz);
> 
> As noted in a previous patch, it would be better if it's QEMU (as part
> of handling the PCI_MCH_PCIEXBAR writes) that notifies Xen of the ECAM
> window placement.

Yes, as mentioned earlier I can do it that way.

Regards,


--
Thierry Escande | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.