|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 1/2] amd/iommu: Always atomically update DTE
Le 17/11/2025 à 14:05, Teddy Astie a écrit :
> Le 13/11/2025 à 12:38, Jan Beulich a écrit :
>> On 12.11.2025 16:37, Teddy Astie wrote:
>>> amd_iommu_set_root_page_table chooses between updating atomically
>>> and non-atomically depending on whether the DTE is active or not.
>>>
>>> This choice existed mostly because cx16 wasn't supposed always available
>>> until [1]. Thus we don't need to threat the non-atomic path in a special
>>> way anymore.
>>>
>>> By rearranging slightly the atomic path, we can make it cover all the
>>> cases
>>> which improves the code generation at the expense of systematically
>>> performing
>>> cmpxchg16b.
>>>
>>> Also remove unused raw64 fields of ldte, and the deprecated comment
>>> as the
>>> function actually behaves in a more usual way and can't return >0.
>>>
>>> [1] 2636fcdc15c7 "x86/iommu: check for CMPXCHG16B when enabling IOMMU"
>>>
>>> Signed-off-by: Teddy Astie <teddy.astie@xxxxxxxxxx>
>>> ---
>>> xen/drivers/passthrough/amd/iommu_map.c | 78 ++++++++-----------------
>>> 1 file changed, 25 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/
>>> passthrough/amd/iommu_map.c
>>> index 320a2dc64c..e3165d93aa 100644
>>> --- a/xen/drivers/passthrough/amd/iommu_map.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
>>> @@ -154,69 +154,41 @@ static void set_iommu_ptes_present(unsigned
>>> long pt_mfn,
>>> unmap_domain_page(table);
>>> }
>>> -/*
>>> - * This function returns
>>> - * - -errno for errors,
>>> - * - 0 for a successful update, atomic when necessary
>>> - * - 1 for a successful but non-atomic update, which may need to be
>>> warned
>>> - * about by the caller.
>>> - */
>>> int amd_iommu_set_root_page_table(struct amd_iommu_dte *dte,
>>> uint64_t root_ptr, uint16_t
>>> domain_id,
>>> uint8_t paging_mode, unsigned int
>>> flags)
>>> {
>>> bool valid = flags & SET_ROOT_VALID;
>>> - if ( dte->v && dte->tv )
>>> - {
>>> - union {
>>> - struct amd_iommu_dte dte;
>>> - uint64_t raw64[4];
>>> - __uint128_t raw128[2];
>>> - } ldte = { .dte = *dte };
>>> - __uint128_t res, old = ldte.raw128[0];
>>> - int ret = 0;
>>> -
>>> - ldte.dte.domain_id = domain_id;
>>> - ldte.dte.pt_root = paddr_to_pfn(root_ptr);
>>> - ldte.dte.iw = true;
>>> - ldte.dte.ir = true;
>>> - ldte.dte.paging_mode = paging_mode;
>>> - ldte.dte.v = valid;
>>> -
>>> - res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
>>> -
>>> - /*
>>> - * Hardware does not update the DTE behind our backs, so the
>>> - * return value should match "old".
>>> - */
>>> - if ( res != old )
>>> - {
>>> - printk(XENLOG_ERR
>>> - "Dom%d: unexpected DTE %016lx_%016lx (expected
>>> %016lx_%016lx)\n",
>>> - domain_id,
>>> - (uint64_t)(res >> 64), (uint64_t)res,
>>> - (uint64_t)(old >> 64), (uint64_t)old);
>>> - ret = -EILSEQ;
>>> - }
>>> + union {
>>> + struct amd_iommu_dte dte;
>>> + __uint128_t raw128[2];
>>> + } ldte = { .dte = *dte };
>>> + __uint128_t res, old = ldte.raw128[0];
>>> - return ret;
>>> - }
>>> + ldte.dte.domain_id = domain_id;
>>> + ldte.dte.pt_root = paddr_to_pfn(root_ptr);
>>> + ldte.dte.iw = true;
>>> + ldte.dte.ir = true;
>>> + ldte.dte.paging_mode = paging_mode;
>>> + ldte.dte.tv = true;
>>> + ldte.dte.v = valid;
>>> +
>>> + res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
>>> - if ( valid || dte->v )
>>> + /*
>>> + * Hardware does not update the DTE behind our backs, so the
>>> + * return value should match "old".
>>> + */
>>> + if ( res != old )
>>> {
>>> - dte->tv = false;
>>> - dte->v = true;
>>> - smp_wmb();
>>> + printk(XENLOG_ERR
>>> + "Dom%d: unexpected DTE %016lx_%016lx (expected
>>> %016lx_%016lx)\n",
>>> + domain_id,
>>> + (uint64_t)(res >> 64), (uint64_t)res,
>>> + (uint64_t)(old >> 64), (uint64_t)old);
>>
>> Indentation is now off by 1 here.
>>
>>> + return -EILSEQ;
>>
>> The downside of this is that all updates can now take this path. Yes,
>> this shouldn't
>> be possible to be taken, but it's a (minor) concern nevertheless. At
>> the very least
>> such a downside wants, imo, mentioning in the description, even if for
>> nothing else
>> than to make clear that it was a deliberate choice.
>>
>
> Apparently it doesn't build (debian-bookworm-gcc-arm32-randconfig
> catched it).
> ARM does provide MAX_VIRT_CPUS and GUEST_MAX_VCPUS which is 128 or
> lower, but that doesn't map (or not properly) with what we have in x86
> (MAX_VIRT_CPUS=8192 is PV-specific, and GUEST_MAX_VCPUS doesn't exist).
>
> I am not sure what to do, looks like many things are redundant here.
>
>> Jan
>>
>
Oops, sent the wrong message. Please ignore.
--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |