
Re: [for 4.22 v5 10/18] xen/riscv: implement p2m_set_range()


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 17 Nov 2025 09:56:27 +0100
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 17 Nov 2025 08:56:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.11.2025 18:04, Oleksii Kurochko wrote:
> On 11/10/25 3:53 PM, Jan Beulich wrote:
>> On 20.10.2025 17:57, Oleksii Kurochko wrote:
>>> +#define GFN_MASK(lvl) (P2M_PAGETABLE_ENTRIES(lvl) - 1UL)
>> If I'm not mistaken, this is a mask with the low 10 or 12 bits set.
> 
> I'm not sure I fully understand you here. With the current implementation,
> it returns a bitmask that corresponds to the number of index bits used
> at each level. So, if P2M_ROOT_LEVEL = 2, then:
>    GFN_MASK(0) = 0x1ff (9-bit GFN width for level 0)
>    GFN_MASK(1) = 0x1ff (9-bit GFN width for level 1)
>    GFN_MASK(2) = 0x7ff (11-bit GFN width for level 2)

Oh, sorry, 9 and 11 bits is what I meant.

> Or do you mean that GFN_MASK(lvl) should return something like this:
>    GFN_MASK_(0) = 0x1FF000 (0x1ff << 0xc)
>    GFN_MASK_(1) = 0x3FE00000 (GFN_MASK_(0) << 9)
>    GFN_MASK_(2) = 0x1FFC0000000 (GFN_MASK_(1) << 9 + extra 2 bits)

Yes.

> And then here ...
> 
>> That's not really something you can apply to a GFN, unlike the name
>> suggests.
> 
> That is why virtual address should be properly shifted before, something
> like it is done in calc_offset():

Please can we stop calling guest physical addresses "virtual address"?

>    (va >> P2M_LEVEL_SHIFT(lvl)) & GFN_MASK(lvl);
> 
> ...
>   (va & GFN_MASK_(lvl)) >> P2M_LEVEL_SHIFT(lvl) ?
> In this option more shifts will be needed.

It's okay to try to limit the number of shifts needed, but the macros need
naming accordingly.
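
To illustrate the two forms being compared, here is a minimal standalone
sketch. It assumes Sv39x4 (root level 2, with 2 extra index bits at the
root); all names in it (pt_index_mask() and friends) are hypothetical and
not taken from the patch. Both variants yield the same table index; only
the second mask is one that can be applied to an unshifted address.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PT_INDEX_BITS 9
#define ROOT_LEVEL    2   /* assumption: Sv39x4, levels 0..2 */

/* Number of index bits at a level; the root table is widened by 2 bits. */
static unsigned int level_bits(unsigned int lvl)
{
    assert(lvl <= ROOT_LEVEL);
    return PT_INDEX_BITS + (lvl == ROOT_LEVEL ? 2 : 0);
}

/* Position of the level's index field within a guest-physical address. */
static unsigned int level_shift(unsigned int lvl)
{
    return PAGE_SHIFT + PT_INDEX_BITS * lvl;
}

/* Mask for an already-shifted value: 0x1ff, 0x1ff, 0x7ff. */
static uint64_t pt_index_mask(unsigned int lvl)
{
    return (UINT64_C(1) << level_bits(lvl)) - 1;
}

/* Mask in address position: 0x1FF000, 0x3FE00000, 0x1FFC0000000. */
static uint64_t pt_addr_mask(unsigned int lvl)
{
    return pt_index_mask(lvl) << level_shift(lvl);
}

/* Shift first, then apply the index mask. */
static unsigned int index_shift_then_mask(unsigned int lvl, uint64_t gpa)
{
    return (unsigned int)((gpa >> level_shift(lvl)) & pt_index_mask(lvl));
}

/* Apply the in-place mask first, then shift down. */
static unsigned int index_mask_then_shift(unsigned int lvl, uint64_t gpa)
{
    return (unsigned int)((gpa & pt_addr_mask(lvl)) >> level_shift(lvl));
}
```

Whichever form is used, the name should say which kind of mask it is: an
index mask is only meaningful after the shift.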

> Would it be better to just rename GFN_MASK() to P2M_PT_INDEX_MASK()? Or,
> maybe, even just P2M_INDEX_MASK().

Perhaps. I would recommend though that you take a look at other ports'
naming. In x86, for example, we have l<N>_table_offset().

>>> --- a/xen/arch/riscv/p2m.c
>>> +++ b/xen/arch/riscv/p2m.c
>>> @@ -9,6 +9,7 @@
>>>   #include <xen/rwlock.h>
>>>   #include <xen/sched.h>
>>>   #include <xen/sections.h>
>>> +#include <xen/xvmalloc.h>
>>>   
>>>   #include <asm/csr.h>
>>>   #include <asm/flushtlb.h>
>>> @@ -17,6 +18,43 @@
>>>   #include <asm/vmid.h>
>>>   
>>>   unsigned char __ro_after_init gstage_mode;
>>> +unsigned int __ro_after_init gstage_root_level;
>> Like for mode, I'm unconvinced of this being a global (and not per-P2M /
>> per-domain).
> 
> The question is then if we really will (or want to) have cases when gstage
> mode will be different per-domain/per-p2m?

Can you explain to me why you think we wouldn't want that, sooner or later?

>>> +/*
>>> + * The P2M root page table is extended by 2 bits, making its size 16KB
>>> + * (instead of 4KB for non-root page tables). Therefore, P2M root page
>>> + * is allocated as four consecutive 4KB pages (since alloc_domheap_pages()
>>> + * only allocates 4KB pages).
>>> + */
>>> +#define ENTRIES_PER_ROOT_PAGE \
>>> +    (P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL) / P2M_ROOT_ORDER)
>>> +
>>> +static inline unsigned int calc_offset(unsigned int lvl, vaddr_t va)
>> Where would a vaddr_t come from here? Your input are guest-physical addresses,
>> if I'm not mistaken.
> 
> You are right. Would it be right to use 'paddr_t gpa' here? Or is paddr_t
> supposed to be used only with machine physical addresses?

In x86 we use paddr_t in such cases. Arm iirc additionally has gaddr_t.

>>> +#define P2M_MAX_ROOT_LEVEL 4
>>> +
>>> +#define P2M_DECLARE_OFFSETS(var, addr) \
>>> +    unsigned int var[P2M_MAX_ROOT_LEVEL] = {-1};\
>>> +    for ( unsigned int i = 0; i <= gstage_root_level; i++ ) \
>>> +        var[i] = calc_offset(i, addr);
>> This surely is more than just "declare", and it's dealing with all levels no
>> matter whether you actually will use all offsets.
> 
> I will rename P2M_DECLARE_OFFSETS to P2M_BUILD_LEVEL_OFFSETS().
> 
> But how can I know which offset I will actually need to use?
> If we take the following loop as an example:
>    for ( level = P2M_ROOT_LEVEL; level > target; level-- )
>    {
>        /*
>         * Don't try to allocate intermediate page tables if the mapping
>         * is about to be removed.
>         */
>        rc = p2m_next_level(p2m, !removing_mapping,
>                            level, &table, offsets[level]);
>        ...
>    }
> It walks from P2M_ROOT_LEVEL down to target, where target is determined
> at runtime.
> 
> If you mean that, for example, when the G-stage mode is Sv39, there is no
> need to allocate an array with 4 entries (or 5 entries if we consider Sv57,
> so P2M_MAX_ROOT_LEVEL should be updated), because Sv39 only uses 3 page
> table levels, then yes, in theory it could be smaller. But I don't think it
> is a real issue if the offsets[] array on the stack has a few extra unused
> entries.
> 
> If preferred, I could allocate the array dynamically based on
> gstage_root_level. Would that be better?

Having a few unused entries isn't a big deal imo. What I'm not happy with here
is that you may _initialize_ more entries than actually needed. I have no good
suggestion within the conceptual framework you use for page walking (the same
issue iirc exists in host page table walks, just that the calculations there are
cheaper).
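
To make the cost difference concrete, here is a standalone sketch. It is
illustrative only and not something proposed in this thread. It contrasts
filling in every level's index up front with computing an index only for
the levels a walk actually visits. All names are hypothetical, paddr_t is
a local stand-in typedef, and calc_count exists purely to count the
computations.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t paddr_t;       /* stand-in for the hypervisor's type */

#define PAGE_SHIFT     12
#define PT_INDEX_BITS  9
#define MAX_ROOT_LEVEL 4        /* assumption: up to Sv57x4, levels 0..4 */

static unsigned int calc_count; /* index computations, for illustration */

/* Table index of gpa at level lvl; the root level has 2 extra bits. */
static unsigned int pt_index(unsigned int root_level, unsigned int lvl,
                             paddr_t gpa)
{
    unsigned int bits = PT_INDEX_BITS + (lvl == root_level ? 2 : 0);

    calc_count++;
    return (unsigned int)((gpa >> (PAGE_SHIFT + PT_INDEX_BITS * lvl)) &
                          ((UINT64_C(1) << bits) - 1));
}

/* Eager variant: initialize the index for every level 0..root_level. */
static void build_all_offsets(unsigned int root_level, paddr_t gpa,
                              unsigned int offsets[MAX_ROOT_LEVEL + 1])
{
    for ( unsigned int lvl = 0; lvl <= root_level; lvl++ )
        offsets[lvl] = pt_index(root_level, lvl, gpa);
}

/*
 * On-demand variant: walk from the root down to target (inclusive),
 * computing an index only for levels that are visited.
 */
static void walk_on_demand(unsigned int root_level, unsigned int target,
                           paddr_t gpa,
                           unsigned int offsets[MAX_ROOT_LEVEL + 1])
{
    assert(target <= root_level && root_level <= MAX_ROOT_LEVEL);

    for ( unsigned int lvl = root_level; ; lvl-- )
    {
        offsets[lvl] = pt_index(root_level, lvl, gpa);
        if ( lvl == target )
            break;
    }
}
```

With root level 2 and target level 1, the eager variant performs three
index computations and the on-demand one two; the gap widens for deeper
modes and higher targets (superpage mappings).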

Jan



 

