
Re: [PATCH v2 13/17] xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 23 Jul 2025 11:46:20 +0200
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 23 Jul 2025 09:46:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.07.2025 18:07, Oleksii Kurochko wrote:
> 
> On 7/22/25 4:35 PM, Jan Beulich wrote:
>> On 22.07.2025 16:25, Oleksii Kurochko wrote:
>>> On 7/22/25 2:00 PM, Jan Beulich wrote:
>>>> On 22.07.2025 13:34, Oleksii Kurochko wrote:
>>>>> On 7/22/25 12:41 PM, Oleksii Kurochko wrote:
>>>>>> On 7/21/25 2:18 PM, Jan Beulich wrote:
>>>>>>> On 18.07.2025 11:52, Oleksii Kurochko wrote:
>>>>>>>> On 7/17/25 12:25 PM, Jan Beulich wrote:
>>>>>>>>> On 17.07.2025 10:56, Oleksii Kurochko wrote:
>>>>>>>>>> On 7/16/25 6:18 PM, Jan Beulich wrote:
>>>>>>>>>>> On 16.07.2025 18:07, Oleksii Kurochko wrote:
>>>>>>>>>>>> On 7/16/25 1:31 PM, Jan Beulich wrote:
>>>>>>>>>>>>> On 15.07.2025 16:47, Oleksii Kurochko wrote:
>>>>>>>>>>>>>> On 7/1/25 5:08 PM, Jan Beulich wrote:
>>>>>>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>>>>>>>>>>> --- a/xen/arch/riscv/p2m.c
>>>>>>>>>>>>>>>> +++ b/xen/arch/riscv/p2m.c
>>>>>>>>>>>>>>>> @@ -345,6 +345,26 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>>>>>>>>>>>>>>>             return __map_domain_page(p2m->root + root_table_indx);
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +static int p2m_type_radix_set(struct p2m_domain *p2m, pte_t pte, p2m_type_t t)
>>>>>>>>>>>>>>> See comments on the earlier patch regarding naming.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +    int rc;
>>>>>>>>>>>>>>>> +    gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));
>>>>>>>>>>>>>>> How does this work, when you record GFNs only for Xenheap pages?
>>>>>>>>>>>>>> I don't think I understand what the issue is. Could you please provide
>>>>>>>>>>>>>> some extra details?
>>>>>>>>>>>>> Counter question: The mfn_to_gfn() you currently have is only a 
>>>>>>>>>>>>> stub. It only
>>>>>>>>>>>>> works for 1:1 mapped domains. Can you show me the eventual final 
>>>>>>>>>>>>> implementation
>>>>>>>>>>>>> of the function, making it possible to use it here?
>>>>>>>>>>>> At the moment, I plan to support only 1:1 mapped domains, so this is the
>>>>>>>>>>>> final implementation.
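>>>>>>>>>>>>
>>>>>>>>>>>> For reference, a minimal sketch of what the current 1:1 stub amounts to
>>>>>>>>>>>> (the exact form in the tree may differ, and is_domain_direct_mapped() is
>>>>>>>>>>>> used here purely as an illustration of the intended restriction):
>>>>>>>>>>>>
>>>>>>>>>>>>     /* Valid only as long as domains are direct (1:1) mapped. */
>>>>>>>>>>>>     static inline gfn_t mfn_to_gfn(const struct domain *d, mfn_t mfn)
>>>>>>>>>>>>     {
>>>>>>>>>>>>         ASSERT(is_domain_direct_mapped(d));
>>>>>>>>>>>>
>>>>>>>>>>>>         return _gfn(mfn_x(mfn));
>>>>>>>>>>>>     }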
>>>>>>>>>>> Isn't that an overly severe limitation?
>>>>>>>>>> I wouldn't say that it's a severe limitation, as it's just a matter of how
>>>>>>>>>> mfn_to_gfn() is implemented. When non-1:1 mapped domains are supported,
>>>>>>>>>> mfn_to_gfn() can be implemented differently, while the code where it's
>>>>>>>>>> called will likely remain unchanged.
>>>>>>>>>>
>>>>>>>>>> What I meant in my reply is that, for the current state and current
>>>>>>>>>> limitations, this is the final implementation of mfn_to_gfn(). But that
>>>>>>>>>> doesn't mean I don't see the value in, or the need for, non-1:1 mapped
>>>>>>>>>> domains; it's just that this limitation simplifies development at the
>>>>>>>>>> current stage of the RISC-V port.
>>>>>>>>> Simplification is fine in some cases, but not supporting the "normal" 
>>>>>>>>> way of
>>>>>>>>> domain construction looks like a pretty odd restriction. I'm also 
>>>>>>>>> curious
>>>>>>>>> how you envision to implement mfn_to_gfn() then, suitable for generic 
>>>>>>>>> use like
>>>>>>>>> the one here. Imo, current limitation or not, you simply want to 
>>>>>>>>> avoid use of
>>>>>>>>> that function outside of the special gnttab case.
>>>>>>>>>
>>>>>>>>>>>>>>> In this context (not sure if I asked before): With this use of 
>>>>>>>>>>>>>>> a radix tree,
>>>>>>>>>>>>>>> how do you intend to bound the amount of memory that a domain 
>>>>>>>>>>>>>>> can use, by
>>>>>>>>>>>>>>> making Xen insert very many entries?
>>>>>>>>>>>>>> I didn't think about that. I assumed it would be enough to set the
>>>>>>>>>>>>>> amount of memory a guest domain can use by specifying
>>>>>>>>>>>>>> xen,domain-p2m-mem-mb in the DTS, or using some predefined value if
>>>>>>>>>>>>>> xen,domain-p2m-mem-mb isn't explicitly set.
>>>>>>>>>>>>> Which would require these allocations to come from that pool.
>>>>>>>>>>>> Yes, and it is true only for non-hardware domains with the current 
>>>>>>>>>>>> implementation.
>>>>>>>>>>> ???
>>>>>>>>>> I meant that the pool is currently used only for non-hardware domains.
>>>>>>>>> And how does this matter here? The memory required for the radix tree 
>>>>>>>>> doesn't
>>>>>>>>> come from that pool anyway.
>>>>>>>> I thought it was possible to do that somehow, but looking at the code of
>>>>>>>> radix-tree.c it seems the only way to allocate memory for the radix tree
>>>>>>>> is radix_tree_node_alloc() -> xzalloc(struct rcu_node).
>>>>>>>>
>>>>>>>> Then it would be necessary to introduce radix_tree_node_allocate(domain),
>>>>>>> That would be a possibility, but you may have seen that less than half a
>>>>>>> year ago we got rid of something along these lines. So it would require
>>>>>>> some pretty good justification to re-introduce.
>>>>>>>
>>>>>>>> or the radix tree can't be used at all, for the security reason mentioned
>>>>>>>> in the previous replies, no?
>>>>>>> (Very) careful use may still be possible. But the downside of using this
>>>>>>> (potentially long lookup times) would always remain.
>>>>>> Could you please clarify what you mean here by "(Very) careful"?
>>>>>> I thought about introducing a per-domain budget of radix tree entries, and
>>>>>> stopping the domain once that budget is exhausted. It is also unclear what
>>>>>> the value of such a budget should be.
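>>>>>>
>>>>>> Roughly what I have in mind, as a sketch (the p2m_types field and the
>>>>>> type_entries_left counter are made-up names):
>>>>>>
>>>>>>     static int p2m_set_type_entry(struct p2m_domain *p2m, gfn_t gfn,
>>>>>>                                   p2m_type_t t)
>>>>>>     {
>>>>>>         int rc;
>>>>>>
>>>>>>         /* Budget exhausted: only the offending domain is affected. */
>>>>>>         if ( !p2m->type_entries_left )
>>>>>>         {
>>>>>>             domain_crash(p2m->domain);
>>>>>>             return -ENOMEM;
>>>>>>         }
>>>>>>
>>>>>>         rc = radix_tree_insert(&p2m->p2m_types, gfn_x(gfn),
>>>>>>                                radix_tree_int_to_ptr(t));
>>>>>>         if ( !rc )
>>>>>>             p2m->type_entries_left--;
>>>>>>
>>>>>>         return rc;
>>>>>>     }
>>>>>>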
>>>>>> Probably you have a better idea.
>>>>>>
>>>>>> But generally your idea below ...
>>>>>>>>>>>>>> Also, it seems this would just lead to the issue you mentioned earlier:
>>>>>>>>>>>>>> when the memory runs out, domain_crash() will be called or the PTE will
>>>>>>>>>>>>>> be zapped.
>>>>>>>>>>>>> Or one domain exhausting memory would cause another domain to 
>>>>>>>>>>>>> fail. A domain
>>>>>>>>>>>>> impacting just itself may be tolerable. But a domain affecting 
>>>>>>>>>>>>> other domains
>>>>>>>>>>>>> isn't.
>>>>>>>>>>>> But it seems like this issue could happen in any implementation. It can
>>>>>>>>>>>> only be avoided if every domain type (hardware, control, guest domain)
>>>>>>>>>>>> has nothing but a pre-populated pool, with no ability to extend it or to
>>>>>>>>>>>> allocate extra pages from the domheap at runtime. Otherwise, if extra
>>>>>>>>>>>> page allocations are allowed, we can't really do anything about this
>>>>>>>>>>>> issue.
>>>>>>>>>>> But that's why I brought this up: You simply have to. Or, as 
>>>>>>>>>>> indicated, the
>>>>>>>>>>> moment you mark Xen security-supported on RISC-V, there will be an 
>>>>>>>>>>> XSA needed.
>>>>>>>>>> Why isn't it an XSA for other architectures? At least Arm should then have
>>>>>>>>>> such an XSA.
>>>>>>>>> Does Arm use a radix tree for storing types? It uses one for 
>>>>>>>>> mem-access, but
>>>>>>>>> it's not clear to me whether that's actually a supported feature.
>>>>>>>>>
>>>>>>>>>> I don't understand why x86 wouldn't have the same issue. Memory is a
>>>>>>>>>> limited and shared resource, so if one of the domains uses too much memory,
>>>>>>>>>> it could happen that other domains won't have enough memory for their
>>>>>>>>>> purposes...
>>>>>>>>> The question is whether allocations are bounded. With this use of a 
>>>>>>>>> radix tree,
>>>>>>>>> you give domains a way to have Xen allocate pretty much arbitrary 
>>>>>>>>> amounts of
>>>>>>>>> memory to populate that tree. That unbounded-ness is the problem, not 
>>>>>>>>> memory
>>>>>>>>> allocations in general.
>>>>>>>> Isn't the radix tree key bounded by the number of GFNs given to a domain?
>>>>>>>> We can't have more keys than the maximum GFN number for a domain. So the
>>>>>>>> amount of memory potentially needed for the radix tree is also bounded by
>>>>>>>> the number of GFNs.
>>>>>>> To some degree yes, hence why I said "pretty much arbitrary amounts".
>>>>>>> But recall that "amount of GFNs" is a fuzzy term; I think you mean to
>>>>>>> use it to describe the amount of memory pages given to the guest. GFNs
>>>>>>> can be used for other purposes, though. Guests could e.g. grant
>>>>>>> themselves access to their own memory, then map those grants at
>>>>>>> otherwise unused GFNs.
>>>>>>>
>>>>>>>> Anyway, IIUC I just can't use a radix tree for p2m types at all, right?
>>>>>>>> If so, does it make sense to borrow 2 bits from struct page_info->type_info,
>>>>>>>> as only 9 bits of it are currently used for a frame's count?
>>>>>>> struct page_info describes MFNs, when you want to describe GFNs. As you
>>>>>>> mentioned earlier, multiple GFNs can in principle map to the same MFN.
>>>>>>> You would force them to all have the same properties, which would be in
>>>>>>> direct conflict with e.g. the grant P2M types.
>>>>>>>
>>>>>>> Just to mention one possible alternative to using radix trees: You could
>>>>>>> maintain a 2nd set of intermediate "page tables", just that leaf entries
>>>>>>> would hold meta data for the respective GFN. The memory for those "page
>>>>>>> tables" could come from the normal P2M pool (and allocation would thus
>>>>>>> only consume domain-specific resources). Of course in any model like
>>>>>>> this the question of lookup times (as mentioned above) would remain.
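>>>>>>>
>>>>>>> Not to be taken literally, but the lookup side of such a scheme might look
>>>>>>> about like this (all names here are invented; superpage handling is left
>>>>>>> out):
>>>>>>>
>>>>>>>     /*
>>>>>>>      * Walk the shadow "metadata tables", which mirror the P2M layout but
>>>>>>>      * whose leaf entries carry per-GFN metadata instead of translations.
>>>>>>>      * The backing pages would come from the per-domain P2M pool.
>>>>>>>      */
>>>>>>>     static p2m_type_t p2m_get_type(struct p2m_domain *p2m, gfn_t gfn)
>>>>>>>     {
>>>>>>>         unsigned int level;
>>>>>>>         pte_t *table = p2m_map_metadata_root(p2m);
>>>>>>>         pte_t entry;
>>>>>>>
>>>>>>>         for ( level = P2M_ROOT_LEVEL; level > 0; level-- )
>>>>>>>         {
>>>>>>>             entry = table[p2m_index(gfn, level)];
>>>>>>>             unmap_domain_page(table);
>>>>>>>
>>>>>>>             if ( !pte_is_valid(entry) )
>>>>>>>                 return p2m_invalid;
>>>>>>>
>>>>>>>             table = map_domain_page(mfn_from_pte(entry));
>>>>>>>         }
>>>>>>>
>>>>>>>         /* Leaf "PTE", re-purposed to hold metadata rather than a mapping. */
>>>>>>>         entry = table[p2m_index(gfn, 0)];
>>>>>>>         unmap_domain_page(table);
>>>>>>>
>>>>>>>         return metadata_to_type(entry);
>>>>>>>     }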
>>>>>> ...looks like an optimal option.
>>>>>>
>>>>>> The only thing I worry about is that it will require some code duplication
>>>>>> (I will think about how to re-use the current code), as, for example, when
>>>>>> setting/getting metadata, TLB flushing isn't needed at all since we aren't
>>>>>> working with real P2M page tables.
>>>>>> I agree that lookups won't be the fastest, but nothing can be done about
>>>>>> that with such a model.
>>>>> Probably, instead of having a second set of intermediate "page tables",
>>>>> we could just allocate two consecutive pages within the real P2M page
>>>>> tables for the intermediate page table. The first page would serve as
>>>>> the actual page table to which the intermediate page table points,
>>>>> and the second page would store metadata for each entry of the page
>>>>> table that the intermediate page table references.
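>>>>>
>>>>> Roughly (the helper name is invented, and the real allocation would of course
>>>>> have to come from the per-domain P2M pool rather than straight from the
>>>>> domheap):
>>>>>
>>>>>     /*
>>>>>      * Allocate the page-table page and its metadata page as one order-1
>>>>>      * (i.e. two consecutive pages) allocation: pg is the page table,
>>>>>      * pg + 1 holds the metadata for that table's entries.
>>>>>      */
>>>>>     static struct page_info *p2m_alloc_table_pair(struct p2m_domain *p2m)
>>>>>     {
>>>>>         struct page_info *pg =
>>>>>             alloc_domheap_pages(p2m->domain, 1, MEMF_no_owner);
>>>>>
>>>>>         if ( !pg )
>>>>>             return NULL;
>>>>>
>>>>>         clear_domain_page(page_to_mfn(pg));      /* the page table itself */
>>>>>         clear_domain_page(page_to_mfn(pg + 1));  /* its per-entry metadata */
>>>>>
>>>>>         return pg;
>>>>>     }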
>>>>>
>>>>> As we support only 1GB, 2MB and 4KB mappings, we could do a little
>>>>> optimization and allocate these consecutive pages only for the PT levels
>>>>> which correspond to 1GB, 2MB and 4KB mappings.
>>>>>
>>>>> Does it make sense?
>>>> I was indeed entertaining this idea, but I couldn't conclude for myself if
>>>> that would indeed be without any rough edges. Hence I didn't want to
>>>> suggest such. For example, the need to have adjacent pairs of pages could
>>>> result in a higher rate of allocation failures (while populating or
>>>> re-sizing the P2M pool). This would be possible to avoid by still using
>>>> entirely separate pages, and then merely linking them together via some
>>>> unused struct page_info fields (the "normal" linking fields can't be used,
>>>> afaict).
>>> I think that all the fields are used, so it would be necessary to introduce a
>>> new "struct page_list_entry metadata_list;" field.
>> All the fields are used _somewhere_, sure. But once you have allocated a
>> page (and that page isn't assigned to a domain), you control what the
>> fields are used for.
> 
> I thought that the whole idea is to use the domain's pages from the P2M pool
> freelist, pages which are allocated by alloc_domheap_page(d, MEMF_no_owner),
> so an allocated page is assigned to a domain.

You did check what effect MEMF_no_owner has, didn't you? Such pages are _not_
assigned to the domain.
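
To illustrate, a minimal sketch of pool population (the freelist field name is
only an assumption of what the RISC-V pool may end up using, along the lines of
Arm's d->arch.paging.p2m_freelist):

    /* Grow the per-domain P2M pool by one page. */
    static int p2m_pool_grow(struct domain *d)
    {
        /*
         * With MEMF_no_owner, 'd' merely serves as an allocation hint (e.g.
         * for NUMA placement); the page is not assigned to the domain, its
         * owner stays NULL, and hence its struct page_info fields remain
         * under our control.
         */
        struct page_info *pg = alloc_domheap_page(d, MEMF_no_owner);

        if ( !pg )
            return -ENOMEM;

        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);

        return 0;
    }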

> I assume that in this case I have to take some pages for an intermediate page
> table from the P2M pool freelist, and set the owner domain to NULL
> (pg->inuse.domain = NULL).
> 
> Then it isn't clear why pg->list can't be re-used to link several pages for
> intermediate page table plus metadata purposes. Is it because pg->list may not
> be empty? In that case it isn't clear whether I could use a page which has
> other pages threaded onto it.

Actually, it looks like I was mis-remembering. Pages removed from the freelist
indeed aren't put on any other list, so the linking fields are available for
use. I guess I had the x86 shadow code in mind, where the linking fields are
further used.
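
I.e. (taking quite some liberties with names, and assuming the generic
pdx-based struct page_list_entry) pairing a table page with its metadata page
could then look about like:

    /*
     * Take one page for the page table and one for its metadata from the
     * P2M pool, and link the metadata page to the table page through the
     * table page's now-unused 'list' field.
     */
    static struct page_info *p2m_alloc_table_and_meta(struct domain *d)
    {
        struct page_info *table =
            page_list_remove_head(&d->arch.paging.p2m_freelist);
        struct page_info *meta = table
            ? page_list_remove_head(&d->arch.paging.p2m_freelist) : NULL;

        if ( !meta )
        {
            if ( table )
                page_list_add(table, &d->arch.paging.p2m_freelist);
            return NULL;
        }

        /* Off the freelist, so 'list' is free to serve as a back reference. */
        table->list.next = page_to_pdx(meta);

        return table;
    }

    /* The metadata page belonging to a page-table page. */
    static struct page_info *p2m_meta_of(const struct page_info *table)
    {
        return pdx_to_page(table->list.next);
    }

Whether to re-use 'list' like this or to add a dedicated field, as you suggest
above, is then largely a matter of taste.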

Jan



 

