
Re: [PATCH v2 13/17] xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 23 Jul 2025 11:46:20 +0200
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 23 Jul 2025 09:46:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.07.2025 18:07, Oleksii Kurochko wrote:
> 
> On 7/22/25 4:35 PM, Jan Beulich wrote:
>> On 22.07.2025 16:25, Oleksii Kurochko wrote:
>>> On 7/22/25 2:00 PM, Jan Beulich wrote:
>>>> On 22.07.2025 13:34, Oleksii Kurochko wrote:
>>>>> On 7/22/25 12:41 PM, Oleksii Kurochko wrote:
>>>>>> On 7/21/25 2:18 PM, Jan Beulich wrote:
>>>>>>> On 18.07.2025 11:52, Oleksii Kurochko wrote:
>>>>>>>> On 7/17/25 12:25 PM, Jan Beulich wrote:
>>>>>>>>> On 17.07.2025 10:56, Oleksii Kurochko wrote:
>>>>>>>>>> On 7/16/25 6:18 PM, Jan Beulich wrote:
>>>>>>>>>>> On 16.07.2025 18:07, Oleksii Kurochko wrote:
>>>>>>>>>>>> On 7/16/25 1:31 PM, Jan Beulich wrote:
>>>>>>>>>>>>> On 15.07.2025 16:47, Oleksii Kurochko wrote:
>>>>>>>>>>>>>> On 7/1/25 5:08 PM, Jan Beulich wrote:
>>>>>>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>>>>>>>>>>> --- a/xen/arch/riscv/p2m.c
>>>>>>>>>>>>>>>> +++ b/xen/arch/riscv/p2m.c
>>>>>>>>>>>>>>>> @@ -345,6 +345,26 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>>>>>>>>>>>>>>>             return __map_domain_page(p2m->root + root_table_indx);
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +static int p2m_type_radix_set(struct p2m_domain *p2m, pte_t pte, p2m_type_t t)
>>>>>>>>>>>>>>> See comments on the earlier patch regarding naming.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +    int rc;
>>>>>>>>>>>>>>>> +    gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));
>>>>>>>>>>>>>>> How does this work, when you record GFNs only for Xenheap pages?
>>>>>>>>>>>>>> I don't think I understand what the issue is. Could you please provide
>>>>>>>>>>>>>> some extra details?
>>>>>>>>>>>>> Counter question: The mfn_to_gfn() you currently have is only a 
>>>>>>>>>>>>> stub. It only
>>>>>>>>>>>>> works for 1:1 mapped domains. Can you show me the eventual final 
>>>>>>>>>>>>> implementation
>>>>>>>>>>>>> of the function, making it possible to use it here?
>>>>>>>>>>>> At the moment, I plan to support only 1:1 mapped domains, so this is the
>>>>>>>>>>>> final implementation.
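>>>>>>>>>>>>
>>>>>>>>>>>> For reference, a minimal sketch of what the current 1:1 stub amounts to
>>>>>>>>>>>> (the exact form in the tree may differ, and is_domain_direct_mapped() is
>>>>>>>>>>>> used here purely as an illustration of the intended restriction):
>>>>>>>>>>>>
>>>>>>>>>>>>     /* Valid only as long as domains are direct (1:1) mapped. */
>>>>>>>>>>>>     static inline gfn_t mfn_to_gfn(const struct domain *d, mfn_t mfn)
>>>>>>>>>>>>     {
>>>>>>>>>>>>         ASSERT(is_domain_direct_mapped(d));
>>>>>>>>>>>>
>>>>>>>>>>>>         return _gfn(mfn_x(mfn));
>>>>>>>>>>>>     }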
>>>>>>>>>>> Isn't that an overly severe limitation?
>>>>>>>>>> I wouldn't say that it's a severe limitation, as it's just a matter of how
>>>>>>>>>> mfn_to_gfn() is implemented. When non-1:1 mapped domains are supported,
>>>>>>>>>> mfn_to_gfn() can be implemented differently, while the code where it's
>>>>>>>>>> called will likely remain unchanged.
>>>>>>>>>>
>>>>>>>>>> What I meant in my reply is that, for the current state and current
>>>>>>>>>> limitations, this is the final implementation of mfn_to_gfn(). But that
>>>>>>>>>> doesn't mean I don't see the value in, or the need for, non-1:1 mapped
>>>>>>>>>> domains; it's just that this limitation simplifies development at the
>>>>>>>>>> current stage of the RISC-V port.
>>>>>>>>> Simplification is fine in some cases, but not supporting the "normal" 
>>>>>>>>> way of
>>>>>>>>> domain construction looks like a pretty odd restriction. I'm also 
>>>>>>>>> curious
>>>>>>>>> how you envision to implement mfn_to_gfn() then, suitable for generic 
>>>>>>>>> use like
>>>>>>>>> the one here. Imo, current limitation or not, you simply want to 
>>>>>>>>> avoid use of
>>>>>>>>> that function outside of the special gnttab case.
>>>>>>>>>
>>>>>>>>>>>>>>> In this context (not sure if I asked before): With this use of 
>>>>>>>>>>>>>>> a radix tree,
>>>>>>>>>>>>>>> how do you intend to bound the amount of memory that a domain 
>>>>>>>>>>>>>>> can use, by
>>>>>>>>>>>>>>> making Xen insert very many entries?
>>>>>>>>>>>>>> I didn't think about that. I assumed it would be enough to set the
>>>>>>>>>>>>>> amount of memory a guest domain can use by specifying
>>>>>>>>>>>>>> xen,domain-p2m-mem-mb in the DTS, or using some predefined value if
>>>>>>>>>>>>>> xen,domain-p2m-mem-mb isn't explicitly set.
>>>>>>>>>>>>> Which would require these allocations to come from that pool.
>>>>>>>>>>>> Yes, and it is true only for non-hardware domains with the current 
>>>>>>>>>>>> implementation.
>>>>>>>>>>> ???
>>>>>>>>>> I meant that the pool is currently used only for non-hardware domains.
>>>>>>>>> And how does this matter here? The memory required for the radix tree 
>>>>>>>>> doesn't
>>>>>>>>> come from that pool anyway.
>>>>>>>> I thought it was possible to do that somehow, but looking at the code of
>>>>>>>> radix-tree.c it seems the only way to allocate memory for the radix tree
>>>>>>>> is radix_tree_node_alloc() -> xzalloc(struct rcu_node).
>>>>>>>>
>>>>>>>> Then it would be necessary to introduce radix_tree_node_allocate(domain),
>>>>>>> That would be a possibility, but you may have seen that less than half a
>>>>>>> year ago we got rid of something along these lines. So it would require
>>>>>>> some pretty good justification to re-introduce.
>>>>>>>
>>>>>>>> or the radix tree can't be used at all, for the security reason mentioned
>>>>>>>> in the previous replies, no?
>>>>>>> (Very) careful use may still be possible. But the downside of using this
>>>>>>> (potentially long lookup times) would always remain.
>>>>>> Could you please clarify what you mean here by "(Very) careful"?
>>>>>> I thought about introducing a per-domain budget of radix tree entries, and
>>>>>> stopping the domain once that budget is exhausted. It is also unclear what
>>>>>> the value of such a budget should be.
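>>>>>>
>>>>>> Roughly what I have in mind, as a sketch (the p2m_types field and the
>>>>>> type_entries_left counter are made-up names):
>>>>>>
>>>>>>     static int p2m_set_type_entry(struct p2m_domain *p2m, gfn_t gfn,
>>>>>>                                   p2m_type_t t)
>>>>>>     {
>>>>>>         int rc;
>>>>>>
>>>>>>         /* Budget exhausted: only the offending domain is affected. */
>>>>>>         if ( !p2m->type_entries_left )
>>>>>>         {
>>>>>>             domain_crash(p2m->domain);
>>>>>>             return -ENOMEM;
>>>>>>         }
>>>>>>
>>>>>>         rc = radix_tree_insert(&p2m->p2m_types, gfn_x(gfn),
>>>>>>                                radix_tree_int_to_ptr(t));
>>>>>>         if ( !rc )
>>>>>>             p2m->type_entries_left--;
>>>>>>
>>>>>>         return rc;
>>>>>>     }
>>>>>>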
>>>>>> Probably you have a better idea.
>>>>>>
>>>>>> But generally your idea below ...
>>>>>>>>>>>>>> Also, it seems this would just lead to the issue you mentioned earlier:
>>>>>>>>>>>>>> when the memory runs out, domain_crash() will be called or the PTE will
>>>>>>>>>>>>>> be zapped.
>>>>>>>>>>>>> Or one domain exhausting memory would cause another domain to 
>>>>>>>>>>>>> fail. A domain
>>>>>>>>>>>>> impacting just itself may be tolerable. But a domain affecting 
>>>>>>>>>>>>> other domains
>>>>>>>>>>>>> isn't.
>>>>>>>>>>>> But it seems like this issue could happen in any implementation. It can
>>>>>>>>>>>> only be avoided if every domain type (hardware, control, guest domain)
>>>>>>>>>>>> has nothing but a pre-populated pool, with no ability to extend it or to
>>>>>>>>>>>> allocate extra pages from the domheap at runtime. Otherwise, if extra
>>>>>>>>>>>> page allocations are allowed, we can't really do anything about this
>>>>>>>>>>>> issue.
>>>>>>>>>>> But that's why I brought this up: You simply have to. Or, as 
>>>>>>>>>>> indicated, the
>>>>>>>>>>> moment you mark Xen security-supported on RISC-V, there will be an 
>>>>>>>>>>> XSA needed.
>>>>>>>>>> Why isn't it an XSA for other architectures? At least Arm should then have
>>>>>>>>>> such an XSA.
>>>>>>>>> Does Arm use a radix tree for storing types? It uses one for 
>>>>>>>>> mem-access, but
>>>>>>>>> it's not clear to me whether that's actually a supported feature.
>>>>>>>>>
>>>>>>>>>> I don't understand why x86 wouldn't have the same issue. Memory is a
>>>>>>>>>> limited and shared resource, so if one of the domains uses too much memory,
>>>>>>>>>> it could happen that other domains won't have enough memory for their
>>>>>>>>>> purposes...
>>>>>>>>> The question is whether allocations are bounded. With this use of a 
>>>>>>>>> radix tree,
>>>>>>>>> you give domains a way to have Xen allocate pretty much arbitrary 
>>>>>>>>> amounts of
>>>>>>>>> memory to populate that tree. That unbounded-ness is the problem, not 
>>>>>>>>> memory
>>>>>>>>> allocations in general.
>>>>>>>> Isn't the radix tree key bounded by the number of GFNs given to a domain?
>>>>>>>> We can't have more keys than the maximum GFN number for a domain. So the
>>>>>>>> amount of memory potentially needed for the radix tree is also bounded by
>>>>>>>> the number of GFNs.
>>>>>>> To some degree yes, hence why I said "pretty much arbitrary amounts".
>>>>>>> But recall that "amount of GFNs" is a fuzzy term; I think you mean to
>>>>>>> use it to describe the amount of memory pages given to the guest. GFNs
>>>>>>> can be used for other purposes, though. Guests could e.g. grant
>>>>>>> themselves access to their own memory, then map those grants at
>>>>>>> otherwise unused GFNs.
>>>>>>>
>>>>>>>> Anyway, IIUC I just can't use a radix tree for p2m types at all, right?
>>>>>>>> If so, does it make sense to borrow 2 bits from struct page_info->type_info,
>>>>>>>> as only 9 bits of it are currently used for a frame's count?
>>>>>>> struct page_info describes MFNs, when you want to describe GFNs. As you
>>>>>>> mentioned earlier, multiple GFNs can in principle map to the same MFN.
>>>>>>> You would force them to all have the same properties, which would be in
>>>>>>> direct conflict with e.g. the grant P2M types.
>>>>>>>
>>>>>>> Just to mention one possible alternative to using radix trees: You could
>>>>>>> maintain a 2nd set of intermediate "page tables", just that leaf entries
>>>>>>> would hold meta data for the respective GFN. The memory for those "page
>>>>>>> tables" could come from the normal P2M pool (and allocation would thus
>>>>>>> only consume domain-specific resources). Of course in any model like
>>>>>>> this the question of lookup times (as mentioned above) would remain.
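>>>>>>>
>>>>>>> Not to be taken literally, but the lookup side of such a scheme might look
>>>>>>> about like this (all names here are invented; superpage handling is left
>>>>>>> out):
>>>>>>>
>>>>>>>     /*
>>>>>>>      * Walk the shadow "metadata tables", which mirror the P2M layout but
>>>>>>>      * whose leaf entries carry per-GFN metadata instead of translations.
>>>>>>>      * The backing pages would come from the per-domain P2M pool.
>>>>>>>      */
>>>>>>>     static p2m_type_t p2m_get_type(struct p2m_domain *p2m, gfn_t gfn)
>>>>>>>     {
>>>>>>>         unsigned int level;
>>>>>>>         pte_t *table = p2m_map_metadata_root(p2m);
>>>>>>>         pte_t entry;
>>>>>>>
>>>>>>>         for ( level = P2M_ROOT_LEVEL; level > 0; level-- )
>>>>>>>         {
>>>>>>>             entry = table[p2m_index(gfn, level)];
>>>>>>>             unmap_domain_page(table);
>>>>>>>
>>>>>>>             if ( !pte_is_valid(entry) )
>>>>>>>                 return p2m_invalid;
>>>>>>>
>>>>>>>             table = map_domain_page(mfn_from_pte(entry));
>>>>>>>         }
>>>>>>>
>>>>>>>         /* Leaf "PTE", re-purposed to hold metadata rather than a mapping. */
>>>>>>>         entry = table[p2m_index(gfn, 0)];
>>>>>>>         unmap_domain_page(table);
>>>>>>>
>>>>>>>         return metadata_to_type(entry);
>>>>>>>     }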
>>>>>> ...looks like an optimal option.
>>>>>>
>>>>>> The only thing I worry about is that it will require some code duplication
>>>>>> (I will think about how to re-use the current code), as, for example, when
>>>>>> setting/getting metadata, TLB flushing isn't needed at all since we aren't
>>>>>> working with real P2M page tables.
>>>>>> I agree that lookups won't be the fastest, but nothing can be done about
>>>>>> that with such a model.
>>>>> Probably, instead of having a second set of intermediate "page tables",
>>>>> we could just allocate two consecutive pages within the real P2M page
>>>>> tables for the intermediate page table. The first page would serve as
>>>>> the actual page table to which the intermediate page table points,
>>>>> and the second page would store metadata for each entry of the page
>>>>> table that the intermediate page table references.
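>>>>>
>>>>> Roughly (the helper name is invented, and the real allocation would of course
>>>>> have to come from the per-domain P2M pool rather than straight from the
>>>>> domheap):
>>>>>
>>>>>     /*
>>>>>      * Allocate the page-table page and its metadata page as one order-1
>>>>>      * (i.e. two consecutive pages) allocation: pg is the page table,
>>>>>      * pg + 1 holds the metadata for that table's entries.
>>>>>      */
>>>>>     static struct page_info *p2m_alloc_table_pair(struct p2m_domain *p2m)
>>>>>     {
>>>>>         struct page_info *pg =
>>>>>             alloc_domheap_pages(p2m->domain, 1, MEMF_no_owner);
>>>>>
>>>>>         if ( !pg )
>>>>>             return NULL;
>>>>>
>>>>>         clear_domain_page(page_to_mfn(pg));      /* the page table itself */
>>>>>         clear_domain_page(page_to_mfn(pg + 1));  /* its per-entry metadata */
>>>>>
>>>>>         return pg;
>>>>>     }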
>>>>>
>>>>> As we support only 1GB, 2MB and 4KB mappings, we could do a little
>>>>> optimization and allocate these consecutive pages only for the PT levels
>>>>> which correspond to 1GB, 2MB and 4KB mappings.
>>>>>
>>>>> Does it make sense?
>>>> I was indeed entertaining this idea, but I couldn't conclude for myself if
>>>> that would indeed be without any rough edges. Hence I didn't want to
>>>> suggest such. For example, the need to have adjacent pairs of pages could
>>>> result in a higher rate of allocation failures (while populating or
>>>> re-sizing the P2M pool). This would be possible to avoid by still using
>>>> entirely separate pages, and then merely linking them together via some
>>>> unused struct page_info fields (the "normal" linking fields can't be used,
>>>> afaict).
>>> I think that all the fields are used, so it would be necessary to introduce a
>>> new "struct page_list_entry metadata_list;" field.
>> All the fields are used _somewhere_, sure. But once you have allocated a
>> page (and that page isn't assigned to a domain), you control what the
>> fields are used for.
> 
> I thought that the whole idea is to use the domain's pages from the P2M pool
> freelist, pages which are allocated by alloc_domheap_page(d, MEMF_no_owner),
> so an allocated page is assigned to a domain.

You did check what effect MEMF_no_owner has, didn't you? Such pages are _not_
assigned to the domain.
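
To illustrate, a minimal sketch of pool population (the freelist field name is
only an assumption of what the RISC-V pool may end up using, along the lines of
Arm's d->arch.paging.p2m_freelist):

    /* Grow the per-domain P2M pool by one page. */
    static int p2m_pool_grow(struct domain *d)
    {
        /*
         * With MEMF_no_owner, 'd' merely serves as an allocation hint (e.g.
         * for NUMA placement); the page is not assigned to the domain, its
         * owner stays NULL, and hence its struct page_info fields remain
         * under our control.
         */
        struct page_info *pg = alloc_domheap_page(d, MEMF_no_owner);

        if ( !pg )
            return -ENOMEM;

        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);

        return 0;
    }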

> I assume that in this case I have to take some pages for an intermediate page
> table from the P2M pool freelist, and set the owner domain to NULL
> (pg->inuse.domain = NULL).
> 
> Then it isn't clear why pg->list can't be re-used to link several pages for
> intermediate page table plus metadata purposes. Is it because pg->list may not
> be empty? In that case it isn't clear whether I could use a page which has
> other pages threaded onto it.

Actually, it looks like I was mis-remembering. Pages removed from the freelist
indeed aren't put on any other list, so the linking fields are available for
use. I guess I had the x86 shadow code in mind, where the linking fields are
further used.
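
I.e. (taking quite some liberties with names, and assuming the generic
pdx-based struct page_list_entry) pairing a table page with its metadata page
could then look about like:

    /*
     * Take one page for the page table and one for its metadata from the
     * P2M pool, and link the metadata page to the table page through the
     * table page's now-unused 'list' field.
     */
    static struct page_info *p2m_alloc_table_and_meta(struct domain *d)
    {
        struct page_info *table =
            page_list_remove_head(&d->arch.paging.p2m_freelist);
        struct page_info *meta = table
            ? page_list_remove_head(&d->arch.paging.p2m_freelist) : NULL;

        if ( !meta )
        {
            if ( table )
                page_list_add(table, &d->arch.paging.p2m_freelist);
            return NULL;
        }

        /* Off the freelist, so 'list' is free to serve as a back reference. */
        table->list.next = page_to_pdx(meta);

        return table;
    }

    /* The metadata page belonging to a page-table page. */
    static struct page_info *p2m_meta_of(const struct page_info *table)
    {
        return pdx_to_page(table->list.next);
    }

Whether to re-use 'list' like this or to add a dedicated field, as you suggest
above, is then largely a matter of taste.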

Jan



 

