Re: [PATCH v2 13/17] xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration
On 22.07.2025 13:34, Oleksii Kurochko wrote:
>
> On 7/22/25 12:41 PM, Oleksii Kurochko wrote:
>>
>> On 7/21/25 2:18 PM, Jan Beulich wrote:
>>> On 18.07.2025 11:52, Oleksii Kurochko wrote:
>>>> On 7/17/25 12:25 PM, Jan Beulich wrote:
>>>>> On 17.07.2025 10:56, Oleksii Kurochko wrote:
>>>>>> On 7/16/25 6:18 PM, Jan Beulich wrote:
>>>>>>> On 16.07.2025 18:07, Oleksii Kurochko wrote:
>>>>>>>> On 7/16/25 1:31 PM, Jan Beulich wrote:
>>>>>>>>> On 15.07.2025 16:47, Oleksii Kurochko wrote:
>>>>>>>>>> On 7/1/25 5:08 PM, Jan Beulich wrote:
>>>>>>>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>>>>>>>> --- a/xen/arch/riscv/p2m.c
>>>>>>>>>>>> +++ b/xen/arch/riscv/p2m.c
>>>>>>>>>>>> @@ -345,6 +345,26 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>>>>>>>>>>>      return __map_domain_page(p2m->root + root_table_indx);
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>> +static int p2m_type_radix_set(struct p2m_domain *p2m, pte_t pte, p2m_type_t t)
>>>>>>>>>>> See comments on the earlier patch regarding naming.
>>>>>>>>>>>
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    int rc;
>>>>>>>>>>>> +    gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));
>>>>>>>>>>> How does this work, when you record GFNs only for Xenheap pages?
>>>>>>>>>> I think I don't understand what the issue is. Could you please provide
>>>>>>>>>> some extra details?
>>>>>>>>> Counter question: The mfn_to_gfn() you currently have is only a stub. It only
>>>>>>>>> works for 1:1 mapped domains. Can you show me the eventual final implementation
>>>>>>>>> of the function, making it possible to use it here?
>>>>>>>> At the moment, I plan to support only 1:1 mapped domains, so this is the final
>>>>>>>> implementation.
>>>>>>> Isn't that an overly severe limitation?
>>>>>> I wouldn't say that it's a severe limitation, as it's just a matter of how
>>>>>> mfn_to_gfn() is implemented. When non-1:1 mapped domains are supported,
>>>>>> mfn_to_gfn() can be implemented differently, while the code where it's called
>>>>>> will likely remain unchanged.
>>>>>>
>>>>>> What I meant in my reply is that, for the current state and current limitations,
>>>>>> this is the final implementation of mfn_to_gfn(). That doesn't mean I don't see
>>>>>> the value in, or the need for, non-1:1 mapped domains; it's just that this
>>>>>> limitation simplifies development at the current stage of the RISC-V port.
>>>>> Simplification is fine in some cases, but not supporting the "normal" way of
>>>>> domain construction looks like a pretty odd restriction. I'm also curious how
>>>>> you envision implementing mfn_to_gfn() then, suitable for generic use like the
>>>>> one here. Imo, current limitation or not, you simply want to avoid use of that
>>>>> function outside of the special gnttab case.
>>>>>
>>>>>>>>>>> In this context (not sure if I asked before): With this use of a radix tree,
>>>>>>>>>>> how do you intend to bound the amount of memory that a domain can use, by
>>>>>>>>>>> making Xen insert very many entries?
>>>>>>>>>> I didn't think about that. I assumed it would be enough to set the amount of
>>>>>>>>>> memory a guest domain can use by specifying xen,domain-p2m-mem-mb in the DTS,
>>>>>>>>>> or using some predefined value if xen,domain-p2m-mem-mb isn't explicitly set.
>>>>>>>>> Which would require these allocations to come from that pool.
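[For readers joining mid-series: the hunk quoted above keeps the P2M type outside the PTE, in a per-domain radix tree keyed by GFN, which is why mfn_to_gfn() is applied to the PTE's MFN in the first place. A minimal sketch of that scheme and of the 1:1-only stub being questioned follows; the p2m_type_radix field name and the stub's body are assumptions for illustration, not code from the series.]

    /* Stub under discussion: only valid while every domain is 1:1 (direct) mapped. */
    static gfn_t mfn_to_gfn(const struct domain *d, mfn_t mfn)
    {
        return _gfn(mfn_x(mfn));
    }

    static int p2m_type_radix_set(struct p2m_domain *p2m, pte_t pte, p2m_type_t t)
    {
        gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));

        /* Each insertion may allocate radix tree nodes from the xenheap. */
        return radix_tree_insert(&p2m->p2m_type_radix, gfn_x(gfn),
                                 radix_tree_int_to_ptr(t));
    }

[The xenheap allocation in that last call is what the bounding discussion below is about: the pages backing the tree are not taken from the domain's P2M pool.]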
>>>>>>>> Yes, and it is true only for non-hardware domains with the current
>>>>>>>> implementation.
>>>>>>> ???
>>>>>> I meant that the pool is currently used only for non-hardware domains.
>>>>> And how does this matter here? The memory required for the radix tree doesn't
>>>>> come from that pool anyway.
>>>> I thought it was possible to do that somehow, but looking at the code of
>>>> radix-tree.c it seems like the only way to allocate memory for the radix tree
>>>> is radix_tree_node_alloc() -> xzalloc(struct rcu_node).
>>>>
>>>> Then it would be necessary to introduce radix_tree_node_allocate(domain),
>>> That would be a possibility, but you may have seen that less than half a year
>>> ago we got rid of something along these lines. So it would require some pretty
>>> good justification to re-introduce.
>>>
>>>> or the radix tree can't be used at all, for the security reason mentioned in
>>>> the previous replies, no?
>>> (Very) careful use may still be possible. But the downside of using this
>>> (potentially long lookup times) would always remain.
>> Could you please clarify what you mean here by "(very) careful"?
>> I thought about introducing a limit on the number of possible keys in the radix
>> tree and, once it is exhausted, stopping the domain. It is also unclear what the
>> value of this limit should be. Probably you have a better idea.
>>
>> But generally your idea below ...
>>>>>>>>>> Also, it seems this would just lead to the issue you mentioned earlier: when
>>>>>>>>>> the memory runs out, domain_crash() will be called or the PTE will be zapped.
>>>>>>>>> Or one domain exhausting memory would cause another domain to fail. A domain
>>>>>>>>> impacting just itself may be tolerable. But a domain affecting other domains
>>>>>>>>> isn't.
>>>>>>>> But it seems like this issue could happen in any implementation. It can only be
>>>>>>>> avoided if we have nothing but a pre-populated pool for every domain type
>>>>>>>> (hardware, control, guest domain), without the ability to extend it or to
>>>>>>>> allocate extra pages from the domheap at runtime. Otherwise, if extra page
>>>>>>>> allocations are allowed, we can't really do anything about this issue.
>>>>>>> But that's why I brought this up: You simply have to. Or, as indicated, the
>>>>>>> moment you mark Xen security-supported on RISC-V, there will be an XSA needed.
>>>>>> Why isn't it an XSA for other architectures? At least Arm should then have such
>>>>>> an XSA.
>>>>> Does Arm use a radix tree for storing types? It uses one for mem-access, but
>>>>> it's not clear to me whether that's actually a supported feature.
>>>>>
>>>>>> I don't understand why x86 won't have the same issue. Memory is a limited and
>>>>>> shared resource, so if one of the domains uses too much memory, it could happen
>>>>>> that other domains won't have enough memory for their purposes...
>>>>> The question is whether allocations are bounded. With this use of a radix tree,
>>>>> you give domains a way to have Xen allocate pretty much arbitrary amounts of
>>>>> memory to populate that tree. That unbounded-ness is the problem, not memory
>>>>> allocations in general.
>>>> Isn't the number of radix tree keys bounded by the amount of GFNs given to a
>>>> domain? We can't have more keys than the maximum GFN number of a domain. So the
>>>> potential amount of memory needed for the radix tree is also bounded by the
>>>> amount of GFNs.
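[The key cap floated above ("a limit on the number of possible keys") could look roughly like the following. This is only an illustration of the idea, not anything from the series: the type_entries/max_type_entries fields are invented, and it leaves open both what the limit should be and what to do once it is reached.]

    static int p2m_type_set_bounded(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t t)
    {
        int rc;

        /* Refuse further xenheap consumption once the per-domain budget is used up. */
        if ( p2m->type_entries >= p2m->max_type_entries )
            return -ENOMEM;

        rc = radix_tree_insert(&p2m->p2m_type_radix, gfn_x(gfn),
                               radix_tree_int_to_ptr(t));
        if ( rc == 0 )
            p2m->type_entries++;

        return rc;
    }

[Even with such a cap the tree nodes still come from the xenheap rather than from the per-domain P2M pool, which is part of why the alternative discussed below ends up looking more attractive.]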
>>> To some degree yes, hence why I said "pretty much arbitrary amounts".
>>> But recall that "amount of GFNs" is a fuzzy term; I think you mean to
>>> use it to describe the amount of memory pages given to the guest. GFNs
>>> can be used for other purposes, though. Guests could e.g. grant
>>> themselves access to their own memory, then map those grants at
>>> otherwise unused GFNs.
>>>
>>>> Anyway, IIUC I just can't use a radix tree for p2m types at all, right?
>>>> If so, does it make sense to borrow 2 bits from struct page_info->type_info,
>>>> as only 9 bits are currently used for the count of a frame?
>>> struct page_info describes MFNs, when you want to describe GFNs. As you
>>> mentioned earlier, multiple GFNs can in principle map to the same MFN.
>>> You would force them to all have the same properties, which would be in
>>> direct conflict with e.g. the grant P2M types.
>>>
>>> Just to mention one possible alternative to using radix trees: You could
>>> maintain a 2nd set of intermediate "page tables", just that leaf entries
>>> would hold meta data for the respective GFN. The memory for those "page
>>> tables" could come from the normal P2M pool (and allocation would thus
>>> only consume domain-specific resources). Of course in any model like
>>> this the question of lookup times (as mentioned above) would remain.
>> ...looks like an optimal option.
>>
>> The only thing I worry about is that it will require some code duplication
>> (I will think about how to re-use the current code); for example, when
>> setting/getting metadata, TLB flushing isn't needed at all, as we aren't
>> working with the real P2M page tables.
>> I agree that lookup won't be the best, but nothing can be done about that
>> with such models.
>
> Probably, instead of having a second set of intermediate "page tables",
> we could just allocate two consecutive pages within the real P2M page
> tables for the intermediate page table. The first page would serve as
> the actual page table to which the intermediate page table points,
> and the second page would store metadata for each entry of the page
> table that the intermediate page table references.
>
> As we are supporting only 1GB, 2MB and 4KB mappings, we could do a little
> optimization and allocate these consecutive pages only for the PT levels
> which correspond to 1GB, 2MB and 4KB mappings.
>
> Does it make sense?

I was indeed entertaining this idea, but I couldn't conclude for myself if
that would indeed be without any rough edges. Hence I didn't want to suggest
such. For example, the need to have adjacent pairs of pages could result in
a higher rate of allocation failures (while populating or re-sizing the P2M
pool). This would be possible to avoid by still using entirely separate
pages, and then merely linking them together via some unused struct
page_info fields (the "normal" linking fields can't be used, afaict).

Jan
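[The closing suggestion can be pictured roughly as follows: each page-table page taken from the P2M pool gets a companion metadata page holding one p2m_type_t per PTE slot, and the two pages are tied together through a struct page_info field instead of having to be physically adjacent. This is a sketch of the idea only; the p2m->pages list name, the v.md_page link and the helper names are assumptions, not code from the series.]

    static struct page_info *p2m_alloc_table_page(struct p2m_domain *p2m)
    {
        /* Both pages come from the pre-populated, per-domain P2M pool. */
        struct page_info *pt = page_list_remove_head(&p2m->pages);
        struct page_info *md = page_list_remove_head(&p2m->pages);

        if ( !pt || !md )
        {
            /* Put back whatever we did get; the caller sees a plain failure. */
            if ( pt )
                page_list_add(pt, &p2m->pages);
            if ( md )
                page_list_add(md, &p2m->pages);
            return NULL;
        }

        clear_domain_page(page_to_mfn(pt));
        clear_domain_page(page_to_mfn(md));

        /* Hypothetical link; the "normal" linking fields are already in use. */
        pt->v.md_page = md;

        return pt;
    }

    /* Type lookup for the PTE at 'index' within the table page 'pt'. */
    static p2m_type_t p2m_get_type(const struct page_info *pt, unsigned int index)
    {
        const p2m_type_t *md = __map_domain_page(pt->v.md_page);
        p2m_type_t t = md[index];

        unmap_domain_page(md);

        return t;
    }

[Because both pages are charged to the domain's own pool, a guest can only exhaust its own pre-allocated budget, which is the bounded-ness property the radix tree lacks; the price, as noted in the exchange above, is an extra map/unmap on every type lookup and some duplication of the page-table walking code.]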