| 
    
 [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings
 
 On 7/2/25 11:25 AM, Jan Beulich wrote: 
    On 10.06.2025 15:05, Oleksii Kurochko wrote: I think that I don't fully understand what is an issue. As to (no need for) BBM: I couldn't find anything to that effect in the privileged spec. Can you provide some pointer? What I found instead is e.g. this sentence: "To ensure that implicit reads observe writes to the same memory locations, an SFENCE.VMA instruction must be executed after the writes to flush the relevant cached translations." And this: "Accessing the same location using different cacheability attributes may cause loss of coherence." (This may not only occur when the same physical address is mapped twice at different VAs, but also after the shattering of a superpage when the new entry differs in cacheability.) I also couldn't find that RISC-V spec mandates BBM explicitly as Arm does it. What I meant that on RISC-V can do: - Write new PTE - Flush TLB While on Arm it is almost always needed to do: - Write zero to PTE - Flush TLB - Write new PTE For example, the common CoW code path where you copy from a read only page to a new page, then map that new page as writable just works on RISC-V without extra considerations and on Arm it requires BBM. It seems to me that BBM is mandated for Arm only because that TLB is shared among cores, so there is no any guarantee that no prior entry for same VA remains in TLB. In case of RISC-V's TLB isn't shared and after a flush it is guaranteed that no prior entry for the same VA remains in the TLB. But in the same time it could be cases, I guess, where BBM will be needed for RISC-V too. Even the case of CoW mentioned above will need some kind of BBM, but nothing that'll the CPU misbehave by doing CoW without BBM on RISC-V. Additionally, the page table walk logic has been adjusted, as ARM uses the opposite walk order compared to RISC-V.I think you used some similar wording already in an earlier patch. I find this confusing: Walk order is, aiui, the same. It's merely the numbering of levels that is the opposite way round, isn't it? Yes, the numbering of levels is different and I counted that as a different walk order. If it is too confusing, I will reword it and use numbering of levels. Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx> --- Changes in V2: - New patch. It was a part of a big patch "xen/riscv: implement p2m mapping functionality" which was splitted to smaller. - Update the commit above the cycle which creates new page table as RISC-V travserse page tables in an opposite to ARM order. - RISC-V doesn't require BBM so there is no needed for invalidating and TLB flushing before updating PTE. --- xen/arch/riscv/p2m.c | 102 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 101 insertions(+), 1 deletion(-) diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index 87dd636b80..79c4473f1f 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -743,6 +743,77 @@ static void p2m_free_entry(struct p2m_domain *p2m, p2m_free_page(p2m->domain, pg); } +static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, + unsigned int level, unsigned int target, + const unsigned int *offsets) +{ + struct page_info *page; + unsigned int i; + pte_t pte, *table; + bool rv = true; + + /* Convenience aliases */ + mfn_t mfn = pte_get_mfn(*entry); + unsigned int next_level = level - 1; + unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level); + + /* + * This should only be called with target != level and the entry is + * a superpage. + */ + ASSERT(level > target); + ASSERT(p2me_is_superpage(p2m, *entry, level)); + + page = p2m_alloc_page(p2m->domain); + if ( !page ) + return false; + + page_list_add(page, &p2m->pages);Is there a reason this list maintenance isn't done in p2m_alloc_page()? No there is no any reason, I will move that inside p2m_alloc_page(). 
 It is wording question. It should be something like: + /* + * Shatter superpage in the page to the level we want to make the + * changes. + * This is done outside the loop to avoid checking the offset for every + * entry (of new page table) to know whether the entry should be shattered. + */ 
 + /* TODO: why it is necessary to have clean here? Not somewhere in the caller */ + if ( p2m->clean_pte ) + clean_dcache_va_range(table, PAGE_SIZE); + + unmap_domain_page(table);Again likely not something that wants taking care of right away, but there again is an inefficiency here: The caller almost certainly wants to map the same page again, to update the one entry that caused the request to shatter the page. I'll add TODO. + /* + * Even if we failed, we should install the newly allocated PTE + * entry. The caller will be in charge to free the sub-tree. + */ + p2m_write_pte(entry, page_to_p2m_table(p2m, page), p2m->clean_pte);Why would it be wrong to free the page right here, vacating the entry at the same time (or leaving just that to the caller)? (IOW - if this is an implementation decision of yours, I think the word "should" would want dropping.) After all, the caller invoking p2m_free_entry() on the thus split PTE is less efficient (needs to iterate over all entries) than on the original one (where it's just a single superpage). I think that I didn't get your idea. 
 I am not really sure that I fully understand. My understanding is if level != target then the splitting is needed, I am not really get the part "split a superpage if it really wouldn't need splitting". 
 No as the test for this case was missed. I will add one. 
 I will add TODO to make that part more efficient. And based on your reply regarding statement inside if(), I'll likely to use do/while(). Thanks. ~ Oleksii 
 
 
 
  | 
  
![]()  | 
            
         Lists.xenproject.org is hosted with RackSpace, monitoring our  |