[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 10/18] xen/riscv: implement p2m_set_range()


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 26 Sep 2025 09:07:51 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 26 Sep 2025 07:08:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.09.2025 22:08, Oleksii Kurochko wrote:
> On 9/20/25 1:36 AM, Jan Beulich wrote:
>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>> +static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>> +{
>>> +    unsigned long root_table_indx;
>>> +
>>> +    root_table_indx = gfn_x(gfn) >> P2M_LEVEL_ORDER(P2M_ROOT_LEVEL);
>>> +    if ( root_table_indx >= P2M_ROOT_PAGES )
>>> +        return NULL;
>>> +
>>> +    /*
>>> +     * The P2M root page table is extended by 2 bits, making its size 16KB
>>> +     * (instead of 4KB for non-root page tables). Therefore, p2m->root is
>>> +     * allocated as four consecutive 4KB pages (since alloc_domheap_pages()
>>> +     * only allocates 4KB pages).
>>> +     *
>>> +     * To determine which of these four 4KB pages the root_table_indx falls
>>> +     * into, we divide root_table_indx by
>>> +     * P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1).
>>> +     */
>>> +    root_table_indx /= P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1);
>> The subtraction of 1 here feels odd: You're after the root table's
>> number of entries, i.e. I'd expect you to pass just P2M_ROOT_LEVEL.
>> And the way P2M_PAGETABLE_ENTRIES() works also suggests so.
> 
> The purpose of this line is to select the page within the root table, which
> consists of 4 consecutive pages. However, 
> P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL)
> returns 2048, so root_table_idx will always be 0 after devision, which is not
> what we want.
> 
> As an alternative, P2M_PAGETABLE_ENTRIES(0) could be used, since it always
> returns 512. Dividing root_table_idx by 512 then yields the index of the page
> within the root table, which is made up of 4 consecutive pages.
> 
> Does it make sense now?

Yes and no. I understand what you're after, but that doesn't make the use of
P2M_PAGETABLE_ENTRIES() (with an arbitrary level as argument) correct. This
calculation wants doing by solely using properties of the top level.

> The problem may occur with DECLARE_OFFSET(), which can produce an incorrect
> index within the root page table. Since the index is in the range [0, 2047],
> it becomes an issue if the value is greater than 511, because DECLARE_OFFSET()
> does not account for the larger range of the root index.
> 
> I am not sure whether it is better to make DECLARE_OFFSET() generic enough
> for both P2M and Xen page tables, or to provide a separate 
> P2M_DECLARE_OFFSET()
> and use it only in P2M-related code.
> Also, it could be an option to move DECLARE_OFFSET() from asm/page.h header
> to riscv/pt.c and define another one DECLARE_OFFSETS in riscv/p2m.c.
> 
> Do you have a preference?

Not really, no. I don't like DECLARE_OFFSETS() anyway.

>>> +static int p2m_set_entry(struct p2m_domain *p2m,
>>> +                           gfn_t gfn,
>>> +                           unsigned long page_order,
>>> +                           mfn_t mfn,
>>> +                           p2m_type_t t)
>> Nit: Indentation.
>>
>>> +{
>>> +    unsigned int level;
>>> +    unsigned int target = page_order / PAGETABLE_ORDER;
>>> +    pte_t *entry, *table, orig_pte;
>>> +    int rc;
>>> +    /*
>>> +     * A mapping is removed only if the MFN is explicitly set to 
>>> INVALID_MFN.
>>> +     * Other MFNs that are considered invalid by mfn_valid() (e.g., MMIO)
>>> +     * are still allowed.
>>> +     */
>>> +    bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
>>> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
>>> +
>>> +    ASSERT(p2m_is_write_locked(p2m));
>>> +
>>> +    /*
>>> +     * Check if the level target is valid: we only support
>>> +     * 4K - 2M - 1G mapping.
>>> +     */
>>> +    ASSERT(target <= 2);
>>> +
>>> +    table = p2m_get_root_pointer(p2m, gfn);
>>> +    if ( !table )
>>> +        return -EINVAL;
>>> +
>>> +    for ( level = P2M_ROOT_LEVEL; level > target; level-- )
>>> +    {
>>> +        /*
>>> +         * Don't try to allocate intermediate page table if the mapping
>>> +         * is about to be removed.
>>> +         */
>>> +        rc = p2m_next_level(p2m, !removing_mapping,
>>> +                            level, &table, offsets[level]);
>>> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
>>> +        {
>>> +            rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM;
>>> +            /*
>>> +             * We are here because p2m_next_level has failed to map
>>> +             * the intermediate page table (e.g the table does not exist
>>> +             * and they p2m tree is read-only).
>> I thought I commented on this or something similar already: Calling the
>> p2m tree "read-only" is imo misleading.
> 
> I will change then "read-only" to "not allocatable".

That'll be only marginally better: What's "allocatable"? Why not something
like "... does not exist and none should be allocated"? Or maybe simply
omit this part of the comment?

>>> +    /*
>>> +     * Free the entry only if the original pte was valid and the base
>>> +     * is different (to avoid freeing when permission is changed).
>>> +     *
>>> +     * If previously MFN 0 was mapped and it is going to be removed
>>> +     * and considering that during removing MFN 0 is used then `entry`
>>> +     * and `new_entry` will be the same and p2m_free_subtree() won't be
>>> +     * called. This case is handled explicitly.
>>> +     */
>>> +    if ( pte_is_valid(orig_pte) &&
>>> +         (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ||
>>> +          (removing_mapping && mfn_eq(pte_get_mfn(*entry), _mfn(0)))) )
>>> +        p2m_free_subtree(p2m, orig_pte, level);
>> I continue to fail to understand why the MFN would matter here.
> 
> My understanding is that if, for the same GFN, the MFN changes fromMFN_1 to
> MFN_2, then we need to update any references on the page referenced by
> |orig_pte| to ensure the proper reference counter is maintained for the page
> pointed to byMFN_1.
> 
>>   Isn't the
>> need to free strictly tied to a VALID -> NOT VALID transition? A permission
>> change simply retains the VALID state of an entry.
> 
> It covers a case when removing happens and probably in this case we don't need
> to check specifically for mfn(0) case "mfn_eq(pte_get_mfn(*entry), _mfn(0))",
> but it would be enough to check that pte_is_valid(entry) instead:
>    ...
>    (removing_mapping && !pte_is_valid(entry)))) )
> 
> Or only check removing_mapping variable as `entry` would be invalided by the
> code above anyway. So we will get:
> +    if ( pte_is_valid(orig_pte) &&
> +         (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) || 
> removing_mapping) )
> +        p2m_free_subtree(p2m, orig_pte, level);
> 
> Does it make sense now?

Not really, sorry. Imo the complicated condition indicates that something is
wrong (or at least inefficient) here.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.