[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/2] xen/x86: Change stub page allocation/free


  • To: Jason Andryuk <jason.andryuk@xxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 10 Jun 2026 20:20:02 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+zGDy93rB1DYgqXEzkIxkMsUHXhpUViBeCTDJx1Nj2M=; b=vD3+IkNimOFKKPeO4TwC+XwZm7AubKxgtFEYfQ1HNLzG0x7ULcYGIKK7TmnrQTpUhCJr1YaToDT+gZt6IL4VT2dhXzr8vr9Cw9FfiZhxEC8fsApg4sGlMTkAsV0MnNf3F2oRoKffUtFdB36/flErsruVmXT/L6izQ0IuOy1sFya55KvWRgT64fJJvAatjPk75ZsaRggVlhyoN2ieVClFC9yQ/0juk6JWKF9x8pe5zZPr/2Qamzr50oTwubbtKbjyaYuxKOyEWXVK4Lpw7DmJ72stufZCIBPsfkTdlbrWqHtCfDr9R4TRuiZe58uwK6/IBgA+AdEGk7bJU1JA3jQ0MQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=FbLr4AnaqRi692JDcIz+3YWLbJGQrXQ9WifI90DEIBN3V0QA/97/7gPdKqa/Xihv5NVf80EpoypoKUur4YJG22QOU2k4y3GjG5fEfhk4kbWaMkAXvp0CXQ1Z3zpYIO52s1hfASFXZC6tNqcMbjLnjTCFinHUYGVTM4VlWUWy7FME4u+tfqvBpwAVmoML7inTyBM41EnP4v43LADpX9x2N4CqyzGGn/iaU9p6rgH0+P+m60ZMTdsdJRAwfZ3G9hPpnEh8adW6+CqTTO9ll1+99sSjVN13UqEjwrFiELc9z66tyIR2M0pQMzrakwxn5hXIc4TEZRDmMcR/nMC1xwwAZg==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=citrix.com header.i="@citrix.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>
  • Delivery-date: Wed, 10 Jun 2026 18:20:33 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jun 10, 2026 at 11:23:46AM -0400, Jason Andryuk wrote:
> On 2026-06-10 11:01, Roger Pau Monné wrote:
> > On Mon, Jun 08, 2026 at 08:06:38PM -0400, Jason Andryuk wrote:
> > > Today the inline tracking of the stub page is problematic.  0xcc is used
> > > to indicate unused, but it is also a "clear value."  A !CONFIG_PV build
> > > with smt=0 will bring up CPU0, bring up CPU1, bring down CPU1, and free
> > > the in-use stub page.  Subsequent CPU onlining can write to the re-used
> > > page.
> > > 
> > > The new approach uses a global, CPU-indexed array of stub pages.
> > > However, to handle NUMA aware allocations, we cannot allocate all the
> > > pages in advance because the NUMA information is not available.  Keep
> > > track of 1 current page for each NUMA node, allocated on demand, and
> > > allocate the stub buffers out of those pages.
> > > 
> > > The current NUMA allocation approach is opportunistic sharing among the
> > > groups of 32 processors.  The new approach will allocate buffers densely
> > > in a NUMA node.
> > > 
> > > stub pages are no longer freed.  They remain referenced in the global
> > > CPU-indexed array and are re-used if the CPU is re-onlined.
> > > 
> > > stubs and node_stubs don't have an explicit lock.  During boot they are
> > > accessed single threaded.  During runtime, &cpu_add_remove_lock
> > > serializes access.
> > > 
> > > Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
> > > Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
> > > ---
> > > I'm not sure how to test the NUMA part - I don't have an NUMA system.
> > > Also, if NUMA is active, is a cpu node of NUMA_NO_NODE still possible?
> > > I used the MAX_NUMNODES + 1 array sizing to handle that, but it's not
> > > obvious to me if that is necessary.
> > > 
> > > Roger mentioned removing the per-cpu stubs.mfn.  We'd need to replace
> > > that with exposing the stubs array for traps and the emulator.  I have
> > > no idea if that will be an improvement and am looking for agreement on
> > > this patch before attempting.
> > > ---
> > >   xen/arch/x86/include/asm/stubs.h |   2 +-
> > >   xen/arch/x86/setup.c             |   3 +-
> > >   xen/arch/x86/smpboot.c           | 110 +++++++++++++++++++++----------
> > >   3 files changed, 77 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/xen/arch/x86/include/asm/stubs.h 
> > > b/xen/arch/x86/include/asm/stubs.h
> > > index a520928e9a..9d776f81dd 100644
> > > --- a/xen/arch/x86/include/asm/stubs.h
> > > +++ b/xen/arch/x86/include/asm/stubs.h
> > > @@ -32,6 +32,6 @@ struct stubs {
> > >   };
> > >   DECLARE_PER_CPU(struct stubs, stubs);
> > > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> > > +unsigned long assign_stub_page(unsigned int cpu);
> > >   #endif /* X86_ASM_STUBS_H */
> > > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > > index 19ee857abf..0cac94cbdb 100644
> > > --- a/xen/arch/x86/setup.c
> > > +++ b/xen/arch/x86/setup.c
> > > @@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
> > >       init_idle_domain();
> > > -    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
> > > -                                           &this_cpu(stubs).mfn);
> > > +    this_cpu(stubs.addr) = assign_stub_page(0);
> > 
> > Given stub pages is first used quite late in the boot process, the above
> > arrays would better be dynamically allocated using xvmalloc_array().
> 
> Ok.  At some point I intended to dynamically allocate.  But x86 doesn't have
> num_possible_cpus(), and I thought num_present_cpus() wouldn't have the
> correct value.  nr_cpu_ids seemed close to the value, but then I convinced
> myself NR_CPUS would be okay.

I will double check, but I think num_present_cpus() accounts for the
maximum number of online CPUs possible at any point.

> > diff --git a/xen/arch/x86/include/asm/stubs.h 
> > b/xen/arch/x86/include/asm/stubs.h
> > index a520928e9a50..d575f1eb0631 100644
> > --- a/xen/arch/x86/include/asm/stubs.h
> > +++ b/xen/arch/x86/include/asm/stubs.h
> > @@ -32,6 +32,7 @@ struct stubs {
> >   };
> >   DECLARE_PER_CPU(struct stubs, stubs);
> > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> > +unsigned long assign_stub_page(unsigned int cpu);
> > +void init_bsp_stub(void);
> 
> With init_bsp_stub(), assign_stub_page can be static and not exported.

Oh, nice one.

> 
> >   #endif /* X86_ASM_STUBS_H */
> 
> > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> > index b3045eac5b5e..dd0972a3025e 100644
> > --- a/xen/arch/x86/smpboot.c
> > +++ b/xen/arch/x86/smpboot.c
> > @@ -20,6 +20,7 @@
> >   #include <xen/serial.h>
> >   #include <xen/softirq.h>
> >   #include <xen/tasklet.h>
> > +#include <xen/xvmalloc.h>
> >   #include <asm/apic.h>
> >   #include <asm/cpuidle.h>
> > @@ -641,41 +642,61 @@ static int do_boot_cpu(int apicid, int cpu)
> >       return rc;
> >   }
> > -#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * 
> > STUB_BUF_SIZE)
> > +/* Dynamically allocated, indexed by CPU.  Store physical address of 
> > stubs. */
> > +static paddr_t *__ro_after_init stubs;
> > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
> > +unsigned long assign_stub_page(unsigned int cpu)
> >   {
> >       unsigned long stub_va;
> > -    struct page_info *pg;
> > +    paddr_t addr = stubs[cpu];
> > -    BUILD_BUG_ON(STUBS_PER_PAGE & (STUBS_PER_PAGE - 1));
> > -
> > -    if ( *mfn )
> > -        pg = mfn_to_page(_mfn(*mfn));
> > -    else
> > +    if ( addr == INVALID_PADDR )
> >       {
> > -        nodeid_t node = cpu_to_node(cpu);
> > -        unsigned int memflags = node != NUMA_NO_NODE ? MEMF_node(node) : 0;
> > +        nodeid_t nid = cpu_to_node(cpu);
> > -        pg = alloc_domheap_page(NULL, memflags);
> > -        if ( !pg )
> > -            return 0;
> > +        /*
> > +         * Attempt to use the same page as the previous CPU if possible,
> > +         * otherwise allocate a new one.
> > +         */
> > +        if ( cpu && nid == cpu_to_node(cpu - 1) &&
> > +             PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE) )
>     PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE)
> is to ensure we it remains inside the allocated stub page?

Yup, if there's no offset it means we have consumed the full page, and
we need to allocate a new one (at least that was my intention).

> 
> > +            addr = stubs[cpu - 1] + STUB_BUF_SIZE;
> > +        else
> > +        {
> > +            struct page_info *pg = alloc_domheap_page(NULL, 
> > MEMF_node(nid));
> 
> > @@ -1092,15 +1106,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
> >       memcpy(per_cpu(idt, cpu), bsp_idt, sizeof(bsp_idt));
> >       disable_each_ist(per_cpu(idt, cpu));
> > -    for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
> > -          i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
> > -        if ( cpu_online(i) && cpu_to_node(i) == node )
> > -        {
> > -            per_cpu(stubs.mfn, cpu) = per_cpu(stubs.mfn, i);
> 
> This loop tries hard to re-use the same page for a NUMA node.  My posted
> approach will densely allocate the stubs.  Your approach would re-use less,
> unless the CPUs are contiguous in a node.

>From what I saw on a couple of systems, CPUs are contiguous inside a
node, for example:

(XEN) [ 2384.850395] CPU0...39 -> NODE0
(XEN) [ 2384.850397] CPU40...79 -> NODE1

> This is just an observation.  I have no idea how NUMA nodes are allocated.
> The "round robin" code in numa_init_array() made me worry that CPUs are more
> likely to be non-contiguous.

Well, the approach here would still work for non-contiguously assigned
CPUs, it would just be sub-optimal.  We can adjust if we ever find
such a system, but I think it's unlikely.

> If you have NUMA systems in XenRT, I think it would be worthwhile to test.
> Some printks to see how many pages are allocated would be useful.

Sure, I will give it a try across what we have on the lab.



> 
> Thanks,
> Jason



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.