|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 2/2] xen/x86: Change stub page allocation/free
On Wed, Jun 10, 2026 at 11:23:46AM -0400, Jason Andryuk wrote:
> On 2026-06-10 11:01, Roger Pau Monné wrote:
> > On Mon, Jun 08, 2026 at 08:06:38PM -0400, Jason Andryuk wrote:
> > > Today the inline tracking of the stub page is problematic. 0xcc is used
> > > to indicate unused, but it is also a "clear value." A !CONFIG_PV build
> > > with smt=0 will bring up CPU0, bring up CPU1, bring down CPU1, and free
> > > the in-use stub page. Subsequent CPU onlining can write to the re-used
> > > page.
> > >
> > > The new approach uses a global, CPU-indexed array of stub pages.
> > > However, to handle NUMA aware allocations, we cannot allocate all the
> > > pages in advance because the NUMA information is not available. Keep
> > > track of 1 current page for each NUMA node, allocated on demand, and
> > > allocate the stub buffers out of those pages.
> > >
> > > The current NUMA allocation approach is opportunistic sharing among the
> > > groups of 32 processors. The new approach will allocate buffers densely
> > > in a NUMA node.
> > >
> > > stub pages are no longer freed. They remain referenced in the global
> > > CPU-indexed array and are re-used if the CPU is re-onlined.
> > >
> > > stubs and node_stubs don't have an explicit lock. During boot they are
> > > accessed single threaded. During runtime, &cpu_add_remove_lock
> > > serializes access.
> > >
> > > Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
> > > Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
> > > ---
> > > I'm not sure how to test the NUMA part - I don't have an NUMA system.
> > > Also, if NUMA is active, is a cpu node of NUMA_NO_NODE still possible?
> > > I used the MAX_NUMNODES + 1 array sizing to handle that, but it's not
> > > obvious to me if that is necessary.
> > >
> > > Roger mentioned removing the per-cpu stubs.mfn. We'd need to replace
> > > that with exposing the stubs array for traps and the emulator. I have
> > > no idea if that will be an improvement and am looking for agreement on
> > > this patch before attempting.
> > > ---
> > > xen/arch/x86/include/asm/stubs.h | 2 +-
> > > xen/arch/x86/setup.c | 3 +-
> > > xen/arch/x86/smpboot.c | 110 +++++++++++++++++++++----------
> > > 3 files changed, 77 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/include/asm/stubs.h
> > > b/xen/arch/x86/include/asm/stubs.h
> > > index a520928e9a..9d776f81dd 100644
> > > --- a/xen/arch/x86/include/asm/stubs.h
> > > +++ b/xen/arch/x86/include/asm/stubs.h
> > > @@ -32,6 +32,6 @@ struct stubs {
> > > };
> > > DECLARE_PER_CPU(struct stubs, stubs);
> > > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> > > +unsigned long assign_stub_page(unsigned int cpu);
> > > #endif /* X86_ASM_STUBS_H */
> > > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > > index 19ee857abf..0cac94cbdb 100644
> > > --- a/xen/arch/x86/setup.c
> > > +++ b/xen/arch/x86/setup.c
> > > @@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
> > > init_idle_domain();
> > > - this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
> > > - &this_cpu(stubs).mfn);
> > > + this_cpu(stubs.addr) = assign_stub_page(0);
> >
> > Given stub pages is first used quite late in the boot process, the above
> > arrays would better be dynamically allocated using xvmalloc_array().
>
> Ok. At some point I intended to dynamically allocate. But x86 doesn't have
> num_possible_cpus(), and I thought num_present_cpus() wouldn't have the
> correct value. nr_cpu_ids seemed close to the value, but then I convinced
> myself NR_CPUS would be okay.
I will double check, but I think num_present_cpus() accounts for the
maximum number of online CPUs possible at any point.
> > diff --git a/xen/arch/x86/include/asm/stubs.h
> > b/xen/arch/x86/include/asm/stubs.h
> > index a520928e9a50..d575f1eb0631 100644
> > --- a/xen/arch/x86/include/asm/stubs.h
> > +++ b/xen/arch/x86/include/asm/stubs.h
> > @@ -32,6 +32,7 @@ struct stubs {
> > };
> > DECLARE_PER_CPU(struct stubs, stubs);
> > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> > +unsigned long assign_stub_page(unsigned int cpu);
> > +void init_bsp_stub(void);
>
> With init_bsp_stub(), assign_stub_page can be static and not exported.
Oh, nice one.
>
> > #endif /* X86_ASM_STUBS_H */
>
> > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> > index b3045eac5b5e..dd0972a3025e 100644
> > --- a/xen/arch/x86/smpboot.c
> > +++ b/xen/arch/x86/smpboot.c
> > @@ -20,6 +20,7 @@
> > #include <xen/serial.h>
> > #include <xen/softirq.h>
> > #include <xen/tasklet.h>
> > +#include <xen/xvmalloc.h>
> > #include <asm/apic.h>
> > #include <asm/cpuidle.h>
> > @@ -641,41 +642,61 @@ static int do_boot_cpu(int apicid, int cpu)
> > return rc;
> > }
> > -#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) *
> > STUB_BUF_SIZE)
> > +/* Dynamically allocated, indexed by CPU. Store physical address of
> > stubs. */
> > +static paddr_t *__ro_after_init stubs;
> > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
> > +unsigned long assign_stub_page(unsigned int cpu)
> > {
> > unsigned long stub_va;
> > - struct page_info *pg;
> > + paddr_t addr = stubs[cpu];
> > - BUILD_BUG_ON(STUBS_PER_PAGE & (STUBS_PER_PAGE - 1));
> > -
> > - if ( *mfn )
> > - pg = mfn_to_page(_mfn(*mfn));
> > - else
> > + if ( addr == INVALID_PADDR )
> > {
> > - nodeid_t node = cpu_to_node(cpu);
> > - unsigned int memflags = node != NUMA_NO_NODE ? MEMF_node(node) : 0;
> > + nodeid_t nid = cpu_to_node(cpu);
> > - pg = alloc_domheap_page(NULL, memflags);
> > - if ( !pg )
> > - return 0;
> > + /*
> > + * Attempt to use the same page as the previous CPU if possible,
> > + * otherwise allocate a new one.
> > + */
> > + if ( cpu && nid == cpu_to_node(cpu - 1) &&
> > + PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE) )
> PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE)
> is to ensure we it remains inside the allocated stub page?
Yup, if there's no offset it means we have consumed the full page, and
we need to allocate a new one (at least that was my intention).
>
> > + addr = stubs[cpu - 1] + STUB_BUF_SIZE;
> > + else
> > + {
> > + struct page_info *pg = alloc_domheap_page(NULL,
> > MEMF_node(nid));
>
> > @@ -1092,15 +1106,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
> > memcpy(per_cpu(idt, cpu), bsp_idt, sizeof(bsp_idt));
> > disable_each_ist(per_cpu(idt, cpu));
> > - for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
> > - i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
> > - if ( cpu_online(i) && cpu_to_node(i) == node )
> > - {
> > - per_cpu(stubs.mfn, cpu) = per_cpu(stubs.mfn, i);
>
> This loop tries hard to re-use the same page for a NUMA node. My posted
> approach will densely allocate the stubs. Your approach would re-use less,
> unless the CPUs are contiguous in a node.
>From what I saw on a couple of systems, CPUs are contiguous inside a
node, for example:
(XEN) [ 2384.850395] CPU0...39 -> NODE0
(XEN) [ 2384.850397] CPU40...79 -> NODE1
> This is just an observation. I have no idea how NUMA nodes are allocated.
> The "round robin" code in numa_init_array() made me worry that CPUs are more
> likely to be non-contiguous.
Well, the approach here would still work for non-contiguously assigned
CPUs, it would just be sub-optimal. We can adjust if we ever find
such a system, but I think it's unlikely.
> If you have NUMA systems in XenRT, I think it would be worthwhile to test.
> Some printks to see how many pages are allocated would be useful.
Sure, I will give it a try across what we have on the lab.
>
> Thanks,
> Jason
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |