Xen project Mailing List

Re: [PATCH v3 2/2] xen/x86: Change stub page allocation/free

To: Jason Andryuk <jason.andryuk@xxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 10 Jun 2026 20:20:02 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+zGDy93rB1DYgqXEzkIxkMsUHXhpUViBeCTDJx1Nj2M=; b=vD3+IkNimOFKKPeO4TwC+XwZm7AubKxgtFEYfQ1HNLzG0x7ULcYGIKK7TmnrQTpUhCJr1YaToDT+gZt6IL4VT2dhXzr8vr9Cw9FfiZhxEC8fsApg4sGlMTkAsV0MnNf3F2oRoKffUtFdB36/flErsruVmXT/L6izQ0IuOy1sFya55KvWRgT64fJJvAatjPk75ZsaRggVlhyoN2ieVClFC9yQ/0juk6JWKF9x8pe5zZPr/2Qamzr50oTwubbtKbjyaYuxKOyEWXVK4Lpw7DmJ72stufZCIBPsfkTdlbrWqHtCfDr9R4TRuiZe58uwK6/IBgA+AdEGk7bJU1JA3jQ0MQ==

Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=FbLr4AnaqRi692JDcIz+3YWLbJGQrXQ9WifI90DEIBN3V0QA/97/7gPdKqa/Xihv5NVf80EpoypoKUur4YJG22QOU2k4y3GjG5fEfhk4kbWaMkAXvp0CXQ1Z3zpYIO52s1hfASFXZC6tNqcMbjLnjTCFinHUYGVTM4VlWUWy7FME4u+tfqvBpwAVmoML7inTyBM41EnP4v43LADpX9x2N4CqyzGGn/iaU9p6rgH0+P+m60ZMTdsdJRAwfZ3G9hPpnEh8adW6+CqTTO9ll1+99sSjVN13UqEjwrFiELc9z66tyIR2M0pQMzrakwxn5hXIc4TEZRDmMcR/nMC1xwwAZg==

Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=citrix.com header.i="@citrix.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>

Delivery-date: Wed, 10 Jun 2026 18:20:33 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jun 10, 2026 at 11:23:46AM -0400, Jason Andryuk wrote: > On 2026-06-10 11:01, Roger Pau Monné wrote: > > On Mon, Jun 08, 2026 at 08:06:38PM -0400, Jason Andryuk wrote: > > > Today the inline tracking of the stub page is problematic. 0xcc is used > > > to indicate unused, but it is also a "clear value." A !CONFIG_PV build > > > with smt=0 will bring up CPU0, bring up CPU1, bring down CPU1, and free > > > the in-use stub page. Subsequent CPU onlining can write to the re-used > > > page. > > > > > > The new approach uses a global, CPU-indexed array of stub pages. > > > However, to handle NUMA aware allocations, we cannot allocate all the > > > pages in advance because the NUMA information is not available. Keep > > > track of 1 current page for each NUMA node, allocated on demand, and > > > allocate the stub buffers out of those pages. > > > > > > The current NUMA allocation approach is opportunistic sharing among the > > > groups of 32 processors. The new approach will allocate buffers densely > > > in a NUMA node. > > > > > > stub pages are no longer freed. They remain referenced in the global > > > CPU-indexed array and are re-used if the CPU is re-onlined. > > > > > > stubs and node_stubs don't have an explicit lock. During boot they are > > > accessed single threaded. During runtime, &cpu_add_remove_lock > > > serializes access. > > > > > > Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack") > > > Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx> > > > --- > > > I'm not sure how to test the NUMA part - I don't have an NUMA system. > > > Also, if NUMA is active, is a cpu node of NUMA_NO_NODE still possible? > > > I used the MAX_NUMNODES + 1 array sizing to handle that, but it's not > > > obvious to me if that is necessary. > > > > > > Roger mentioned removing the per-cpu stubs.mfn. We'd need to replace > > > that with exposing the stubs array for traps and the emulator. I have > > > no idea if that will be an improvement and am looking for agreement on > > > this patch before attempting. > > > --- > > > xen/arch/x86/include/asm/stubs.h | 2 +- > > > xen/arch/x86/setup.c | 3 +- > > > xen/arch/x86/smpboot.c | 110 +++++++++++++++++++++---------- > > > 3 files changed, 77 insertions(+), 38 deletions(-) > > > > > > diff --git a/xen/arch/x86/include/asm/stubs.h > > > b/xen/arch/x86/include/asm/stubs.h > > > index a520928e9a..9d776f81dd 100644 > > > --- a/xen/arch/x86/include/asm/stubs.h > > > +++ b/xen/arch/x86/include/asm/stubs.h > > > @@ -32,6 +32,6 @@ struct stubs { > > > }; > > > DECLARE_PER_CPU(struct stubs, stubs); > > > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn); > > > +unsigned long assign_stub_page(unsigned int cpu); > > > #endif /* X86_ASM_STUBS_H */ > > > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c > > > index 19ee857abf..0cac94cbdb 100644 > > > --- a/xen/arch/x86/setup.c > > > +++ b/xen/arch/x86/setup.c > > > @@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void) > > > init_idle_domain(); > > > - this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(), > > > - &this_cpu(stubs).mfn); > > > + this_cpu(stubs.addr) = assign_stub_page(0); > > > > Given stub pages is first used quite late in the boot process, the above > > arrays would better be dynamically allocated using xvmalloc_array(). > > Ok. At some point I intended to dynamically allocate. But x86 doesn't have > num_possible_cpus(), and I thought num_present_cpus() wouldn't have the > correct value. nr_cpu_ids seemed close to the value, but then I convinced > myself NR_CPUS would be okay. I will double check, but I think num_present_cpus() accounts for the maximum number of online CPUs possible at any point. > > diff --git a/xen/arch/x86/include/asm/stubs.h > > b/xen/arch/x86/include/asm/stubs.h > > index a520928e9a50..d575f1eb0631 100644 > > --- a/xen/arch/x86/include/asm/stubs.h > > +++ b/xen/arch/x86/include/asm/stubs.h > > @@ -32,6 +32,7 @@ struct stubs { > > }; > > DECLARE_PER_CPU(struct stubs, stubs); > > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn); > > +unsigned long assign_stub_page(unsigned int cpu); > > +void init_bsp_stub(void); > > With init_bsp_stub(), assign_stub_page can be static and not exported. Oh, nice one. > > > #endif /* X86_ASM_STUBS_H */ > > > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c > > index b3045eac5b5e..dd0972a3025e 100644 > > --- a/xen/arch/x86/smpboot.c > > +++ b/xen/arch/x86/smpboot.c > > @@ -20,6 +20,7 @@ > > #include <xen/serial.h> > > #include <xen/softirq.h> > > #include <xen/tasklet.h> > > +#include <xen/xvmalloc.h> > > #include <asm/apic.h> > > #include <asm/cpuidle.h> > > @@ -641,41 +642,61 @@ static int do_boot_cpu(int apicid, int cpu) > > return rc; > > } > > -#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * > > STUB_BUF_SIZE) > > +/* Dynamically allocated, indexed by CPU. Store physical address of > > stubs. */ > > +static paddr_t *__ro_after_init stubs; > > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn) > > +unsigned long assign_stub_page(unsigned int cpu) > > { > > unsigned long stub_va; > > - struct page_info *pg; > > + paddr_t addr = stubs[cpu]; > > - BUILD_BUG_ON(STUBS_PER_PAGE & (STUBS_PER_PAGE - 1)); > > - > > - if ( *mfn ) > > - pg = mfn_to_page(_mfn(*mfn)); > > - else > > + if ( addr == INVALID_PADDR ) > > { > > - nodeid_t node = cpu_to_node(cpu); > > - unsigned int memflags = node != NUMA_NO_NODE ? MEMF_node(node) : 0; > > + nodeid_t nid = cpu_to_node(cpu); > > - pg = alloc_domheap_page(NULL, memflags); > > - if ( !pg ) > > - return 0; > > + /* > > + * Attempt to use the same page as the previous CPU if possible, > > + * otherwise allocate a new one. > > + */ > > + if ( cpu && nid == cpu_to_node(cpu - 1) && > > + PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE) ) > PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE) > is to ensure we it remains inside the allocated stub page? Yup, if there's no offset it means we have consumed the full page, and we need to allocate a new one (at least that was my intention). > > > + addr = stubs[cpu - 1] + STUB_BUF_SIZE; > > + else > > + { > > + struct page_info *pg = alloc_domheap_page(NULL, > > MEMF_node(nid)); > > > @@ -1092,15 +1106,7 @@ static int cpu_smpboot_alloc(unsigned int cpu) > > memcpy(per_cpu(idt, cpu), bsp_idt, sizeof(bsp_idt)); > > disable_each_ist(per_cpu(idt, cpu)); > > - for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1); > > - i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i ) > > - if ( cpu_online(i) && cpu_to_node(i) == node ) > > - { > > - per_cpu(stubs.mfn, cpu) = per_cpu(stubs.mfn, i); > > This loop tries hard to re-use the same page for a NUMA node. My posted > approach will densely allocate the stubs. Your approach would re-use less, > unless the CPUs are contiguous in a node. >From what I saw on a couple of systems, CPUs are contiguous inside a node, for example: (XEN) [ 2384.850395] CPU0...39 -> NODE0 (XEN) [ 2384.850397] CPU40...79 -> NODE1 > This is just an observation. I have no idea how NUMA nodes are allocated. > The "round robin" code in numa_init_array() made me worry that CPUs are more > likely to be non-contiguous. Well, the approach here would still work for non-contiguously assigned CPUs, it would just be sub-optimal. We can adjust if we ever find such a system, but I think it's unlikely. > If you have NUMA systems in XenRT, I think it would be worthwhile to test. > Some printks to see how many pages are allocated would be useful. Sure, I will give it a try across what we have on the lab. > > Thanks, > Jason

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.