WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: [PATCH 2/2] Add physical CPU hotplug support in PV_ops d

To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Thu, 24 Sep 2009 10:18:53 -0700
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date: Thu, 24 Sep 2009 10:19:28 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <E2263E4A5B2284449EEBD0AAB751098418E4231438@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <E2263E4A5B2284449EEBD0AAB751098418E4231438@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090814 Fedora/3.0-2.6.b3.fc11 Lightning/1.0pre Thunderbird/3.0b3
On 09/24/09 08:31, Jiang, Yunhong wrote:
> This patch add physical CPU hotplug support to PV_ops dom0.
>   

Is there a patch 1/2 for the kernel?

> The basic workflow is:  after CPU hot add, instead of mark CPU to be present 
> to kernel, we will try to invoke hypercall to Xen hypervisor to mark this CPU 
> persent in hypervisor directly and then return. Xen hypervisor will send a 
> vIRQ notification, with this notification, dom0 will online the new CPUs.
>   
I'm a little confused here.  When you say "dom0 will online the new
CPUs", do you mean in the sense that Xen will start using them to
schedule vcpus contexts onto them?  Or do you mean that it will online
them within dom0 as new vcpus?

> We utilize processor external controller logic in pv_ops dom0 to add some 
> hook in dom0's ACPI code.
>
> We add /sys/device/system/xen_pcpu to present all physical CPU in the system, 
> so that we can utilize the udev rules to online the CPU after it is added, 
> this sepertaed directory will not cause conflict with vcpu hotplug logic. The 
> method is echo "1" > /sys/device/system/xen_pcpuXX/online, which is quite 
> similar to current vCPU online method, except in different sysfs entry.
>   

Yes, that makes sense.  Would it eventually be worth mirroring the
normal cpu heirarchy as much as possible to add an interface for stats
and power management?

And I guess offlining works as expected?

> Currently Linux kernel has a lot of callback for logical/physical CPU O*L, we 
> checked them and think the MCE/microcode callback is related with physical 
> CPU O*L, while all other callback is just for virtual CPU.
>   

I guess in general the microcode driver never cares about hotplug of vcpus.

Overall comment: this looks fairly reasonable.   I don't have any good
idea about how we're going to get these core acpi changes into an
upstreamable form, but it looks OK within the context of what we have.

Though I guess its another of those cases where ACPI is uncomfortably
straddling dom0 and xen's responsibilities.  Ideally Xen would handle
pcpu hotplug events by itself and then just notify dom0 to make any
policy decisions.

I'd like to see this split up into a few patches though:

    * generic core acpi change (get_acpi_id)
    * core pcpu hotplug
    * microcode changes
    * mce changes

You should also give the patches a going-over with checkpatch.pl.

More detailed comments below.

Thanks,
    J

>  arch/x86/kernel/microcode_xen.c  |    4 +-
>  drivers/acpi/processor_core.c    |   30 ++-
>  drivers/xen/Makefile             |    1 +
>  drivers/xen/acpi_processor.c     |   50 +++++-
>  drivers/xen/mce.c                |   75 ++++++--
>  drivers/xen/pcpu.c               |  406 
> ++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h             |    1 +
>  include/xen/interface/platform.h |   56 ++++++
>  include/xen/interface/xen.h      |    1 +
>  include/xen/pcpu.h               |    8 +
>  10 files changed, 601 insertions(+), 31 deletions(-)
>
> Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx>
>
>
> diff --git a/arch/x86/kernel/microcode_xen.c b/arch/x86/kernel/microcode_xen.c
> index 397cba1..fed9368 100644
> --- a/arch/x86/kernel/microcode_xen.c
> +++ b/arch/x86/kernel/microcode_xen.c
> @@ -27,7 +27,7 @@ struct xen_microcode {
>         char data[0];
>  };
>
> -static int xen_microcode_update(int cpu)
> +int xen_microcode_update(int cpu)
>  {
>         int err;
>         struct xen_platform_op op;
> @@ -54,7 +54,7 @@ static int xen_microcode_update(int cpu)
>
>         return err;
>  }
> -
> +EXPORT_SYMBOL_GPL(xen_microcode_update);
>   

I think adding a new notifier would be better here, to avoid having an
explicit dependency between pcpu hotplug and the microcode driver.

If nothing else, I want to maintain the microcode driver and pcpu
hotplug on different branches, and having one patch touch both is
awkward to manage.

>  static enum ucode_state xen_request_microcode_fw(int cpu, struct device 
> *device)
>  {
>         char name[30];
> diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
> index 98010d5..69d45f6 100644
> --- a/drivers/acpi/processor_core.c
> +++ b/drivers/acpi/processor_core.c
> @@ -561,6 +561,17 @@ static int get_cpu_id(acpi_handle handle, int type, u32 
> acpi_id)
>         }
>         return -1;
>  }
> +
> +int get_apic_id(acpi_handle handle, int type, u32 acpi_id)
> +{
> +       int apic_id = -1;
> +
> +       apic_id = map_mat_entry(handle, type, acpi_id);
> +       if (apic_id == -1)
> +               apic_id = map_madt_entry(type, acpi_id);
> +       return apic_id;
> +}
> +EXPORT_SYMBOL_GPL(get_apic_id);
>  #endif
>
>  /* --------------------------------------------------------------------------
> @@ -647,7 +658,7 @@ static int acpi_processor_get_info(struct acpi_device 
> *device)
>          *  less than the max # of CPUs. They should be ignored _iff
>          *  they are physically not present.
>          */
> -       if (pr->id == -1) {
> +       if ( !xen_pv_domain() && (pr->id == -1) ) {
>                 if (ACPI_FAILURE
>                     (acpi_processor_hotadd_init(pr->handle, &pr->id))) {
>                         return -ENODEV;
> @@ -735,6 +746,10 @@ static int __cpuinit acpi_processor_start(struct 
> acpi_device *device)
>
>         per_cpu(processors, pr->id) = pr;
>
> +       if (processor_cntl_xen())
> +               processor_cntl_xen_notify(pr,
> +                               PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD);
> +
>         result = acpi_processor_add_fs(device);
>         if (result)
>                 goto end;
> @@ -974,10 +989,6 @@ int acpi_processor_device_add(acpi_handle handle, struct 
> acpi_device **device)
>         if (!pr)
>                 return -ENODEV;
>
> -       if (processor_cntl_xen())
> -               processor_cntl_xen_notify(pr,
> -                       PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD);
> -
>         if ((pr->id >= 0) && (pr->id < nr_cpu_ids)) {
>                 kobject_uevent(&(*device)->dev.kobj, KOBJ_ONLINE);
>         }
> @@ -1023,13 +1034,12 @@ static void __ref 
> acpi_processor_hotplug_notify(acpi_handle handle,
>
>                 if (pr->id >= 0 && (pr->id < nr_cpu_ids)) {
>                         kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
> +
> +                       if (processor_cntl_xen())
> +                               processor_cntl_xen_notify(pr, 
> PROCESSOR_HOTPLUG,
> +                                               HOTPLUG_TYPE_REMOVE);
>                         break;
>                 }
> -
> -               if (processor_cntl_xen())
> -                       processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG,
> -                                                       HOTPLUG_TYPE_REMOVE);
> -
>                 result = acpi_processor_start(device);
>                 if ((!result) && ((pr->id >= 0) && (pr->id < nr_cpu_ids))) {
>                         kobject_uevent(&device->dev.kobj, KOBJ_ONLINE);
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 8e7ce48..d3c6bb3 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -6,6 +6,7 @@ CFLAGS_features.o                       := $(nostackp)
>
>  obj-$(CONFIG_PCI)                      += pci.o
>  obj-$(CONFIG_HOTPLUG_CPU)              += cpu_hotplug.o
> +obj-$(CONFIG_HOTPLUG_CPU)              += pcpu.o
>   
Does this have any other dependencies?  CONFIG_HOTPLUG_CPU doesn't
depend on ACPI.

Perhaps adding a new CONFIG_XEN_HOTPLUG_PCPU config flag with the
appropriate dependencies would be better (which may or may not be
user-settable; I'd prefer not unless there's a good reason to want to
explicitly disable it).

>  obj-$(CONFIG_XEN_XENCOMM)              += xencomm.o
>  obj-$(CONFIG_XEN_BALLOON)              += balloon.o
>  obj-$(CONFIG_XEN_DEV_EVTCHN)           += evtchn.o
> diff --git a/drivers/xen/acpi_processor.c b/drivers/xen/acpi_processor.c
> index 6a4e8e4..a42a3d0 100644
> --- a/drivers/xen/acpi_processor.c
> +++ b/drivers/xen/acpi_processor.c
> @@ -32,6 +32,7 @@
>  #include <linux/cpufreq.h>
>  #include <acpi/processor.h>
>  #include <xen/acpi.h>
> +#include <xen/pcpu.h>
>
>  #include <asm/xen/hypercall.h>
>  #include <asm/xen/hypervisor.h>
> @@ -71,6 +72,8 @@ int processor_cntl_xen_power_cache(int cpu, int cx,
>
>  int processor_cntl_xen(void)
>  {
> +       if (!xen_initial_domain())
> +               return 0;
>         return 1;
>  }
>
> @@ -122,6 +125,7 @@ static int processor_notify_smm(void)
>  int processor_cntl_xen_notify(struct acpi_processor *pr, int event, int type)
>  {
>         int ret = -EINVAL;
> +       int apic_id;
>
>         switch (event) {
>         case PROCESSOR_PM_INIT:
> @@ -135,6 +139,8 @@ int processor_cntl_xen_notify(struct acpi_processor *pr, 
> int event, int type)
>         case PROCESSOR_HOTPLUG:
>                 if (xen_ops.hotplug)
>                         ret = xen_ops.hotplug(pr, type);
> +               apic_id = get_apic_id(pr->handle, 1, pr->acpi_id);
> +               xen_pcpu_hotplug(type, apic_id);
>                 break;
>         default:
>                 printk(KERN_ERR "Unsupport processor events %d.\n", event);
> @@ -423,9 +429,51 @@ static int xen_tx_notifier(struct acpi_processor *pr, 
> int action)
>  {
>         return -EINVAL;
>  }
> +
>  static int xen_hotplug_notifier(struct acpi_processor *pr, int event)
>  {
> -       return -EINVAL;
> +       int ret = -EINVAL;
> +       uint32_t apic_id;
> +       unsigned long long pxm;
> +       acpi_status status = 0;
> +
> +       xen_platform_op_t op = {
> +               .cmd                    = XENPF_resource_hotplug,
> +               .interface_version      = XENPF_INTERFACE_VERSION,
> +               .u.resource.u.sadd.acpi_id = pr->acpi_id,
> +               .u.resource.u.sadd.apic_id = pr->id,
> +       };
> +
> +       apic_id = get_apic_id(pr->handle, 1, pr->acpi_id);
> +       if (apic_id < 0)
> +       {
> +               printk(KERN_WARNING "Can't get apic_id for acpi_id %x\n", 
> pr->acpi_id);
> +               return -1;
> +       }
> +       op.u.resource.u.sadd.apic_id = apic_id;
> +
> +       status = acpi_evaluate_integer(pr->handle, "_PXM",
> +                       NULL, &pxm);
> +       if (ACPI_FAILURE(status))
> +       {
> +               printk(KERN_WARNING "can't get pxm for acpi_id %x\n", 
> pr->acpi_id);
> +               return -1;
> +       }
> +       op.u.resource.u.sadd.pxm = pxm;
> +
> +       switch (event)
> +       {
> +               case HOTPLUG_TYPE_ADD:
> +                       op.u.resource.sub_cmd   = XEN_CPU_add;
> +                       ret = HYPERVISOR_dom0_op(&op);
> +                       break;
> +               case HOTPLUG_TYPE_REMOVE:
> +                       op.u.resource.sub_cmd = XEN_CPU_remove;
> +                       ret = HYPERVISOR_dom0_op(&op);
> +                       break;
> +       }
> +
> +       return ret;
>  }
>
>  static int __init xen_acpi_processor_extcntl_init(void)
> diff --git a/drivers/xen/mce.c b/drivers/xen/mce.c
> index b354dc8..79616de 100644
> --- a/drivers/xen/mce.c
> +++ b/drivers/xen/mce.c
> @@ -45,9 +45,14 @@
>  static mc_info_t *g_mi;
>  static mcinfo_logical_cpu_t *g_physinfo;
>  static uint32_t ncpus;
> +static DEFINE_SPINLOCK(pcpu_spinlock);
>   

Split the mce changes out into a separate patch too, if possible.

> +
> +#define enter_pcpu_list()  spin_lock_irqsave(&pcpu_spinlock, flags);
> +#define leave_pcpu_list()  spin_unlock_irqrestore(&pcpu_spinlock, flags);
>   

Another definition of these?  Please, no.  (I found the
drivers/xen/pcpu.c versions of these first; see below.)

>  static int convert_log(struct mc_info *mi)
>  {
> +       unsigned long flags;
>         struct mcinfo_common *mic = NULL;
>         struct mcinfo_global *mc_global;
>         struct mcinfo_bank *mc_bank;
> @@ -61,6 +66,7 @@ static int convert_log(struct mc_info *mi)
>         mc_global = (struct mcinfo_global *)mic;
>         m.mcgstatus = mc_global->mc_gstatus;
>         m.apicid = mc_global->mc_apicid;
> +       enter_pcpu_list();
>         for (i = 0; i < ncpus; i++) {
>                 if (g_physinfo[i].mc_apicid == m.apicid) {
>                         found = 1;
> @@ -72,6 +78,8 @@ static int convert_log(struct mc_info *mi)
>         m.socketid = g_physinfo[i].mc_chipid;
>         m.cpu = m.extcpu = g_physinfo[i].mc_cpunr;
>         m.cpuvendor = (__u8)g_physinfo[i].mc_vendor;
> +       leave_pcpu_list();
> +
>         x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK);
>         do {
>                 if (mic == NULL || mic->size == 0)
> @@ -143,46 +151,77 @@ end:
>         return IRQ_HANDLED;
>  }
>
> -static int bind_virq_for_mce(void)
> +static int sync_pcpu_info(void)
>  {
> -       int ret;
> +       int ret = 0;
> +       unsigned long flags;
>         xen_mc_t mc_op;
> +       mcinfo_logical_cpu_t *pcpu_info = NULL;
>
> -       g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL);
> -
> -       if (!g_mi)
> -               return -ENOMEM;
> -
> +       enter_pcpu_list();
>         /* Fetch physical CPU Numbers */
>         mc_op.cmd = XEN_MC_physcpuinfo;
>         mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
> -       set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
> +       set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, NULL);
>         ret = HYPERVISOR_mca(&mc_op);
>         if (ret) {
>                 printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical CPU 
> numbers\n");
> -               kfree(g_mi);
> -               return ret;
> +        goto failed;
>         }
>
>         /* Fetch each CPU Physical Info for later reference*/
>         ncpus = mc_op.u.mc_physcpuinfo.ncpus;
> -       g_physinfo = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus,
> +       pcpu_info = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus,
>                                         GFP_KERNEL);
> -       if (!g_physinfo) {
> -               kfree(g_mi);
> -               return -ENOMEM;
> -       }
> -       set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo);
> +       if (!pcpu_info)
> +        goto failed;
> +
> +       set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, pcpu_info);
>         ret = HYPERVISOR_mca(&mc_op);
>         if (ret) {
>                 printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical CPUs 
> info\n");
> -               kfree(g_mi);
> +               kfree(pcpu_info);
> +               goto failed;
> +       }
> +       /* replace the old pointer */
> +       if (g_physinfo)
>                 kfree(g_physinfo);
> +       g_physinfo = pcpu_info;
> +       leave_pcpu_list();
> +       return 0;
> +failed:
> +       if (pcpu_info)
> +               kfree(pcpu_info);
> +       leave_pcpu_list();
> +       return ret;
> +}
> +
> +int xen_mce_virq_update_cpu(uint32_t xen_id)
> +{
> +       return sync_pcpu_info();
> +}
> +
> +EXPORT_SYMBOL(xen_mce_virq_update_cpu);
> +
> +static int bind_virq_for_mce(void)
> +{
> +       int ret;
> +       xen_mc_t mc_op;
> +
> +       g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL);
> +
> +       if (!g_mi)
> +               return -ENOMEM;
> +
> +       /* Fetch physical CPU Numbers */
> +       if ((ret = sync_pcpu_info()))
> +       {
> +               kfree(g_mi);
>                 return ret;
>         }
>
>         ret  = bind_virq_to_irqhandler(VIRQ_MCA, 0,
> -               mce_dom_interrupt, 0, "mce", NULL);
> +                       mce_dom_interrupt, 0, "mce", NULL);
>   

Don't make unrelated formatting changes.

>         if (ret < 0) {
>                 printk(KERN_ERR "MCE_DOM0_LOG: bind_virq for DOM0 failed\n");
> diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
> new file mode 100644
> index 0000000..3872c20
> --- /dev/null
> +++ b/drivers/xen/pcpu.c
> @@ -0,0 +1,406 @@
> +/*
> + * pcpu.c - management physical cpu in dom0 environment
> + */
> +#include <linux/interrupt.h>
> +#include <linux/spinlock.h>
> +#include <asm/xen/hypervisor.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/cpu.h>
> +#include <xen/xenbus.h>
> +#include <xen/pcpu.h>
> +#include <xen/events.h>
> +#include <xen/acpi.h>
> +
> +void xen_pcpu_dpc(struct work_struct *work);
> +static struct sysdev_class xen_pcpu_sysdev_class = {
> +       .name = "xen_pcpu",
> +};
> +
> +DECLARE_WORK(work, xen_pcpu_dpc);
>   

static?  Also "work" is a very generic name; how about something a bit
more descriptive, or at least identifying.

> +static DEFINE_SPINLOCK(pcpu_spinlock);
> +
> +/* No need for irq disable since hotplug notify is in workqueue context */
> +#define enter_pcpu_list()  spin_lock(&pcpu_spinlock);
> +#define leave_pcpu_list()  spin_unlock(&pcpu_spinlock);
>   

Don't wrap spinlocks up like this.  Always leave them open-coded, or at
the very least put "lock"/"unlock" in the wrapper names.  I didn't
realize enter/leave meant lock/unlock on my first read over the code.

> +
> +struct pcpu
> +{
> +       struct list_head pcpu_list;
> +       struct sys_device sysdev;
> +       uint32_t xen_id;
> +       uint32_t apic_id;
> +#define PCPU_LOOPED 0x10000000
> +       uint32_t flags;
> +};
> +
> +struct xen_pcpus
> +{
> +       struct list_head list;
> +       int possible;
> +       int present;
> +};
> +struct xen_pcpus xen_pcpus;
> +
> +#define XEN_PCPU_ONLINE     0x01
> +#define XEN_PCPU_OFFLINE    0x02
> +#define XEN_PCPU_ADD        0x04
> +#define XEN_PCPU_REMOVE     0x08
> +
> +static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute 
> *attr,
> +               char *buf)
> +{
> +       struct pcpu *cpu = container_of(dev, struct pcpu, sysdev);
> +
> +       return sprintf(buf, "%u\n", cpu->flags & XEN_PCPU_FLAGS_ONLINE);
>   

This relies on XEN_PCPU_FLAGS_ONLINE equalling 1 to guarantee that the
file contains 0/1.  You should be more explicit about it by adding "!= 0".

> +}
> +
> +static int xen_pcpu_down(uint32_t xen_id)
> +{
> +       int ret;
> +       xen_platform_op_t op = {
> +               .cmd                    = XENPF_resource_hotplug,
> +               .interface_version      = XENPF_INTERFACE_VERSION,
> +               .u.resource.u.cpu_ol.cpuid = xen_id,
> +       };
> +
> +       op.u.resource.sub_cmd   = XEN_CPU_offline;
> +       ret = HYPERVISOR_dom0_op(&op);
> +       return ret;
> +}
> +
> +static int xen_pcpu_up(uint32_t xen_id)
> +{
> +       int ret;
> +       xen_platform_op_t op = {
> +               .cmd                    = XENPF_resource_hotplug,
> +               .interface_version      = XENPF_INTERFACE_VERSION,
> +               .u.resource.u.cpu_ol.cpuid = xen_id,
> +       };
> +
> +       op.u.resource.sub_cmd   = XEN_CPU_online;
> +       ret = HYPERVISOR_dom0_op(&op);
> +       return ret;
> +}
> +
> +static ssize_t __ref store_online(struct sys_device *dev,
> +               struct sysdev_attribute *attr,
> +               const char *buf, size_t count)
> +{
> +       struct pcpu *cpu = container_of(dev, struct pcpu, sysdev);
> +       ssize_t ret;
> +
> +       switch (buf[0]) {
> +               case '0':
> +                       ret = xen_pcpu_down(cpu->xen_id);
> +                       break;
> +               case '1':
> +                       ret = xen_pcpu_up(cpu->xen_id);
> +                       break;
> +               default:
> +                       ret = -EINVAL;
> +       }
> +
> +       if (ret >= 0)
> +               ret = count;
> +       return ret;
> +}
> +
> +static SYSDEV_ATTR(online, 0644, show_online, store_online);
> +
> +static int xen_free_physical_cpu(struct pcpu * pcpu)
> +{
> +       if (!pcpu)
> +               return 0;
> +
> +       sysdev_remove_file(&pcpu->sysdev, &attr_online);
> +       sysdev_unregister(&pcpu->sysdev);
> +       list_del(&pcpu->pcpu_list);
> +       kfree(pcpu);
> +
> +       return 0;
> +}
> +
> +static struct pcpu * xen_add_physical_cpu(struct xen_physical_cpuinfo *info)
> +{
> +       struct pcpu *cpu;
> +       int error;
> +
> +       cpu = kzalloc(sizeof(struct pcpu), GFP_KERNEL);
> +       if (!cpu)
> +               return NULL;
> +
> +       INIT_LIST_HEAD(&cpu->pcpu_list);
> +       cpu->xen_id = info->xen_cpuid;
> +       cpu->apic_id = info->apic_id;
> +       cpu->flags = info->flags;
> +
> +       cpu->sysdev.cls = &xen_pcpu_sysdev_class;
> +       cpu->sysdev.id = info->xen_cpuid;
> +       printk("get cpuid %x\n", cpu->sysdev.id);
> +
> +       error = sysdev_register(&cpu->sysdev);
> +       if (error)
> +       {
> +               printk("failed to register pcpu\n");
> +               kfree(cpu);
> +               return NULL;
> +       }
> +       sysdev_create_file(&cpu->sysdev, &attr_online);
> +       if (error)
> +       {
> +               printk("failed to create attr\n");
> +               sysdev_unregister(&cpu->sysdev);
> +               kfree(cpu);
> +               return NULL;
> +       }
> +
> +       return cpu;
> +}
> +
> +static struct xen_physical_cpuinfo * xen_fetch_cpu_info(int *num,
> +               int *possible)
> +{
> +       int cpu_num, ret = 0;
> +       struct xen_physical_cpuinfo *info;
> +       xen_platform_op_t op = {
> +               .cmd            = XENPF_get_cpuinfo,
> +               .interface_version  = XENPF_INTERFACE_VERSION,
> +               .u.pcpu_info.ncpus    = 0,
> +       };
> +
> +       set_xen_guest_handle(op.u.pcpu_info.info, NULL);
> +
> +       ret = HYPERVISOR_dom0_op(&op);
> +       if (ret)
> +               return NULL;
> +       cpu_num = op.u.pcpu_info.max_cpus;
> +
> +       info = kzalloc(cpu_num * sizeof(struct xen_physical_cpuinfo), 
> GFP_KERNEL);
> +       if (!info)
> +               return NULL;
> +       op.u.pcpu_info.ncpus = cpu_num;
> +       set_xen_guest_handle(op.u.pcpu_info.info, info);
> +
> +       ret = HYPERVISOR_dom0_op(&op);
> +       if (ret)
> +       {
> +               kfree(info);
> +               printk(KERN_WARNING "Error for pcpu info fetch\n");
> +               return NULL;
> +       }
> +       if (possible)
> +               *possible = op.u.pcpu_info.max_cpus;
> +       if (num)
> +               *num = op.u.pcpu_info.ncpus;
> +
> +       return info;
> +}
> +
> +static int xen_pcpu_callback(uint32_t xen_id, int action)
> +{
> +       switch (action)
> +       {
> +               case XEN_PCPU_ONLINE:
> +                       {
> +#ifdef CONFIG_MICROCODE_XEN
> +                               xen_microcode_update(xen_id);
> +#endif
> +#ifdef CONFIG_MCE_XEN
> +                               xen_mce_virq_update_cpu(xen_id);
> +#endif
> +                       }
> +                       break;
> +               case XEN_PCPU_OFFLINE:
> +                       {
> +#ifdef CONFIG_MCE_XEN
> +                               xen_mce_virq_update_cpu(xen_id);
> +#endif
>   

Yeah, these are definitely notifier material.

> +                       }
> +                       break;
> +               default:
> +                       break;
> +       }
> +       return 0;
> +}
> +
> +/*
> + * Sync dom0's pcpu information with xen hypervisor's
> + */
> +static int xen_pcpu_sync(void)
> +{
> +       struct xen_physical_cpuinfo *info;
> +       int cpu_num, i;
> +       struct list_head *elem, *tmp;
> +       struct pcpu *pcpu;
> +
> +       info = xen_fetch_cpu_info(&cpu_num, NULL);
> +
> +       if (!info)
> +               return IRQ_HANDLED;
> +       enter_pcpu_list();
> +
> +       /* Check for current cpu list */
> +       for (i = 0; i < cpu_num; i++)
> +       {
> +               int found = 0;
> +               list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list)
> +               {
> +                       if ( (pcpu->apic_id == info[i].apic_id) &&
> +                                       (pcpu->xen_id == info[i].xen_cpuid) )
> +                       {
>   

Could you hoist some of this inner code into helper functions to reduce
the nesting here?

> +                               found = 1;
> +                               if( (info[i].flags & XEN_PCPU_FLAGS_ONLINE) &&
> +                                               !(pcpu->flags 
> &XEN_PCPU_FLAGS_ONLINE))
> +                               {
> +                                       /* the pcpu is onlined now */
> +                                       pcpu->flags |= XEN_PCPU_FLAGS_ONLINE;
> +                                       kobject_uevent(&pcpu->sysdev.kobj, 
> KOBJ_ONLINE);
> +                                       xen_pcpu_callback(pcpu->xen_id, 
> XEN_PCPU_ONLINE);
> +                               }
> +                               else if ( !(info[i].flags & 
> XEN_PCPU_FLAGS_ONLINE) &&
> +                                               (pcpu->flags & 
> XEN_PCPU_FLAGS_ONLINE))
> +                               {
> +                                       /* The pcpu is offlined now */
> +                                       pcpu->flags &= ~XEN_PCPU_FLAGS_ONLINE;
> +                                       kobject_uevent(&pcpu->sysdev.kobj, 
> KOBJ_OFFLINE);
> +                                       xen_pcpu_callback(pcpu->xen_id, 
> XEN_PCPU_OFFLINE);
> +                               }
> +                               pcpu->flags |= PCPU_LOOPED;
> +                       }
> +               }
> +               if (!found)
> +               {
> +                       struct pcpu *cpu;
> +
> +                       if (info[i].flags & XEN_PCPU_FLAGS_ONLINE)
> +                               printk(KERN_WARNING "A hotadd cpu is onlined 
> also \n");
>   

This could do with some clarification.  Is this case that a newly added
pcpu already appears to be online?  Also it should be prefixed with some
clue about what code/operation is printing the message.

> +                       cpu = xen_add_physical_cpu(&info[i]);
> +                       if (cpu == NULL)
> +                               goto failed;
> +                       list_add_tail(&cpu->pcpu_list, &xen_pcpus.list);
> +                       xen_pcpu_callback(cpu->xen_id, XEN_PCPU_ADD);
> +                       cpu->flags |= PCPU_LOOPED;
> +               }
> +       }
> +
> +       list_for_each_safe(elem, tmp, &xen_pcpus.list)
> +       {
> +               pcpu = list_entry(elem, struct pcpu, pcpu_list);
> +               if (pcpu->flags & PCPU_LOOPED)
> +                       pcpu->flags &= ~PCPU_LOOPED;
> +               else
> +               {
> +                       /* The pcpu does not exist any more, remove it */
> +                       xen_free_physical_cpu(pcpu);
> +                       xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_REMOVE);
> +               }
> +       }
> +
> +failed:
> +       leave_pcpu_list();
> +       return 0;
> +}
> +
> +static int init_xen_pcpu_info(void)
>   
__init?

Also I prefer names of the form subsystem_action, so xen_pcpu_info_init().

> +{
> +       int possible, cpu_num, i;
> +       struct xen_physical_cpuinfo *info = NULL;
> +       struct list_head *elem, *tmp;
> +       struct pcpu *pcpu;
> +
> +       xen_pcpus.possible = xen_pcpus.present = 0;
> +       INIT_LIST_HEAD(&xen_pcpus.list);
> +       info = xen_fetch_cpu_info(&cpu_num, &possible);
> +       if (!info)
> +               return -1;
> +       xen_pcpus.possible = possible;
> +       xen_pcpus.present = cpu_num;
> +       enter_pcpu_list();
> +       for (i = 0; i < cpu_num; i++)
> +       {
> +               pcpu = xen_add_physical_cpu(&info[i]);
> +               if (!pcpu)
> +               {
> +                       leave_pcpu_list();
> +                       goto failed;
> +               }
> +               list_add_tail(&pcpu->pcpu_list, &xen_pcpus.list);
> +       }
> +       leave_pcpu_list();
> +       kfree(info);
> +
> +       return 0;
> +failed:
> +       list_for_each_safe(elem, tmp, &xen_pcpus.list)
> +       {
> +               pcpu = list_entry(elem, struct pcpu, pcpu_list);
> +               xen_free_physical_cpu(pcpu);
> +       }
> +       if (info)
> +               kfree(info);
> +       xen_pcpus.possible = xen_pcpus.present = 0;
> +       INIT_LIST_HEAD(&xen_pcpus.list);
> +       return -1;
> +}
> +
> +static irqreturn_t xen_pcpu_interrupt(int irq, void *dev_id)
> +{
> +       schedule_work(&work);
> +       return IRQ_HANDLED;
> +}
> +
> +/*
> + * type: 0 add, 1 remove
>   

Looks like "type" is really an enum.

> + */
> +int xen_pcpu_hotplug(int type, uint32_t apic_id)
> +{
> +       struct pcpu *pcpu;
> +       int found = 0;
> +
> +       xen_pcpu_sync();
> +       enter_pcpu_list();
> +       list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list)
> +       {
> +               if (pcpu->apic_id == apic_id)
> +               {
> +                       found = 1;
> +                       break;
> +               }
> +       }
> +       leave_pcpu_list();
> +
> +       if (!found && (type == HOTPLUG_TYPE_ADD))
> +               printk(KERN_WARNING "The cpu is not added into Xen HV?\n");
> +
> +       if (found && (type == HOTPLUG_TYPE_REMOVE) )
> +               printk(KERN_WARNING "The cpu still exits in Xen HV?\n");
> +       return 0;
> +}
> +EXPORT_SYMBOL(xen_pcpu_hotplug);
>   
_GPL?

> +
> +void xen_pcpu_dpc(struct work_struct *work)
> +{
> +       xen_pcpu_sync();
> +}
> +
> +static int __init xen_physical_cpu_init(void)
> +{
> +       int err;
> +
> +       if (!xen_initial_domain())
> +               return 0;
> +
> +       err = sysdev_class_register(&xen_pcpu_sysdev_class);
> +       if (err)
> +       {
> +               printk(KERN_WARNING "error to register sysdev for 
> xen_pcpu\n");
> +               return err;
> +       }
> +
> +       err = init_xen_pcpu_info();
> +       if (!err)
> +               bind_virq_to_irqhandler(VIRQ_PCPU_STATE, 0, 
> xen_pcpu_interrupt, 0, "pcpu", NULL);
>   

So when does this interrupt get raised?  Is the full flow:

   1. ACPI raises SCI
   2. dom0 catches that and gets the new pcpu event
   3. dom0 notifies Xen that the pcpu exists
   4. xen interrupts dom0 saying that a new pcpu exists
   5. dom0 processes the list and adds it to sysfs
   6. usermode onlines the cpu for Xen's use

?  Or does Xen interrupt dom0 once dom0 does the online?

> +       return err;
> +}
> +
> +subsys_initcall(xen_physical_cpu_init);
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 34321cf..e414fcc 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -85,6 +85,7 @@ int acpi_boot_init (void);
>  int acpi_boot_table_init (void);
>  int acpi_mps_check (void);
>  int acpi_numa_init (void);
> +int get_apic_id(acpi_handle handle, int type, u32 acpi_id);
>
>  int acpi_table_init (void);
>  int acpi_table_parse (char *id, acpi_table_handler handler);
> diff --git a/include/xen/interface/platform.h 
> b/include/xen/interface/platform.h
> index 6783fce..851a7c1 100644
> --- a/include/xen/interface/platform.h
> +++ b/include/xen/interface/platform.h
> @@ -312,6 +312,60 @@ struct xenpf_set_processor_pminfo {
>  typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t;
>  DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo);
>
> +struct xenpf_hotadd_cpu
> +{
> +       uint32_t apic_id;
> +       uint32_t acpi_id;
> +       uint32_t pxm;
> +};
> +
> +struct xenpf_hotremove_cpu
> +{
> +       uint32_t apic_id;
> +};
> +
> +struct xenpf_cpu_ol
> +{
> +       uint32_t cpuid;
> +};
> +
> +#define XENPF_resource_hotplug 55
> +struct xenpf_resource_hotplug {
> +       uint32_t sub_cmd;
> +#define XEN_CPU_add      1
> +#define XEN_CPU_remove   2
> +#define XEN_CPU_online      3
> +#define XEN_CPU_offline     4
> +       union {
> +               struct xenpf_hotadd_cpu sadd;
> +               struct xenpf_hotremove_cpu sremove;
> +               struct xenpf_cpu_ol   cpu_ol;
> +       }u;
> +};
> +
> +#define XENPF_get_cpuinfo 56
> +struct xen_physical_cpuinfo {
> +       uint32_t xen_cpuid;
> +       uint32_t apic_id;
> +#define XEN_PCPU_FLAGS_ONLINE 1
> +       uint32_t flags;
> +       uint32_t pad;
> +};
> +typedef struct xen_physical_cpuinfo xen_physical_cpuinfo_t;
> +DEFINE_GUEST_HANDLE_STRUCT(xen_physical_cpuinfo);
> +
> +struct xenpf_pcpu_info
> +{
> +       /* IN/OUT */
> +       uint32_t ncpus;
> +       /* OUT */
> +       /* The possible CPU */
> +       uint32_t max_cpus;
> +       GUEST_HANDLE(xen_physical_cpuinfo) info;
> +};
> +typedef struct xenpf_pcpu_info xenpf_pcpu_info_t;
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_pcpu_info);
> +
>  struct xen_platform_op {
>         uint32_t cmd;
>         uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
> @@ -327,6 +381,8 @@ struct xen_platform_op {
>                 struct xenpf_change_freq       change_freq;
>                 struct xenpf_getidletime       getidletime;
>                 struct xenpf_set_processor_pminfo set_pminfo;
> +               struct xenpf_resource_hotplug   resource;
> +               struct xenpf_pcpu_info          pcpu_info;
>                 uint8_t                        pad[128];
>         } u;
>  };
> diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
> index 812ffd5..c11920b 100644
> --- a/include/xen/interface/xen.h
> +++ b/include/xen/interface/xen.h
> @@ -79,6 +79,7 @@
>  #define VIRQ_CONSOLE    2  /* (DOM0) Bytes received on emergency console. */
>  #define VIRQ_DOM_EXC    3  /* (DOM0) Exceptional event for some domain.   */
>  #define VIRQ_DEBUGGER   6  /* (DOM0) A domain has paused for debugging.   */
> +#define VIRQ_PCPU_STATE 9  /* G. (DOM0) PCPU state changed                   
> */
>
>  /* Architecture-specific VIRQ definitions. */
>  #define VIRQ_ARCH_0    16
> diff --git a/include/xen/pcpu.h b/include/xen/pcpu.h
> new file mode 100644
> index 0000000..0d71149
> --- /dev/null
> +++ b/include/xen/pcpu.h
> @@ -0,0 +1,8 @@
> +#ifndef _XEN_PCPU_H
> +#define _XEN_PCPU_H
> +
> +extern int xen_pcpu_hotplug(int type, uint32_t apic_id);
> +extern int xen_mce_virq_update_cpu(uint32_t xen_id);
> +extern int xen_microcode_update(int cpu);
> +
> +#endif
>
>   


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel