WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] RE: [PATCH 2/2] Add physical CPU hotplug support in PV_ops d

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "Yu, Ke" <ke.yu@xxxxxxxxx>
Subject: [Xen-devel] RE: [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Fri, 25 Sep 2009 08:27:11 +0800
Accept-language: en-US
Acceptlanguage: en-US
Cc: Keir, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Fraser <keir.fraser@xxxxxxxxxxxxx>
Delivery-date: Thu, 24 Sep 2009 17:28:12 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4ABBA9FD.6030808@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <E2263E4A5B2284449EEBD0AAB751098418E4231438@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <4ABBA9FD.6030808@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Aco9OxvhLF23xpa6QVSX/9E3fva3swAN3DVQ
Thread-topic: [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0

Jeremy Fitzhardinge wrote:
> On 09/24/09 08:31, Jiang, Yunhong wrote:
>> This patch add physical CPU hotplug support to PV_ops dom0.
>>
>
> Is there a patch 1/2 for the kernel?

Seems I named both patch to be 2/2 :(
This is the only patch for kernel, while the other one is for Xen hypervisor.

>
>> The basic workflow is:  after CPU hot add, instead of mark
> CPU to be present to kernel, we will try to invoke hypercall
> to Xen hypervisor to mark this CPU persent in hypervisor
> directly and then return. Xen hypervisor will send a vIRQ
> notification, with this notification, dom0 will online the new CPUs.
>>
> I'm a little confused here.  When you say "dom0 will online the new
> CPUs", do you mean in the sense that Xen will start using them to
> schedule vcpus contexts onto them?  Or do you mean that it will online
> them within dom0 as new vcpus?

"Online the new CPUs" means dom0 will notify Xen to online the new CPU, so that 
they will be used for schedule.

>
>> We utilize processor external controller logic in pv_ops
> dom0 to add some hook in dom0's ACPI code.
>>
>> We add /sys/device/system/xen_pcpu to present all physical
> CPU in the system, so that we can utilize the udev rules to
> online the CPU after it is added, this sepertaed directory
> will not cause conflict with vcpu hotplug logic. The method is
> echo "1" > /sys/device/system/xen_pcpuXX/online, which is
> quite similar to current vCPU online method, except in
> different sysfs entry.
>>
>
> Yes, that makes sense.  Would it eventually be worth mirroring the
> normal cpu heirarchy as much as possible to add an interface for
> stats and power management?

As stated in the [patch 0/2] mail, the xen_pcpu is entry just for online now, 
but it maybe useful for other purpose.
I think it should be useful for hierarchy, stats. Ke should have more idea of 
the power management side. Do you mean I need do these staff in this patch, we 
will hold for future work?

>
> And I guess offlining works as expected?

Do you mean logical offlining or physical offlining? I think logical offlining 
is tested long before. For physical offlining, I implement it but I have no 
platform to test still.

In fact , it is  a tricky to physical offlining a cpu (or, more precisely 
speaking, socket) because that means we need offline all memory behind a 
socket. But it can be used for socket migration, i.e. put down the old CPU and 
bring up a spare CPU, in that situation, we need CPU offlining. We are still 
working on socket migration.


>
>> Currently Linux kernel has a lot of callback for
> logical/physical CPU O*L, we checked them and think the
> MCE/microcode callback is related with physical CPU O*L, while
> all other callback is just for virtual CPU.
>>
>
> I guess in general the microcode driver never cares about
> hotplug of vcpus.

Yes, also MCE.
And, are you sure currently the microcode driver really not called in hotplug 
of vcpus? Will vcpu online not trigger the CPU_online notifier?

>
> Overall comment: this looks fairly reasonable.   I don't have any good
> idea about how we're going to get these core acpi changes into an
> upstreamable form, but it looks OK within the context of what we have.
>
> Though I guess its another of those cases where ACPI is uncomfortably
> straddling dom0 and xen's responsibilities.  Ideally Xen would handle
> pcpu hotplug events by itself and then just notify dom0 to make any
> policy decisions.

Yes, ideally we should have xen handle the event, but xen have no idea of the 
DSDT :(

>
> I'd like to see this split up into a few patches though:
>
>    * generic core acpi change (get_acpi_id)
>    * core pcpu hotplug
>    * microcode changes
>    * mce changes

Sure.

>
> You should also give the patches a going-over with checkpatch.pl.
>
> More detailed comments below.
>
> Thanks,
>    J
>
>>  arch/x86/kernel/microcode_xen.c  |    4 +-
>>  drivers/acpi/processor_core.c    |   30 ++-
>>  drivers/xen/Makefile             |    1 +
>>  drivers/xen/acpi_processor.c     |   50 +++++-
>>  drivers/xen/mce.c                |   75 ++++++--
>>  drivers/xen/pcpu.c               |  406
> ++++++++++++++++++++++++++++++++++++++
>>  include/linux/acpi.h             |    1 +
>>  include/xen/interface/platform.h |   56 ++++++
>>  include/xen/interface/xen.h      |    1 +
>>  include/xen/pcpu.h               |    8 +
>>  10 files changed, 601 insertions(+), 31 deletions(-)
>>
>> Signed-off-by: Jiang, Yunhong <yunhong.jiang@xxxxxxxxx>
>>
>>
>> diff --git a/arch/x86/kernel/microcode_xen.c
> b/arch/x86/kernel/microcode_xen.c
>> index 397cba1..fed9368 100644
>> --- a/arch/x86/kernel/microcode_xen.c
>> +++ b/arch/x86/kernel/microcode_xen.c
>> @@ -27,7 +27,7 @@ struct xen_microcode {
>>         char data[0];
>>  };
>>
>> -static int xen_microcode_update(int cpu)
>> +int xen_microcode_update(int cpu)
>>  {
>>         int err;
>>         struct xen_platform_op op;
>> @@ -54,7 +54,7 @@ static int xen_microcode_update(int cpu)
>>
>>         return err;
>>  }
>> -
>> +EXPORT_SYMBOL_GPL(xen_microcode_update);
>>
>
> I think adding a new notifier would be better here, to avoid having an
> explicit dependency between pcpu hotplug and the microcode driver.
>
> If nothing else, I want to maintain the microcode driver and pcpu
> hotplug on different branches, and having one patch touch both is
> awkward to manage.

Yes, that's cleaner, I will try that way.

>
>>  static enum ucode_state xen_request_microcode_fw(int cpu, struct
>>         device *device)  { char name[30];
>> diff --git a/drivers/acpi/processor_core.c
> b/drivers/acpi/processor_core.c
>> index 98010d5..69d45f6 100644
>> --- a/drivers/acpi/processor_core.c
>> +++ b/drivers/acpi/processor_core.c
>> @@ -561,6 +561,17 @@ static int get_cpu_id(acpi_handle handle, int
>>         type, u32 acpi_id)         } return -1;
>>  }
>> +
>> +int get_apic_id(acpi_handle handle, int type, u32 acpi_id) +{
>> +       int apic_id = -1;
>> +
>> +       apic_id = map_mat_entry(handle, type, acpi_id); +       if
>> (apic_id == -1) +               apic_id = map_madt_entry(type,
>> acpi_id); +       return apic_id; +}
>> +EXPORT_SYMBOL_GPL(get_apic_id);
>>  #endif
>>
>>  /*
> ---------------------------------------------------------------
> -----------
>> @@ -647,7 +658,7 @@ static int
> acpi_processor_get_info(struct acpi_device *device)
>>          *  less than the max # of CPUs. They should be ignored _iff
>>          *  they are physically not present.
>>          */
>> -       if (pr->id == -1) {
>> +       if ( !xen_pv_domain() && (pr->id == -1) ) {
>>                 if (ACPI_FAILURE
>>                     (acpi_processor_hotadd_init(pr->handle,
>>                         &pr->id))) { return -ENODEV;
>> @@ -735,6 +746,10 @@ static int __cpuinit
> acpi_processor_start(struct acpi_device *device)
>>
>>         per_cpu(processors, pr->id) = pr;
>>
>> +       if (processor_cntl_xen())
>> +               processor_cntl_xen_notify(pr,
>> +                               PROCESSOR_HOTPLUG,
>>         HOTPLUG_TYPE_ADD); + result = acpi_processor_add_fs(device);
>>         if (result)
>>                 goto end;
>> @@ -974,10 +989,6 @@ int
> acpi_processor_device_add(acpi_handle handle, struct
> acpi_device **device)
>>         if (!pr)
>>                 return -ENODEV;
>>
>> -       if (processor_cntl_xen())
>> -               processor_cntl_xen_notify(pr,
>> -                       PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD); -
>>         if ((pr->id >= 0) && (pr->id < nr_cpu_ids)) {
>>                 kobject_uevent(&(*device)->dev.kobj, KOBJ_ONLINE);
>> } @@ -1023,13 +1034,12 @@ static void __ref
> acpi_processor_hotplug_notify(acpi_handle handle,
>>
>>                 if (pr->id >= 0 && (pr->id < nr_cpu_ids)) {
>>                         kobject_uevent(&device->dev.kobj,
>> KOBJ_OFFLINE); + +                       if (processor_cntl_xen())
>> +
> processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG,
>> +                                               HOTPLUG_TYPE_REMOVE);
>>                         break;
>>                 }
>> -
>> -               if (processor_cntl_xen())
>> -                       processor_cntl_xen_notify(pr,
>> PROCESSOR_HOTPLUG, -
> HOTPLUG_TYPE_REMOVE);
>> -
>>                 result = acpi_processor_start(device);
>>                 if ((!result) && ((pr->id >= 0) && (pr->id <
>>                         nr_cpu_ids))) {
>> kobject_uevent(&device->dev.kobj, KOBJ_ONLINE);
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index 8e7ce48..d3c6bb3 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -6,6 +6,7 @@ CFLAGS_features.o                       :=
>> $(nostackp)
>>
>>  obj-$(CONFIG_PCI)                      += pci.o
>>  obj-$(CONFIG_HOTPLUG_CPU)              += cpu_hotplug.o
>> +obj-$(CONFIG_HOTPLUG_CPU)              += pcpu.o
>>
> Does this have any other dependencies?  CONFIG_HOTPLUG_CPU doesn't
> depend on ACPI.
>
> Perhaps adding a new CONFIG_XEN_HOTPLUG_PCPU config flag with the
> appropriate dependencies would be better (which may or may not be
> user-settable; I'd prefer not unless there's a good reason to want to
> explicitly disable it).

Yes, thanks for suggestion.

>
>>  obj-$(CONFIG_XEN_XENCOMM)              += xencomm.o
>>  obj-$(CONFIG_XEN_BALLOON)              += balloon.o
>>  obj-$(CONFIG_XEN_DEV_EVTCHN)           += evtchn.o
>> diff --git a/drivers/xen/acpi_processor.c
>> b/drivers/xen/acpi_processor.c index 6a4e8e4..a42a3d0 100644 ---
>> a/drivers/xen/acpi_processor.c +++ b/drivers/xen/acpi_processor.c
>> @@ -32,6 +32,7 @@
>>  #include <linux/cpufreq.h>
>>  #include <acpi/processor.h>
>>  #include <xen/acpi.h>
>> +#include <xen/pcpu.h>
>>
>>  #include <asm/xen/hypercall.h>
>>  #include <asm/xen/hypervisor.h>
>> @@ -71,6 +72,8 @@ int processor_cntl_xen_power_cache(int cpu, int cx,
>>
>>  int processor_cntl_xen(void)
>>  {
>> +       if (!xen_initial_domain())
>> +               return 0;
>>         return 1;
>>  }
>>
>> @@ -122,6 +125,7 @@ static int processor_notify_smm(void)
>>  int processor_cntl_xen_notify(struct acpi_processor *pr, int event,
>>         int type)  { int ret = -EINVAL;
>> +       int apic_id;
>>
>>         switch (event) {
>>         case PROCESSOR_PM_INIT:
>> @@ -135,6 +139,8 @@ int processor_cntl_xen_notify(struct
> acpi_processor *pr, int event, int type)
>>         case PROCESSOR_HOTPLUG:
>>                 if (xen_ops.hotplug)
>>                         ret = xen_ops.hotplug(pr, type);
>> +               apic_id = get_apic_id(pr->handle, 1, pr->acpi_id);
>> +               xen_pcpu_hotplug(type, apic_id);
>>                 break;
>>         default:
>>                 printk(KERN_ERR "Unsupport processor events %d.\n",
>> event); @@ -423,9 +429,51 @@ static int xen_tx_notifier(struct
> acpi_processor *pr, int action)
>>  {
>>         return -EINVAL;
>>  }
>> +
>>  static int xen_hotplug_notifier(struct acpi_processor *pr, int
>> event)  { -       return -EINVAL;
>> +       int ret = -EINVAL;
>> +       uint32_t apic_id;
>> +       unsigned long long pxm;
>> +       acpi_status status = 0;
>> +
>> +       xen_platform_op_t op = {
>> +               .cmd                    = XENPF_resource_hotplug,
>> +               .interface_version      = XENPF_INTERFACE_VERSION,
>> +               .u.resource.u.sadd.acpi_id = pr->acpi_id,
>> +               .u.resource.u.sadd.apic_id = pr->id, +       };
>> +
>> +       apic_id = get_apic_id(pr->handle, 1, pr->acpi_id); +
>> if (apic_id < 0) +       {
>> +               printk(KERN_WARNING "Can't get apic_id for acpi_id
>> %x\n", pr->acpi_id); +               return -1; +       }
>> +       op.u.resource.u.sadd.apic_id = apic_id;
>> +
>> +       status = acpi_evaluate_integer(pr->handle, "_PXM",
>> +                       NULL, &pxm);
>> +       if (ACPI_FAILURE(status))
>> +       {
>> +               printk(KERN_WARNING "can't get pxm for acpi_id
>> %x\n", pr->acpi_id); +               return -1; +       }
>> +       op.u.resource.u.sadd.pxm = pxm;
>> +
>> +       switch (event)
>> +       {
>> +               case HOTPLUG_TYPE_ADD:
>> +                       op.u.resource.sub_cmd   = XEN_CPU_add;
>> +                       ret = HYPERVISOR_dom0_op(&op);
>> +                       break;
>> +               case HOTPLUG_TYPE_REMOVE:
>> +                       op.u.resource.sub_cmd = XEN_CPU_remove;
>> +                       ret = HYPERVISOR_dom0_op(&op);
>> +                       break;
>> +       }
>> +
>> +       return ret;
>>  }
>>
>>  static int __init xen_acpi_processor_extcntl_init(void)
>> diff --git a/drivers/xen/mce.c b/drivers/xen/mce.c
>> index b354dc8..79616de 100644
>> --- a/drivers/xen/mce.c
>> +++ b/drivers/xen/mce.c
>> @@ -45,9 +45,14 @@
>>  static mc_info_t *g_mi;
>>  static mcinfo_logical_cpu_t *g_physinfo;
>>  static uint32_t ncpus;
>> +static DEFINE_SPINLOCK(pcpu_spinlock);
>>
>
> Split the mce changes out into a separate patch too, if possible.

After using the notifier, yes, we can possible do that.
In fact, I'm considering whether xen's mce driver need hypercall to get the 
pcpu information, if we can export all the information through pcpu interface, 
I will check it.

>
>> +
>> +#define enter_pcpu_list()  spin_lock_irqsave(&pcpu_spinlock, flags);
>> +#define leave_pcpu_list()
> spin_unlock_irqrestore(&pcpu_spinlock, flags);
>>
>
> Another definition of these?  Please, no.  (I found the
> drivers/xen/pcpu.c versions of these first; see below.)

It should be static.

>
>>  static int convert_log(struct mc_info *mi)
>>  {
>> +       unsigned long flags;
>>         struct mcinfo_common *mic = NULL;
>>         struct mcinfo_global *mc_global;
>>         struct mcinfo_bank *mc_bank;
>> @@ -61,6 +66,7 @@ static int convert_log(struct mc_info *mi)
>>         mc_global = (struct mcinfo_global *)mic;
>>         m.mcgstatus = mc_global->mc_gstatus;
>>         m.apicid = mc_global->mc_apicid;
>> +       enter_pcpu_list();
>>         for (i = 0; i < ncpus; i++) {
>>                 if (g_physinfo[i].mc_apicid == m.apicid) {
>>                         found = 1;
>> @@ -72,6 +78,8 @@ static int convert_log(struct mc_info *mi)
>>         m.socketid = g_physinfo[i].mc_chipid;
>>         m.cpu = m.extcpu = g_physinfo[i].mc_cpunr;
>>         m.cpuvendor = (__u8)g_physinfo[i].mc_vendor; +
>> leave_pcpu_list(); +
>>         x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK);
>>         do {
>>                 if (mic == NULL || mic->size == 0)
>> @@ -143,46 +151,77 @@ end:
>>         return IRQ_HANDLED;
>>  }
>>
>> -static int bind_virq_for_mce(void)
>> +static int sync_pcpu_info(void)
>>  {
>> -       int ret;
>> +       int ret = 0;
>> +       unsigned long flags;
>>         xen_mc_t mc_op;
>> +       mcinfo_logical_cpu_t *pcpu_info = NULL;
>>
>> -       g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); -
>> -       if (!g_mi)
>> -               return -ENOMEM;
>> -
>> +       enter_pcpu_list();
>>         /* Fetch physical CPU Numbers */
>>         mc_op.cmd = XEN_MC_physcpuinfo;
>>         mc_op.interface_version = XEN_MCA_INTERFACE_VERSION;
>> -       set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info,
>> g_physinfo); +
>>         set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, NULL); ret
>>         = HYPERVISOR_mca(&mc_op); if (ret) {
>>                 printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical
>> CPU numbers\n");
>> -               kfree(g_mi);
>> -               return ret;
>> +        goto failed;
>>         }
>>
>>         /* Fetch each CPU Physical Info for later reference*/
>>         ncpus = mc_op.u.mc_physcpuinfo.ncpus;
>> -       g_physinfo = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus,
>> +       pcpu_info = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus,
>>                                         GFP_KERNEL);
>> -       if (!g_physinfo) {
>> -               kfree(g_mi);
>> -               return -ENOMEM;
>> -       }
>> -       set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info,
>> g_physinfo); +       if (!pcpu_info) +        goto failed;
>> +
>> +       set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, pcpu_info);
>>         ret = HYPERVISOR_mca(&mc_op);
>>         if (ret) {
>>                 printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical
>> CPUs info\n"); -               kfree(g_mi); +
>> kfree(pcpu_info); +               goto failed;
>> +       }
>> +       /* replace the old pointer */
>> +       if (g_physinfo)
>>                 kfree(g_physinfo);
>> +       g_physinfo = pcpu_info;
>> +       leave_pcpu_list();
>> +       return 0;
>> +failed:
>> +       if (pcpu_info)
>> +               kfree(pcpu_info);
>> +       leave_pcpu_list();
>> +       return ret;
>> +}
>> +
>> +int xen_mce_virq_update_cpu(uint32_t xen_id)
>> +{
>> +       return sync_pcpu_info();
>> +}
>> +
>> +EXPORT_SYMBOL(xen_mce_virq_update_cpu);
>> +
>> +static int bind_virq_for_mce(void)
>> +{
>> +       int ret;
>> +       xen_mc_t mc_op;
>> +
>> +       g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); +
>> +       if (!g_mi)
>> +               return -ENOMEM;
>> +
>> +       /* Fetch physical CPU Numbers */
>> +       if ((ret = sync_pcpu_info()))
>> +       {
>> +               kfree(g_mi);
>>                 return ret;
>>         }
>>
>>         ret  = bind_virq_to_irqhandler(VIRQ_MCA, 0,
>> -               mce_dom_interrupt, 0, "mce", NULL);
>> +                       mce_dom_interrupt, 0, "mce", NULL);
>>
>
> Don't make unrelated formatting changes.
>
>>         if (ret < 0) {
>>                 printk(KERN_ERR "MCE_DOM0_LOG: bind_virq for DOM0
>> failed\n"); diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
>> new file mode 100644
>> index 0000000..3872c20
>> --- /dev/null
>> +++ b/drivers/xen/pcpu.c
>> @@ -0,0 +1,406 @@
>> +/*
>> + * pcpu.c - management physical cpu in dom0 environment + */
>> +#include <linux/interrupt.h>
>> +#include <linux/spinlock.h>
>> +#include <asm/xen/hypervisor.h>
>> +#include <asm/xen/hypercall.h>
>> +#include <asm/cpu.h>
>> +#include <xen/xenbus.h>
>> +#include <xen/pcpu.h>
>> +#include <xen/events.h>
>> +#include <xen/acpi.h>
>> +
>> +void xen_pcpu_dpc(struct work_struct *work);
>> +static struct sysdev_class xen_pcpu_sysdev_class = { +       .name
>> = "xen_pcpu", +};
>> +
>> +DECLARE_WORK(work, xen_pcpu_dpc);
>>
>
> static?  Also "work" is a very generic name; how about something a bit
> more descriptive, or at least identifying.

Sure.

>
>> +static DEFINE_SPINLOCK(pcpu_spinlock);
>> +
>> +/* No need for irq disable since hotplug notify is in workqueue
>> context */ +#define enter_pcpu_list()  spin_lock(&pcpu_spinlock);
>> +#define leave_pcpu_list()  spin_unlock(&pcpu_spinlock);
>>
>
> Don't wrap spinlocks up like this.  Always leave them open-coded, or
> at the very least put "lock"/"unlock" in the wrapper names.  I didn't
> realize enter/leave meant lock/unlock on my first read over the code.
>
>> +
>> +struct pcpu
>> +{
>> +       struct list_head pcpu_list;
>> +       struct sys_device sysdev;
>> +       uint32_t xen_id;
>> +       uint32_t apic_id;
>> +#define PCPU_LOOPED 0x10000000
>> +       uint32_t flags;
>> +};
>> +
>> +struct xen_pcpus
>> +{
>> +       struct list_head list;
>> +       int possible;
>> +       int present;
>> +};
>> +struct xen_pcpus xen_pcpus;
>> +
>> +#define XEN_PCPU_ONLINE     0x01
>> +#define XEN_PCPU_OFFLINE    0x02
>> +#define XEN_PCPU_ADD        0x04
>> +#define XEN_PCPU_REMOVE     0x08
>> +
>> +static ssize_t show_online(struct sys_device *dev, struct
>> sysdev_attribute *attr, +               char *buf) +{
>> +       struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); +
>> +       return sprintf(buf, "%u\n", cpu->flags &
>> XEN_PCPU_FLAGS_ONLINE);
>>
>
> This relies on XEN_PCPU_FLAGS_ONLINE equalling 1 to guarantee that the
> file contains 0/1.  You should be more explicit about it by
> adding "!= 0".

Sure.

>
>> +}
>> +
>> +static int xen_pcpu_down(uint32_t xen_id)
>> +{
>> +       int ret;
>> +       xen_platform_op_t op = {
>> +               .cmd                    = XENPF_resource_hotplug,
>> +               .interface_version      = XENPF_INTERFACE_VERSION,
>> +               .u.resource.u.cpu_ol.cpuid = xen_id, +       };
>> +
>> +       op.u.resource.sub_cmd   = XEN_CPU_offline;
>> +       ret = HYPERVISOR_dom0_op(&op);
>> +       return ret;
>> +}
>> +
>> +static int xen_pcpu_up(uint32_t xen_id)
>> +{
>> +       int ret;
>> +       xen_platform_op_t op = {
>> +               .cmd                    = XENPF_resource_hotplug,
>> +               .interface_version      = XENPF_INTERFACE_VERSION,
>> +               .u.resource.u.cpu_ol.cpuid = xen_id, +       };
>> +
>> +       op.u.resource.sub_cmd   = XEN_CPU_online;
>> +       ret = HYPERVISOR_dom0_op(&op);
>> +       return ret;
>> +}
>> +
>> +static ssize_t __ref store_online(struct sys_device *dev,
>> +               struct sysdev_attribute *attr,
>> +               const char *buf, size_t count)
>> +{
>> +       struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); +
>> ssize_t ret; +
>> +       switch (buf[0]) {
>> +               case '0':
>> +                       ret = xen_pcpu_down(cpu->xen_id);
>> +                       break;
>> +               case '1':
>> +                       ret = xen_pcpu_up(cpu->xen_id);
>> +                       break;
>> +               default:
>> +                       ret = -EINVAL;
>> +       }
>> +
>> +       if (ret >= 0)
>> +               ret = count;
>> +       return ret;
>> +}
>> +
>> +static SYSDEV_ATTR(online, 0644, show_online, store_online); +
>> +static int xen_free_physical_cpu(struct pcpu * pcpu) +{
>> +       if (!pcpu)
>> +               return 0;
>> +
>> +       sysdev_remove_file(&pcpu->sysdev, &attr_online);
>> +       sysdev_unregister(&pcpu->sysdev);
>> +       list_del(&pcpu->pcpu_list);
>> +       kfree(pcpu);
>> +
>> +       return 0;
>> +}
>> +
>> +static struct pcpu * xen_add_physical_cpu(struct
>> xen_physical_cpuinfo *info) +{ +       struct pcpu *cpu;
>> +       int error;
>> +
>> +       cpu = kzalloc(sizeof(struct pcpu), GFP_KERNEL); +       if
>> (!cpu) +               return NULL;
>> +
>> +       INIT_LIST_HEAD(&cpu->pcpu_list);
>> +       cpu->xen_id = info->xen_cpuid;
>> +       cpu->apic_id = info->apic_id;
>> +       cpu->flags = info->flags;
>> +
>> +       cpu->sysdev.cls = &xen_pcpu_sysdev_class;
>> +       cpu->sysdev.id = info->xen_cpuid;
>> +       printk("get cpuid %x\n", cpu->sysdev.id);
>> +
>> +       error = sysdev_register(&cpu->sysdev);
>> +       if (error)
>> +       {
>> +               printk("failed to register pcpu\n"); +
>> kfree(cpu); +               return NULL;
>> +       }
>> +       sysdev_create_file(&cpu->sysdev, &attr_online); +       if
>> (error) +       {
>> +               printk("failed to create attr\n");
>> +               sysdev_unregister(&cpu->sysdev);
>> +               kfree(cpu);
>> +               return NULL;
>> +       }
>> +
>> +       return cpu;
>> +}
>> +
>> +static struct xen_physical_cpuinfo * xen_fetch_cpu_info(int *num,
>> +               int *possible)
>> +{
>> +       int cpu_num, ret = 0;
>> +       struct xen_physical_cpuinfo *info;
>> +       xen_platform_op_t op = {
>> +               .cmd            = XENPF_get_cpuinfo,
>> +               .interface_version  = XENPF_INTERFACE_VERSION,
>> +               .u.pcpu_info.ncpus    = 0,
>> +       };
>> +
>> +       set_xen_guest_handle(op.u.pcpu_info.info, NULL); +
>> +       ret = HYPERVISOR_dom0_op(&op);
>> +       if (ret)
>> +               return NULL;
>> +       cpu_num = op.u.pcpu_info.max_cpus;
>> +
>> +       info = kzalloc(cpu_num * sizeof(struct
> xen_physical_cpuinfo), GFP_KERNEL);
>> +       if (!info)
>> +               return NULL;
>> +       op.u.pcpu_info.ncpus = cpu_num;
>> +       set_xen_guest_handle(op.u.pcpu_info.info, info); +
>> +       ret = HYPERVISOR_dom0_op(&op);
>> +       if (ret)
>> +       {
>> +               kfree(info);
>> +               printk(KERN_WARNING "Error for pcpu info fetch\n");
>> +               return NULL; +       }
>> +       if (possible)
>> +               *possible = op.u.pcpu_info.max_cpus; +       if (num)
>> +               *num = op.u.pcpu_info.ncpus;
>> +
>> +       return info;
>> +}
>> +
>> +static int xen_pcpu_callback(uint32_t xen_id, int action) +{
>> +       switch (action)
>> +       {
>> +               case XEN_PCPU_ONLINE:
>> +                       {
>> +#ifdef CONFIG_MICROCODE_XEN
>> +                               xen_microcode_update(xen_id); +#endif
>> +#ifdef CONFIG_MCE_XEN
>> +                               xen_mce_virq_update_cpu(xen_id);
>> +#endif +                       }
>> +                       break;
>> +               case XEN_PCPU_OFFLINE:
>> +                       {
>> +#ifdef CONFIG_MCE_XEN
>> +                               xen_mce_virq_update_cpu(xen_id);
>> +#endif
>>
>
> Yeah, these are definitely notifier material.

Ok, I will do it that way.

>
>> +                       }
>> +                       break;
>> +               default:
>> +                       break;
>> +       }
>> +       return 0;
>> +}
>> +
>> +/*
>> + * Sync dom0's pcpu information with xen hypervisor's + */
>> +static int xen_pcpu_sync(void)
>> +{
>> +       struct xen_physical_cpuinfo *info;
>> +       int cpu_num, i;
>> +       struct list_head *elem, *tmp;
>> +       struct pcpu *pcpu;
>> +
>> +       info = xen_fetch_cpu_info(&cpu_num, NULL);
>> +
>> +       if (!info)
>> +               return IRQ_HANDLED;
>> +       enter_pcpu_list();
>> +
>> +       /* Check for current cpu list */
>> +       for (i = 0; i < cpu_num; i++)
>> +       {
>> +               int found = 0;
>> +               list_for_each_entry(pcpu, &xen_pcpus.list,
>> pcpu_list) +               { +                       if (
>> (pcpu->apic_id == info[i].apic_id) && +
>> (pcpu->xen_id == info[i].xen_cpuid) ) +                       {
>>
>
> Could you hoist some of this inner code into helper functions to
> reduce the nesting here?

Yes.

>
>> +                               found = 1;
>> +                               if( (info[i].flags &
>> XEN_PCPU_FLAGS_ONLINE) && +
> !(pcpu->flags &XEN_PCPU_FLAGS_ONLINE))
>> +                               {
>> +                                       /* the pcpu is onlined now */
>> +                                       pcpu->flags |=
>> XEN_PCPU_FLAGS_ONLINE; +
> kobject_uevent(&pcpu->sysdev.kobj, KOBJ_ONLINE);
>> +
> xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_ONLINE);
>> +                               }
>> +                               else if ( !(info[i].flags &
>> XEN_PCPU_FLAGS_ONLINE) && +
>> (pcpu->flags & XEN_PCPU_FLAGS_ONLINE)) +
>> { +                                       /* The pcpu is offlined
>> now */ +                                       pcpu->flags &=
>> ~XEN_PCPU_FLAGS_ONLINE; +
> kobject_uevent(&pcpu->sysdev.kobj, KOBJ_OFFLINE);
>> +
> xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_OFFLINE);
>> +                               }
>> +                               pcpu->flags |= PCPU_LOOPED; +
>> } +               }
>> +               if (!found)
>> +               {
>> +                       struct pcpu *cpu;
>> +
>> +                       if (info[i].flags & XEN_PCPU_FLAGS_ONLINE)
>> +                               printk(KERN_WARNING "A
> hotadd cpu is onlined also \n");
>>
>
> This could do with some clarification.  Is this case that a newly
> added pcpu already appears to be online?

Because the vIRQ notification for pcpu hotplug from xen hypervisor is async, so 
maybe it happens after the pcpu is onlined by user. (i.e., the online 
notification and hotplug notification is merged)

>Also it should be prefixed
> with some
> clue about what code/operation is printing the message.
>
>> +                       cpu = xen_add_physical_cpu(&info[i]);
>> +                       if (cpu == NULL)
>> +                               goto failed;
>> +                       list_add_tail(&cpu->pcpu_list,
>> &xen_pcpus.list); +
>> xen_pcpu_callback(cpu->xen_id, XEN_PCPU_ADD); +
>> cpu->flags |= PCPU_LOOPED; +               }
>> +       }
>> +
>> +       list_for_each_safe(elem, tmp, &xen_pcpus.list) +       {
>> +               pcpu = list_entry(elem, struct pcpu, pcpu_list);
>> +               if (pcpu->flags & PCPU_LOOPED)
>> +                       pcpu->flags &= ~PCPU_LOOPED; +
>> else +               {
>> +                       /* The pcpu does not exist any more, remove
>> it */ +                       xen_free_physical_cpu(pcpu);
>> +                       xen_pcpu_callback(pcpu->xen_id,
>> XEN_PCPU_REMOVE); +               } +       }
>> +
>> +failed:
>> +       leave_pcpu_list();
>> +       return 0;
>> +}
>> +
>> +static int init_xen_pcpu_info(void)
>>
> __init?
>
> Also I prefer names of the form subsystem_action, so
> xen_pcpu_info_init().

Sorry, what do you mean of subsystem_action? I didn't find definition for it.

>
>> +{
>> +       int possible, cpu_num, i;
>> +       struct xen_physical_cpuinfo *info = NULL;
>> +       struct list_head *elem, *tmp;
>> +       struct pcpu *pcpu;
>> +
>> +       xen_pcpus.possible = xen_pcpus.present = 0;
>> +       INIT_LIST_HEAD(&xen_pcpus.list);
>> +       info = xen_fetch_cpu_info(&cpu_num, &possible); +       if
>> (!info) +               return -1;
>> +       xen_pcpus.possible = possible;
>> +       xen_pcpus.present = cpu_num;
>> +       enter_pcpu_list();
>> +       for (i = 0; i < cpu_num; i++)
>> +       {
>> +               pcpu = xen_add_physical_cpu(&info[i]); +
>> if (!pcpu) +               {
>> +                       leave_pcpu_list();
>> +                       goto failed;
>> +               }
>> +               list_add_tail(&pcpu->pcpu_list, &xen_pcpus.list); +
>> } +       leave_pcpu_list();
>> +       kfree(info);
>> +
>> +       return 0;
>> +failed:
>> +       list_for_each_safe(elem, tmp, &xen_pcpus.list) +       {
>> +               pcpu = list_entry(elem, struct pcpu, pcpu_list);
>> +               xen_free_physical_cpu(pcpu);
>> +       }
>> +       if (info)
>> +               kfree(info);
>> +       xen_pcpus.possible = xen_pcpus.present = 0;
>> +       INIT_LIST_HEAD(&xen_pcpus.list);
>> +       return -1;
>> +}
>> +
>> +static irqreturn_t xen_pcpu_interrupt(int irq, void *dev_id) +{
>> +       schedule_work(&work);
>> +       return IRQ_HANDLED;
>> +}
>> +
>> +/*
>> + * type: 0 add, 1 remove
>>
>
> Looks like "type" is really an enum.
>
>> + */
>> +int xen_pcpu_hotplug(int type, uint32_t apic_id)
>> +{
>> +       struct pcpu *pcpu;
>> +       int found = 0;
>> +
>> +       xen_pcpu_sync();
>> +       enter_pcpu_list();
>> +       list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list) +
>> { +               if (pcpu->apic_id == apic_id)
>> +               {
>> +                       found = 1;
>> +                       break;
>> +               }
>> +       }
>> +       leave_pcpu_list();
>> +
>> +       if (!found && (type == HOTPLUG_TYPE_ADD))
>> +               printk(KERN_WARNING "The cpu is not added into Xen
>> HV?\n"); + +       if (found && (type == HOTPLUG_TYPE_REMOVE) )
>> +               printk(KERN_WARNING "The cpu still exits in Xen
>> HV?\n"); +       return 0; +}
>> +EXPORT_SYMBOL(xen_pcpu_hotplug);
>>
> _GPL?

Yes.

>
>> +
>> +void xen_pcpu_dpc(struct work_struct *work)
>> +{
>> +       xen_pcpu_sync();
>> +}
>> +
>> +static int __init xen_physical_cpu_init(void)
>> +{
>> +       int err;
>> +
>> +       if (!xen_initial_domain())
>> +               return 0;
>> +
>> +       err = sysdev_class_register(&xen_pcpu_sysdev_class); +
>> if (err) +       {
>> +               printk(KERN_WARNING "error to register sysdev for
>> xen_pcpu\n"); +               return err; +       }
>> +
>> +       err = init_xen_pcpu_info();
>> +       if (!err)
>> +               bind_virq_to_irqhandler(VIRQ_PCPU_STATE, 0,
> xen_pcpu_interrupt, 0, "pcpu", NULL);
>>
>
> So when does this interrupt get raised?  Is the full flow:
>
>   1. ACPI raises SCI
>   2. dom0 catches that and gets the new pcpu event
>   3. dom0 notifies Xen that the pcpu exists
>   4. xen interrupts dom0 saying that a new pcpu exists
>   5. dom0 processes the list and adds it to sysfs
>   6. usermode onlines the cpu for Xen's use

And there is also step 7, Xen interrupt dom0 saying a new pCPU is onlined. 
Through step 7, dom0 will have the life cycle of the pCPU.

Thanks
Yunhong Jiang

>
> ?  Or does Xen interrupt dom0 once dom0 does the online?
>
>> +       return err;
>> +}
>> +
>> +subsys_initcall(xen_physical_cpu_init);
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index 34321cf..e414fcc 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -85,6 +85,7 @@ int acpi_boot_init (void);
>>  int acpi_boot_table_init (void);
>>  int acpi_mps_check (void);
>>  int acpi_numa_init (void);
>> +int get_apic_id(acpi_handle handle, int type, u32 acpi_id);
>>
>>  int acpi_table_init (void);
>>  int acpi_table_parse (char *id, acpi_table_handler handler);
>> diff --git a/include/xen/interface/platform.h
> b/include/xen/interface/platform.h
>> index 6783fce..851a7c1 100644
>> --- a/include/xen/interface/platform.h
>> +++ b/include/xen/interface/platform.h
>> @@ -312,6 +312,60 @@ struct xenpf_set_processor_pminfo {
>>  typedef struct xenpf_set_processor_pminfo
>>  xenpf_set_processor_pminfo_t;
>> DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo);
>>
>> +struct xenpf_hotadd_cpu
>> +{
>> +       uint32_t apic_id;
>> +       uint32_t acpi_id;
>> +       uint32_t pxm;
>> +};
>> +
>> +struct xenpf_hotremove_cpu
>> +{
>> +       uint32_t apic_id;
>> +};
>> +
>> +struct xenpf_cpu_ol
>> +{
>> +       uint32_t cpuid;
>> +};
>> +
>> +#define XENPF_resource_hotplug 55
>> +struct xenpf_resource_hotplug {
>> +       uint32_t sub_cmd;
>> +#define XEN_CPU_add      1
>> +#define XEN_CPU_remove   2
>> +#define XEN_CPU_online      3
>> +#define XEN_CPU_offline     4
>> +       union {
>> +               struct xenpf_hotadd_cpu sadd;
>> +               struct xenpf_hotremove_cpu sremove;
>> +               struct xenpf_cpu_ol   cpu_ol;
>> +       }u;
>> +};
>> +
>> +#define XENPF_get_cpuinfo 56
>> +struct xen_physical_cpuinfo {
>> +       uint32_t xen_cpuid;
>> +       uint32_t apic_id;
>> +#define XEN_PCPU_FLAGS_ONLINE 1
>> +       uint32_t flags;
>> +       uint32_t pad;
>> +};
>> +typedef struct xen_physical_cpuinfo xen_physical_cpuinfo_t;
>> +DEFINE_GUEST_HANDLE_STRUCT(xen_physical_cpuinfo);
>> +
>> +struct xenpf_pcpu_info
>> +{
>> +       /* IN/OUT */
>> +       uint32_t ncpus;
>> +       /* OUT */
>> +       /* The possible CPU */
>> +       uint32_t max_cpus;
>> +       GUEST_HANDLE(xen_physical_cpuinfo) info;
>> +};
>> +typedef struct xenpf_pcpu_info xenpf_pcpu_info_t;
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_pcpu_info);
>> +
>>  struct xen_platform_op {
>>         uint32_t cmd;
>>         uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
>> @@ -327,6 +381,8 @@ struct xen_platform_op {
>>                 struct xenpf_change_freq       change_freq;
>>                 struct xenpf_getidletime       getidletime;
>>                 struct xenpf_set_processor_pminfo set_pminfo;
>> +               struct xenpf_resource_hotplug   resource;
>> +               struct xenpf_pcpu_info          pcpu_info;
>>                 uint8_t                        pad[128];         } u;
>>  };
>> diff --git a/include/xen/interface/xen.h
>> b/include/xen/interface/xen.h index 812ffd5..c11920b 100644 ---
>> a/include/xen/interface/xen.h +++ b/include/xen/interface/xen.h
>> @@ -79,6 +79,7 @@
>>  #define VIRQ_CONSOLE    2  /* (DOM0) Bytes received on emergency
>>  console. */ #define VIRQ_DOM_EXC    3  /* (DOM0) Exceptional event
>>  for some domain.   */ #define VIRQ_DEBUGGER   6  /* (DOM0) A domain
>> has paused for debugging.   */ +#define VIRQ_PCPU_STATE 9  /* G.
>> (DOM0) PCPU state changed
>                 */
>>
>>  /* Architecture-specific VIRQ definitions. */
>>  #define VIRQ_ARCH_0    16
>> diff --git a/include/xen/pcpu.h b/include/xen/pcpu.h
>> new file mode 100644
>> index 0000000..0d71149
>> --- /dev/null
>> +++ b/include/xen/pcpu.h
>> @@ -0,0 +1,8 @@
>> +#ifndef _XEN_PCPU_H
>> +#define _XEN_PCPU_H
>> +
>> +extern int xen_pcpu_hotplug(int type, uint32_t apic_id);
>> +extern int xen_mce_virq_update_cpu(uint32_t xen_id);
>> +extern int xen_microcode_update(int cpu);
>> +
>> +#endif

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel