Jeremy, thanks for the comments. It is helpful. Please see below.
Jeremy Fitzhardinge wrote:
>Yu, Ke wrote:
>Hi Jeremy,
>
>
>Thanks for doing this. I looked at the changes in 2.6.18-xen and put them
>into the
>too-hard basket.
>
>Some general comments about patch submission:
>* please send patches one-per-mail, with [PATCH N/M] in the subject line
>to
>indicate ordering
>* in each mail, include the full changelog description like you have below
>* each person signing off the mail should have a Signed-off-by: line
>This makes it easier to do line-by-line commentary on the patches and also
>process
>it with the tools I have on hand.
Yes, I will follow this way next time.
>
>I just committed these patches to my tree, rebased onto xen-tip/dom0/core as
>xen-tip/dom0/acpi-parser, but I haven't even compile-tested to see if the
>rebase
>worked. But I did cut-and-paste the overview into the first patch comment so
>its
>recorded properly.
>
>
>=== Overview ===
>
>Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states power
>management. This info is provided by BIOS ACPI table. Since hypervisor has no
>ACPI parser, this info has to be parsed by domain0 kernel ACPI sub-system, and
>then passed to hypervisor by hypercall.
>
>
>Couldn't this be done with a variant of the cpufreq_acpi driver? It needs to
>parse the
>tables, but rather than acting on them directly it passes the information to
>Xen?
Oh, you raise a good point. This is a more clean design. I go through the
related code again. And find this method seems feasible. Basically, the xen
cpufreq driver need to do two things:
1. pass the initial cpufreq frequency info to xen. This can be done in the xen
cpufreq driver init phase.
2. if hardware cpufreq frequency info changed (see
acpi_processor_ppc_has_changed()), notify xen the changed info. This can be
done by registering one notify callback in cpufreq (see
cpufreq_register_notifier) .
This is just initial thought, I will follow up this with more detail.
And another thing is that cpuidle (Cx info) has no such arch specific place for
xen, we probably still need to use current logic for cpuidle.
>
>
>To make this happen, the key point is to add hook in the kernel ACPI
>sub-system.
>Fortunately, kernel already has good abstraction, and only several places need
>to
>add hook. To be more detail, there is an acpi_processor_driver (in
>drivers/acpi/processor_core.c) , which all the Cx/Px parsing event will go to.
>This
>driver will call its acpi processor event handler, e.g. add/remove,
>start/stop, notify to
>handle these events. These event handlers in turn will call some library
>functions (in
>drivers/acpi/processor_perflib.c), e.g. acpi_processor_ppc_has_changed,
>acpi_processor_ppc_has_changed, acpi_processor_cst_has_changed, to finish the
>acpi info parsing.
>
>So the conclusion is: adding hooks in acpi_processor_driver and those related
>library functions will satisfy our requirement.
>
>To make the added hook cleaner, we introduce an interface called external
>control
>operation (struct processor_extcntl_ops). All the hooks are encapsulated in
>this
>interface processor_extcntl_ops . Here the "external" means the acpi processor
>is
>controlled by external entity, e.g. VMM. Every kind of external entity can has
>its
>implementation of this interface. In this patch, the interface for Xen is
>implemented.
>
>
>Is the issue that the hypervisor will change the CPUs states asynchronously
>under
>the kernel, and the kernel wants to be informed about those state changes,
>along
>with some kind of mapping from pcpu to vcpu?
In pv_ops dom0, kernel do not need to be informed on the cpu state changes.
Kernel only need to pass the acpi processor info to hypervisor, and let
hypervisor control the CPU states.
>
>What use does the kernel make of that information?
>
>I guess I don't really understand what the larger problem domain is.
Let me explain more detail. In native linux, the Cx/Px processing flow is like
this: firstly, acpi subsystem will parse the ACPI table, and get all Cx/Px
related acpi info (stored as struct acpi_processor, struct
acpi_processor_power, and struct acpi_processor_performance). and then acpi
subsystem will notify cpuidle driver and cpufreq driver to use the parsed acpi
info to control physical processor Cx/Px transition.
In pv_ops dom0, kernel can see physical ACPI table, but it only see vCPUs, so
we want kernel to parse ACPI table, but don't want kernel to control physical
processor Cx/Px transition. So the ideal flow control is: firstly, kernel acpi
subsystem will parse the ACPI table, and get all Cx/Px related acpi info. Then,
kernel pass this info to hypervisor. Kernel cpuidle and cpufreq driver is
turned off.
You may ask why not moving all stuff to hypervisor, i.e. parsing acpi table in
hypervisor too. the major reason is that acpi parser is unfortunately very
complex, and at meantime kernel already have pretty good acpi parser, so it is
a better choice to leverage kernel's acpi parser.
So the larger problem domain is: how to cleanly pass the acpi info to
hypervisor, and meanwhile, keep kernel cpuidle and cpufreq turned off.
>
>Do you have any plans for other hypervisor backends? Does a kvm one make
>sense?
In KVM case, since kernel see the physical processor, the Cx/Px transition is
controlled by kernel. So KVM don't need this logic. This logic only applies to
hypervisor who want to control Cx/Px itself. By far, I did not find other
hypervisor need this logic.
>
>
>== Patch Set Description ==
>
>This patch set is based the xen/dom0/hackery branch. It has three patches:
>
>1. acpi_parser_framework.patch
>This patch introduces the interface of external control operation, and adds
>hooks to
>the acpi sub-system, including the acpi_processor_driver, and the related
>library
>functions.
>
>2. acpi_parser_xen_driver.patch
>This patch implements the Xen version of the external control operation
>interface.
>
>3. acpi_parser_platform_hypercall.patch
>This patches implements the xen_platform_op hypercall, to pass the parsed ACPI
>info to hypervisor.
>
>== Misc ==
>
>To make this patch works correctly, there is a corner case need to consider:
>processor in domain0 is virtual CPU, while BIOS ACPI table provides the
>physical
>CPU info, since the vCPU and pCPU is not 1:1 mapped, this may caused unexpected
>result. E.g. in UP domain0, Xen hypervisor may only get one physical processor
>info. Current linux-2.6.18-xen.hg tree solve this issue by using acpi_id
>instead of
>processor_id. However, this code is a bit tricky and hacky. We are still
>trying to find
>a cleaner way to solve this issue.
>
>
>Is this a corner case? I would think it's the common case.
This is common case, but currently only affects Cx/Px logic. If any code need
to use the physical acpi processor info, it need to care about this.
>
>
>
>OK, some more specific comments:
>
>Cut down on the number of #ifdefs. If
>CONFIG_PROCESSOR_EXTERNAL_CONTROL is not set, define the functions to
>appropriate no-op definitions. For example, rather than:
>+#ifdef CONFIG_PROCESSOR_EXTERNAL_CONTROL
>+ if (!processor_cntl_external()) {
>+ return -ENODEV;
>+ }
>+#else
> return -ENODEV;
>+#endif /* CONFIG_PROCESSOR_EXTERNAL_CONTROL */
>just define processor_cntl_external() to always return 0. (Oh, it looks like
>you
>already do this. Why all the #ifdefs then?)
Oh, we just want to make sure all changes are embraced by
CONFIG_PROCESSOR_EXTERNAL_CONTROL, for better tracking purpose. Will be remove
this in the final version.
>
>Also, rather than:
>+ if (processor_cntl_external())
>+ processor_notify_external(pr, PROCESSOR_HOTPLUG,
>+ HOTPLUG_TYPE_REMOVE);
>can't processor_notify_external() do the test itself? (In addition to the
>#ifdef
>comments.)
Good point. will revise it.
>
>This seems pointless:
>-static int acpi_processor_get_performance_info(struct acpi_processor *pr)
>+#ifndef CONFIG_PROCESSOR_EXTERNAL_CONTROL
>+static
>+#endif
>+int acpi_processor_get_performance_info(struct acpi_processor *pr)
For tracking purpose only. Probably better to simply remove the "static"
>
>
> config CPU_FREQ
> bool "CPU Frequency scaling"
>+ depends on !PROCESSOR_EXTERNAL_CONTROL
>
>This isn't acceptible. A single kernel image could be running on bare
>hardware, or
>be booted as a Xen dom0 kernel. It needs to be able to handle both CPUFREQ and
>external processor control. Unless external processor control somehow subsumes
>all the cpufreq driver functions? Anyway, all decisions need to be made at
>runtime.
I see. When use the xen cpufreq driver approach as you suggested, this change
is not necessary any more.
>
>Put all the Xen-specific code into a Xen-specific header. I've already added
>include/xen/acpi.h for the S3 changes, so perhaps that would be suitable.
Ok.
>
>+ifneq ($(CONFIG_PROCESSOR_EXTERNAL_CONTROL),)
>+obj-$(CONFIG_XEN) += processor_extcntl_xen.o
>+endif
>No, do this in Kconfig, with something like:
>config XEN_PROCESSOR_EXTERNAL_CONTROL
> def_bool y
> depends on XEN && PROCESSOR_EXTERNAL_CONTROL
>And maybe XEN_ACPI and/or XEN_DOM0? If this is useful for non-x86 (ia64), then
>put it in drivers/xen.
Ok. and yes, it will be extended to ia64 in the future.
>
> J
Thanks for the above comments. I will investigate the xen cpufreq driver
approach as you suggested, and sent the revised patch when it is ready.
Best Regards
Ke
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|