>-----Original Message-----
>From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
>Sent: Wednesday, July 29, 2009 1:54 AM
>To: Yu, Ke
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Tian, Kevin
>Subject: Re: [PATCH][pvops_dom0][4/4] use physical acpi_id in acpi processor
>parsing logic
>
>On 07/21/09 01:07, Yu, Ke wrote:
>> To use acpi id in native, I can see there are at least two kind of conflicts
>> need to be resolved:
>> 1. kernel assume it only cares about the present CPU. For non present CPU, it
>> will simply stop going further and return, or trigger BUG(). When switch to
>> acpi id, the acpi processor object may refer to a non present cpu, so the
>> code need to be able to handle the non-present CPU situation.
>>
>
>The percpu subsystem should be able to deal with accesses to percpu data
>of non-present cpus (though it might need some advance preparation to
>make sure the memory is allocated). In general the percpu subsystem is
>concerned with making sure that the amount of memory allocated is
>"reasonable" - ie, for cpus which are actually present or could be
>present, rather than cpus which can never exist on this system (like
>running a kernel compiled for 1024 processors on a dual-core laptop).
>
>I assume that ACPI processor IDs are always going to be in the realm of
>sensible for the hardware: ie, either CPUs which actually exist, or
>which have sockets which could potentially be hotplugged. In that case
>I don't see a problem with making sure they have percpu data allocated.
Unfortunately, this does not appear to be true. When I set the dom0 VCPU count
to 1, I observe that per-CPU data is allocated for only one CPU. We probably
need to manipulate the possible CPU set, as you suggested.
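
To illustrate the observation, here is a minimal user-space model, not kernel
code. It assumes per-CPU storage is allocated only for IDs in a "possible"
mask; the names percpu_area, percpu_area_init, percpu_ptr and MAX_IDS are
hypothetical and do not correspond to the kernel's percpu API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_IDS 64 /* hypothetical upper bound on CPU or ACPI IDs */

/* Model of per-CPU storage: one slot per ID in the possible set. */
struct percpu_area {
    void *slot[MAX_IDS]; /* NULL for IDs outside the possible set */
};

/* Release every allocated slot and reset it to NULL. */
static void percpu_area_free(struct percpu_area *area)
{
    assert(area != NULL);
    for (unsigned int i = 0; i < MAX_IDS; i++) {
        free(area->slot[i]);
        area->slot[i] = NULL;
    }
}

/* Allocate a zeroed slot for each ID whose bit is set in possible_mask. */
static int percpu_area_init(struct percpu_area *area, uint64_t possible_mask,
                            size_t size)
{
    assert(area != NULL && size > 0);
    for (unsigned int i = 0; i < MAX_IDS; i++)
        area->slot[i] = NULL;
    for (unsigned int i = 0; i < MAX_IDS; i++) {
        if (!(possible_mask & (UINT64_C(1) << i)))
            continue;
        area->slot[i] = calloc(1, size);
        if (area->slot[i] == NULL) {
            percpu_area_free(area);
            return -1;
        }
    }
    return 0;
}

/* Return the slot for id, or NULL if id is out of range or not possible. */
static void *percpu_ptr(const struct percpu_area *area, unsigned int id)
{
    assert(area != NULL);
    return id < MAX_IDS ? area->slot[id] : NULL;
}
```

With a mask of 0x1 (one VCPU), a lookup for ID 2 returns NULL; widening the
mask to 0xF, which stands in for enlarging the possible CPU set, gives that ID
a slot.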
>
>(Of course in the Xen case this needs a bit more care, since the domain
>VCPU count has nothing to do with the host PCPUs, but we can do things
>like manipulate the possible CPU set if that helps.)
>
>> 2. native kernel use per_cpu data extensively, which is indexed by general
>> cpu id. when switch to acpi id, these per_cpu data should be changed to the
>> array indexed by acpi id.
>>
>
>How is the acpi id derived? Is it the same as the local apic id? Is
>it typically the same as the kernel's smp_processor_id, or does it tend
>to be different? If they're different, is the mapping fixed or can it vary?
The ACPI ID is derived from the ACPI processor objects in the DSDT. For example,
the following snippet declares four processors in socket 0; the second field in
each Processor declaration is the ACPI ID.
    Scope (\_SB)
    {
        Device (\_SB.SCK0)
        {
            Processor (CPU0, 0x00, 0x00000410, 0x06) { ... }
            Processor (CPU1, 0x01, 0x00000410, 0x06) { ... }
            Processor (CPU2, 0x02, 0x00000410, 0x06) { ... }
            Processor (CPU3, 0x03, 0x00000410, 0x06) { ... }
        }
    }
The ACPI ID is different from the local APIC ID, which is derived from the ACPI
MADT table, and is also different from the kernel's smp_processor_id.
The mapping between them is fixed. Note that in the dom0 case the mapping may be
incomplete, since some processors may not be presented to dom0.
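
The fixed but possibly incomplete mapping described above could be modelled
roughly as follows. This is a user-space sketch under my own assumptions:
acpi_cpu_map, MAX_ACPI_ID and CPU_NOT_PRESENT are hypothetical names, not
existing kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACPI_ID 256      /* hypothetical bound on ACPI processor IDs */
#define CPU_NOT_PRESENT (-1) /* marks an ACPI ID with no CPU visible to dom0 */

/* Table from ACPI ID to logical CPU number. */
struct acpi_cpu_map {
    int cpu_of_acpi_id[MAX_ACPI_ID];
};

/* Start with every ACPI ID unmapped. */
static void acpi_cpu_map_init(struct acpi_cpu_map *map)
{
    assert(map != NULL);
    for (size_t i = 0; i < MAX_ACPI_ID; i++)
        map->cpu_of_acpi_id[i] = CPU_NOT_PRESENT;
}

/* Record that acpi_id corresponds to logical CPU cpu; -1 on bad input. */
static int acpi_cpu_map_set(struct acpi_cpu_map *map, unsigned int acpi_id,
                            int cpu)
{
    assert(map != NULL);
    if (acpi_id >= MAX_ACPI_ID || cpu < 0)
        return -1;
    map->cpu_of_acpi_id[acpi_id] = cpu;
    return 0;
}

/* Return the logical CPU for acpi_id, or CPU_NOT_PRESENT if unmapped. */
static int acpi_cpu_map_lookup(const struct acpi_cpu_map *map,
                               unsigned int acpi_id)
{
    assert(map != NULL);
    if (acpi_id >= MAX_ACPI_ID)
        return CPU_NOT_PRESENT;
    return map->cpu_of_acpi_id[acpi_id];
}
```

The point of the sketch is that a lookup for an unmapped ACPI ID returns a
sentinel the caller has to handle, instead of hitting a BUG_ON-style check.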
>
>> Take the acpi processor core code (drivers/acpi/processor_core.c) as example,
>> the condition check " BUG_ON((pr->id >= nr_cpu_ids) || (pr->id < 0)); " need
>> change. the per-cpu data processor_device_array, processors need change. And
>> the cpu_sys_devices in get_cpu_sysdev need more thoughts before changing,
>> since it is globally used by other component.
>>
>> Another example is the cpufreq case. if we want to use acpi id in cpufreq
>> case, we also need to resolve the above two conflicts. For example, in
>> drivers/cpufreq/cpufreq.c, its core data struct " cpufreq_policy " is per-CPU,
>> thus need many changes in every place it is used. and the condition checking,
>> like " if (cpu >= nr_cpu_ids) goto err_out;" also need change. Compared with
>> the change in the drivers/acpi/processor_core.c, the change in cpufreq is more
>> intrusive. Since the acpi processor core code already has the Px info parsing
>> functionality, it may be better not changing cpufreq.
>>
>
>OK, to summarize:
>
>The cpufreq subsystem provides two services to the rest of the kernel:
>
> * the ability to set the overall power management policy
> (performance, powersave, etc)
> * the mechanism and drivers to implement that policy
>
>In this case we still want a way to set the policy, but Xen itself will
>implement the mechanism internally without dom0's further involvement
>(aside from some info culled from the ACPI tables), right?
Right. Both policy and mechanism are implemented in the hypervisor, and Xen
already provides the platform_hypercall for management tools to set the overall
power management policy. Currently we have one user-space tool, xenpm
(xen-unstable/tools/misc/xenpm.c), to set the PM policy.
>
>But even then, cpufreq is oriented towards controlling the
>kernel-visible CPUs, and is ill-suited to controlling the policy of the
>host CPUs from the context of one particular domain.
>
>Therefore we need to have new interfaces which:
>
> 1. insert ACPI info that dom0 extracts from various tables into Xen
> (assuming its impractical for Xen to do this itself)
> 2. set the overall power-management policy
> 3. Xen implements that policy without further interaction with dom0
>
>(And what's missing from this is some way for each individual domain to
>set the "importance" of the work being done on each VCPU to allow Xen to
>determine what's the appropriate operating point for each PCPU from
>timeslice to timeslice.)
>
>Is that accurate?
In the current design, cpufreq will not start in dom0 or domU, and there is no
way for a VM to set the "importance" of its work. However, the administrator can
set the policy manually with the management tool.
In summary, Xen already implements the cpufreq policy and mechanism in the
hypervisor, and also provides a hypercall for the management tool to set the
overall policy, so the only thing that needs dom0 involvement is ACPI info
parsing. Changes to the dom0 cpufreq code can therefore be avoided.
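
As a rough sketch of that division of labour: dom0 only packages the parsed Px
data, keyed by ACPI ID, and hands it over. The structures and the px_upload_fn
callback below are hypothetical stand-ins and are not the real Xen hypercall
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PX_STATES 16 /* hypothetical limit on P-states per processor */

/* One performance state as parsed from the ACPI tables (illustrative). */
struct px_state {
    uint32_t core_frequency_mhz;
    uint32_t power_mw;
};

/* Record handed to the hypervisor, keyed by ACPI ID rather than CPU number. */
struct px_info {
    uint32_t acpi_id;
    uint32_t count;
    struct px_state states[MAX_PX_STATES];
};

/* Stand-in for the hypercall that would pass the record to Xen. */
typedef int (*px_upload_fn)(const struct px_info *info, void *ctx);

/* Package the parsed states for one processor and hand them to upload. */
static int px_report(uint32_t acpi_id, const struct px_state *states,
                     size_t count, px_upload_fn upload, void *ctx)
{
    struct px_info info = {0};

    assert(upload != NULL);
    if (states == NULL || count == 0 || count > MAX_PX_STATES)
        return -1;
    info.acpi_id = acpi_id;
    info.count = (uint32_t)count;
    for (size_t i = 0; i < count; i++)
        info.states[i] = states[i];
    return upload(&info, ctx);
}

/* Example callback for experiments: stores a copy of the record in *ctx. */
static int px_store_copy(const struct px_info *info, void *ctx)
{
    *(struct px_info *)ctx = *info;
    return 0;
}
```

Nothing in this path touches cpufreq_policy or any other structure indexed by
the kernel's CPU number, which is why the dom0 cpufreq code can stay unchanged.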
Best Regards
Ke
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel