xen-devel
[Xen-devel] Power aware credit scheduler
The existing credit scheduler is not power aware. To achieve better
power savings with negligible performance impact, the following
areas could be tweaked; they are listed here for comments first.
The goal is not to save power blindly at the expense of
performance, e.g. we don't want to prevent migration when there
are free cpus and some runqueues have pending work. But when the
free computing power exceeds the existing requirement, a power
aware policy can choose the less power-intrusive decision. Of
course, even in the latter case this is controllable through a
scheduler parameter like csched_private.power, exposed to the user.
----
a) when there are more idle cpus than required
a.1) csched_cpu_pick
The existing policy is to pick a cpu with more idle neighbours,
to avoid shared resource contention among cores or threads.
However, from a power point of view, a package C-state saves much
more power than a per-core C-state. From this angle, it might be
better to keep an idle package continuously idle and instead pick
idle cores/threads which already have busy neighbours, if
csched_private.power is set. The performance/watt ratio improves,
though absolute performance takes a small hit.
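
To make the a.1 trade-off concrete, here is a minimal sketch of the
two pick policies. It is not Xen code: struct cpu_info,
idle_neighbours() and pick_idle_cpu() are hypothetical names, the
topology is reduced to a package id per cpu, and the "power"
argument stands in for the proposed csched_private.power knob.

```c
#include <stddef.h>

/* Hypothetical, simplified topology model; not Xen's data structures. */
struct cpu_info {
    int package;   /* physical package (socket) id */
    int idle;      /* non-zero if the cpu is currently idle */
};

/* Count idle cpus, other than 'self', that share a package with it. */
int idle_neighbours(const struct cpu_info *cpus, size_t n, size_t self)
{
    int count = 0;

    for (size_t i = 0; i < n; i++)
        if (i != self && cpus[i].idle &&
            cpus[i].package == cpus[self].package)
            count++;
    return count;
}

/*
 * Return the index of the chosen idle cpu, or -1 if no cpu is idle.
 * power == 0: prefer the idle cpu with the most idle neighbours
 *             (less shared resource contention).
 * power != 0: prefer the idle cpu with the fewest idle neighbours,
 *             so that fully idle packages are left alone.
 * Ties go to the lowest-numbered cpu.
 */
int pick_idle_cpu(const struct cpu_info *cpus, size_t n, int power)
{
    int best = -1, best_score = 0;

    for (size_t i = 0; i < n; i++) {
        int score;

        if (!cpus[i].idle)
            continue;
        score = idle_neighbours(cpus, n, i);
        if (power)
            score = -score;   /* invert the preference */
        if (best < 0 || score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

With two dual-core packages where only cpu0 is busy, the default
policy picks a cpu in the fully idle package, while the power
policy picks cpu1 next to the busy cpu0.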
a.2) csched_vcpu_wake
Similar to the above: instead of blindly kicking all idle cpus
at once, the wake-up can be selective and take the power factor
into account.
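
One possible shape for the selective kick in a.2 is sketched below.
Again this is only an illustration under assumptions: struct
wake_cpu, package_is_busy() and cpus_to_kick() are made-up names,
the result is a plain 32-bit mask rather than a Xen cpumask, and the
fallback of waking a single cpu when every package is fully idle is
just one choice among several.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cpu view used only by this sketch. */
struct wake_cpu {
    int package;   /* physical package (socket) id */
    int idle;      /* non-zero if the cpu is currently idle */
};

/* Non-zero if at least one cpu in 'package' is busy. */
int package_is_busy(const struct wake_cpu *cpus, size_t n, int package)
{
    for (size_t i = 0; i < n; i++)
        if (cpus[i].package == package && !cpus[i].idle)
            return 1;
    return 0;
}

/*
 * Build the mask of idle cpus to kick when a vcpu wakes up
 * (bit i set means "send a wake-up to cpu i"; at most 32 cpus).
 * power == 0: kick every idle cpu.
 * power != 0: kick only idle cpus whose package is already partly
 *             busy; if there are none, kick a single idle cpu.
 */
uint32_t cpus_to_kick(const struct wake_cpu *cpus, size_t n, int power)
{
    uint32_t all_idle = 0, selective = 0;

    for (size_t i = 0; i < n && i < 32; i++) {
        if (!cpus[i].idle)
            continue;
        all_idle |= UINT32_C(1) << i;
        if (package_is_busy(cpus, n, cpus[i].package))
            selective |= UINT32_C(1) << i;
    }

    if (!power)
        return all_idle;
    if (selective)
        return selective;
    /* Isolate the lowest set bit: wake one cpu, leave the rest asleep. */
    return all_idle & (uint32_t)(0u - all_idle);
}
```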
----
b) when a physical cpu resides in an idle C-state
Avoid unnecessary work, to keep C-state residency longer.
For example, the accounting process (the tick timer, more
specifically) can be stopped before C-state entry and then resumed
after wake-up. The point is that no accounting is required while
the current cpu is idle, and any runqueue change triggered from
other cpus sends an IPI to this cpu, which effectively brings it
back to C0 state with accounting resumed. Since the residency
period may be longer than the accounting period (30ms),
csched_tick should be aware of the resume event to adjust elapsed
credits.
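
A rough sketch of what "aware of the resume event" could mean is
below. It assumes 10 credits per millisecond, derived from the 300
credits per 30ms accounting period mentioned in this mail; struct
tick_state and the tick_*() helpers are hypothetical and do not
mirror the real csched_tick. The idea is simply that time spent
with the tick stopped is never charged.

```c
#include <stdint.h>

/* Assumption: 300 credits per 30ms accounting period => 10 per ms. */
#define SKETCH_CREDITS_PER_MS 10

/* Hypothetical per-cpu tick bookkeeping for this sketch. */
struct tick_state {
    uint64_t last_ms;  /* when credits were last charged, or resume time */
    int stopped;       /* non-zero while the tick is stopped for a C-state */
};

/* Credits to debit for the time elapsed since the last charge. */
int tick_elapsed_credits(struct tick_state *t, uint64_t now_ms)
{
    uint64_t elapsed;

    if (t->stopped)
        return 0;      /* cpu is idle in a C-state: nothing to account */
    elapsed = now_ms - t->last_ms;
    t->last_ms = now_ms;
    return (int)(elapsed * SKETCH_CREDITS_PER_MS);
}

/* Before C-state entry: charge the partial period, then stop the tick. */
int tick_suspend(struct tick_state *t, uint64_t now_ms)
{
    int credits = tick_elapsed_credits(t, now_ms);

    t->stopped = 1;
    return credits;
}

/* After wake-up: restart the clock so the idle residency is not charged. */
void tick_resume(struct tick_state *t, uint64_t now_ms)
{
    t->last_ms = now_ms;
    t->stopped = 0;
}
```

For instance, if the tick is suspended at 15ms and resumed at 95ms,
a tick at 100ms charges for 5ms only, not for the 80ms residency.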
----
c) when a cpu's frequency is scaled dynamically
When cpufreq/Px is enabled, a cpu's frequency is adjusted
to different operating points, driven by an on-demand governor. So
csched_acct may need to take the frequency difference among cpus
into consideration, and the total available credits won't be a
simple 300 * number of online cpus.
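
As an illustration of c), the total could be computed by weighting
each cpu's 300 credits by its current frequency. This is a sketch
only: struct cpu_freq and total_credits() are invented names, and
scaling linearly against each cpu's own maximum frequency is an
assumption, not something csched_acct does today.

```c
#include <stddef.h>
#include <stdint.h>

/* From this mail: 300 credits per online cpu per accounting period. */
#define SKETCH_CREDITS_PER_ACCT 300

/* Hypothetical per-cpu frequency view used only by this sketch. */
struct cpu_freq {
    uint32_t cur_khz;  /* current operating point */
    uint32_t max_khz;  /* highest operating point */
    int online;        /* non-zero if the cpu is online */
};

/*
 * Total credits available in one accounting period, with each online
 * cpu contributing in proportion to cur_khz / max_khz.
 */
uint32_t total_credits(const struct cpu_freq *cpus, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].online || cpus[i].max_khz == 0)
            continue;  /* skip offline cpus and invalid entries */
        total += (uint64_t)SKETCH_CREDITS_PER_ACCT * cpus[i].cur_khz
                 / cpus[i].max_khz;
    }
    return (uint32_t)total;
}
```

With one cpu at full speed and another at half speed, this gives
300 + 150 = 450 credits rather than 600.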
----
Of course there are a bunch of research areas for adding more
power factors into scheduler policy. But the above is the
fundamental stuff which we believe would help the scheduler
understand power requirements, as a first step, without a bad
impact on performance/watt.
Comments are appreciated.
Thanks,
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel