Re: [Xen-devel] [PATCH] Yield to VCPU hcall, spinlock yielding
On Wednesday 08 June 2005 13:40, Bryan S Rosenburg wrote:
> "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx> wrote on 06/08/2005 02:25:56
> > > The key point is that with
> > > kernel-level preemption notification, VCPUs are always in
> > > kernel mode when suspended, never in user mode. Application
> > > state is always saved in Linux, not in Xen, and is available
> > > to be resumed on another VCPU if Linux so chooses.
> > In principle, but...
> > Do you believe this is going to interact well with Linux's work
> > stealing CPU migration? I haven't looked closely at the current
> > code, but from Linux's scheduler's POV the de-scheduled (yielded)
> > CPU looks like a perfectly healthy CPU, so there's no particular
> > reason that another CPU would steal work from it (without hacking
> > the algorithm, which I suppose we could do). Also, do you have to
> > do something special in your yield routine to ensure that no real
> > process is currently running on the yielded processor so that all
> > processes on the run queue are available for stealing?
> > Ian
> In our original posting, we proposed that the Linux interrupt handler
> for preemption notifications would create (or unblock) a
> high-priority kernel thread which would then yield back to the
> hypervisor. To Linux on other CPUs, the de-scheduled CPU would
> appear to be busy running the high-priority thread, and all real work
> that that CPU had been doing would be eligible for stealing.
I don't think this alone is enough to encourage task migration.
The primary motivator to steal is a load imbalance of 25% or more, and
one extra fake kernel thread will probably not be enough to trigger it.
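
As a rough illustration of that arithmetic, here is a simplified sketch in
Python. It assumes stealing happens only when the busier cpu's run queue is
at least 25% longer than the stealing cpu's. The name would_steal and the
imbalance_pct parameter are illustrative only, not the kernel's actual
balancing code.

```python
def would_steal(busiest_nr_running, local_nr_running, imbalance_pct=125):
    """Simplified model of the steal decision (an assumption, not kernel code).

    Steal only if the busiest queue is at least imbalance_pct percent of
    the local queue; 125 corresponds to a 25% imbalance.
    """
    return 100 * busiest_nr_running >= imbalance_pct * local_nr_running


# Two cpus with 8 runnable tasks each; the preempted one gains a fake thread.
# 9 vs 8 is only a 12.5% imbalance.
print(would_steal(8 + 1, 8))

# With short run queues a single extra thread does cross the threshold.
# 3 vs 2 is a 50% imbalance.
print(would_steal(2 + 1, 2))
```

Under this model the fake thread only helps when run queues are already
short, which is the concern raised above.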
To solve this and other issues, I believe we need an extra modifier to
the Linux kernel's per-cpu load value, which Xen could update to hint to
the kernel what each cpu's relative processing power is. The Linux kernel
scheduler's per-cpu load values would be something like (max_cpu_power
/ cpu_power * nr_running). Xen could update cpu_power in a number of
situations: a "long" preemption, a much faster alternative to a vcpu
hot-unplug (don't unplug, just set cpu_power to 0), and to normalize
load values for vcpus which have different time-slice lengths on the
I would hope something like this could also be used without Xen on Linux
so it has wider appeal. One thing that comes to mind is normalizing
cpus' load when some cpus may be "speed stepped" down for power
management or heat issues.
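
The proposed load value (max_cpu_power / cpu_power * nr_running) could be
sketched as follows in Python. The function name scaled_load and the
MAX_POWER scale are hypothetical. Treating cpu_power of 0 as infinite load
is also an assumption made here, since the formula as written would divide
by zero in that case.

```python
import math


def scaled_load(nr_running, cpu_power, max_cpu_power):
    """Per-cpu load scaled by relative processing power.

    Implements max_cpu_power / cpu_power * nr_running. A cpu_power of 0
    (a preempted or "unplugged" vcpu) is treated as infinite load when
    tasks are queued; this handling is an assumption, not part of the
    proposal.
    """
    if cpu_power == 0:
        return math.inf if nr_running > 0 else 0.0
    return max_cpu_power / cpu_power * nr_running


MAX_POWER = 100  # hypothetical scale for a full-speed cpu

# Same number of runnable tasks, different relative power.
full_speed = scaled_load(4, 100, MAX_POWER)
half_speed = scaled_load(4, 50, MAX_POWER)
preempted = scaled_load(4, 0, MAX_POWER)

# The half-speed cpu reports twice the load of the full-speed one.
print(full_speed, half_speed, preempted)
```

With equal run-queue lengths, the slower or preempted cpu reports a higher
load, so an imbalance-based balancer would tend to pull work away from it.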