[Xen-devel] Re: lazy context switching
On Aug 26, 2005, at 4:37 AM, Keir Fraser wrote:
On 25 Aug 2005, at 22:55, Hollis Blanchard wrote:
Later on, if it turns out we are switching domains, we save/restore what
state we can, then return to the exception handler, which saves the old
set of nonvolatiles and loads the new one. Until that point, some domain
state is spread arbitrarily across our stack.
That means that context_switch() cannot actually save all of @prev's
state to memory (and neither can __sync_lazy_execstate()) -- only by
returning all the way to assembly can we accomplish that.
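A minimal C model of the split described above, assuming a PowerPC-style ABI with 18 nonvolatile GPRs (r14-r31). `struct vcpu`, `cpu_nv`, `context_switch_c()` and `exception_exit_switch()` are hypothetical stand-ins, not Xen code: the C-level switch can only mark @prev's saved copy as stale, and the nonvolatiles reach memory only on the (assembly) exception-exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: 18 nonvolatile GPRs (r14-r31), as in the 64-bit PowerPC ELF ABI. */
#define NR_NONVOLATILE 18

/* Hypothetical per-vcpu save area. */
struct vcpu {
    unsigned long nv[NR_NONVOLATILE]; /* saved nonvolatile GPRs */
    bool nv_in_memory;                /* true once nv[] holds the current values */
};

/* Stand-in for one physical CPU's live nonvolatile registers. */
static unsigned long cpu_nv[NR_NONVOLATILE];

/* C-level context_switch(): prev's nonvolatiles are still live in
 * registers or spilled somewhere on the Xen stack, so all it can do
 * is note that prev's saved copy is not yet valid. */
static void context_switch_c(struct vcpu *prev, struct vcpu *next)
{
    (void)next;
    prev->nv_in_memory = false;
}

/* Models the assembly exception-exit path: only here are prev's
 * nonvolatiles in known places, so they are committed to memory and
 * next's set is loaded into the CPU. */
static void exception_exit_switch(struct vcpu *prev, struct vcpu *next)
{
    assert(prev != next);
    memcpy(prev->nv, cpu_nv, sizeof cpu_nv);
    prev->nv_in_memory = true;
    memcpy(cpu_nv, next->nv, sizeof cpu_nv);
}
```

Anything that reads @prev's save area between the two calls (a GDB stub, or DOM0_GETVCPUCONTEXT on another CPU) would see stale nonvolatiles, which is the problem under discussion.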
What you need is a synchronisation point, visible to other CPUs,
beyond which things like DOM0_GETVCPUCONTEXT can be sure to read
consistent current state for the descheduled vcpu. See
domain_sleep_sync() for the current way we ensure that state is
committed to memory.
Hmmmmm. I think the basic problem is that in the exception handler we
don't usually know whether we will need this state. The one exception is
the debug exception, where we know we will need it for the GDB stub.
However, we also have a hypervisor-dedicated timer, HDEC (hypervisor
decrementer). Rather than using it as a plain tick which may or may not
cause a scheduler exception, we can use it to *always* mean a context
switch. In that case, we would always save the full state on HDEC
entry, because we know it will always cause a context switch. Judging
by set_ac_timer() callers, it seems that only the scheduler really uses
the Xen timer tick. If non-scheduler components start using
Xen-internal ticks, this approach wouldn't hold up (or rather, it would
start becoming less efficient).
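Under that proposal, the entry path could decide up front whether to spill the full nonvolatile set, based only on which exception was taken. The enum values and `entry_saves_full_state()` below are hypothetical and do not correspond to real vector numbers; this is a sketch of the decision, not of the entry code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exception kinds; no real vector numbers are implied. */
enum exc_kind { EXC_HDEC, EXC_DEBUG, EXC_HCALL, EXC_EXTERNAL };

/* Should the entry path save the full nonvolatile set to memory up front? */
static bool entry_saves_full_state(enum exc_kind k)
{
    switch (k) {
    case EXC_HDEC:  /* proposal: HDEC always means a context switch */
    case EXC_DEBUG: /* the GDB stub needs the complete register image */
        return true;
    default:        /* common case: leave nonvolatiles live */
        return false;
    }
}
```

If other Xen components started arming HDEC for non-scheduler work, the `EXC_HDEC` case would save state that is sometimes unused, which is the efficiency cost mentioned above.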
Would that also work for DOM0_GETVCPUCONTEXT? Let's assume the dom0
vcpu and the target vcpu are running on separate dedicated processors.
In that case, dom0 could wait for the target vcpu to take an HDEC at
some point in the future, but if it really is a dedicated vcpu then we
would want the schedule interval to be the maximum, so that could be a
long time. Another option is to have vcpu_pause() end up resetting the
target vcpu's processor's HDEC via an IPI, which would cause a fake
scheduler HDEC to go off, synchronizing the target vcpu's state.
What do you think?
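The IPI option can be modelled as a single-threaded simulation: the IPI handler makes the target CPU's HDEC expire immediately, so its next tick takes the "fake" scheduler exception and saves full state. `struct pcpu`, `ipi_force_hdec()`, `tick()` and `pause_and_sync()` are hypothetical, and the countdown is a simplification rather than real decrementer semantics; on real hardware the target CPU takes the exception asynchronously.

```c
#include <assert.h>
#include <stdbool.h>

struct pcpu {
    long hdec;              /* ticks until the next HDEC exception */
    long interval;          /* scheduler interval used to reload hdec */
    bool vcpu_state_synced; /* set when the HDEC path saves full state */
};

/* Hypothetical IPI handler on the target CPU: expire HDEC now. */
static void ipi_force_hdec(struct pcpu *p)
{
    p->hdec = 0;
}

/* One timer step on the target CPU. */
static void tick(struct pcpu *p)
{
    if (p->hdec > 0)
        p->hdec--;
    if (p->hdec == 0) {
        /* HDEC exception: under the proposal this always saves full state. */
        p->vcpu_state_synced = true;
        p->hdec = p->interval;
    }
}

/* Hypothetical vcpu_pause() path run from dom0's CPU. */
static void pause_and_sync(struct pcpu *target)
{
    target->vcpu_state_synced = false;
    ipi_force_hdec(target);
    tick(target); /* stands in for the target taking the exception */
    assert(target->vcpu_state_synced);
}
```

Without the IPI, a dedicated vcpu with a long interval would not reach the synchronizing tick until `hdec` counts all the way down.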
If you have a lot of register state, have you considered maintaining a
Xen stack per VCPU? The context-switch interface already supports
this, for ia64.
We have plenty of space on the per-CPU stack for the register state (we
use it anyways on a debug exception for the GDB stub). And even if we
had one stack per VCPU, we would still want to avoid unnecessarily
saving/restoring the nonvolatiles...
IBM Linux Technology Center
Xen-devel mailing list