

[Xen-devel] Re: lazy context switching

On Aug 26, 2005, at 4:37 AM, Keir Fraser wrote:

On 25 Aug 2005, at 22:55, Hollis Blanchard wrote:

Later on, if it turns out we are switching domains, we save/restore all the state we can, then return to the exception handler which saves the old set of nonvolatiles and loads the new one. Until that point, some domain state is spread arbitrarily across our stack.

That means that context_switch() cannot actually save all of @prev's state to memory (and neither can __sync_lazy_execstate()) -- only by returning all the way to assembly can we accomplish that.


What you need is a synchronisation point, visible to other CPUs, beyond which things like DOM0_GETVCPUCONTEXT can be sure to read consistent current state for the descheduled vcpu. See domain_sleep_sync() for the current way we ensure that state is committed to memory.

Hmmmmm. I think the basic problem is that in the exception handler we don't usually know we will need this state. The exception is a debug exception, where we know we will need it for the GDB stub.

However, we also have a hypervisor-dedicated timer, HDEC (hypervisor decrementer). Rather than using it as a plain tick which may or may not cause a scheduler exception, we can use it to *always* mean a context switch. In that case, we would always save the full state on HDEC entry, because we know it will always cause a context switch. Judging by set_ac_timer() callers, it seems that only the scheduler really uses the Xen timer tick. If non-scheduler components start using Xen-internal ticks, this approach wouldn't hold up (or rather, it would start becoming less efficient).
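
To make the HDEC idea concrete, here is a rough sketch of the entry-path decision. It is a toy model, not Xen code: the enum, the struct layouts, the register counts and the function names are all made up for illustration. The only point is that the handler decides at entry, from the exception type alone, whether the nonvolatiles need to go to memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sizes only; not taken from any real Xen header. */
#define NR_VOLATILE    13
#define NR_NONVOLATILE 18

/* Hypothetical exception classes for this sketch. */
enum exc_kind { EXC_EXTERNAL, EXC_SYSCALL, EXC_DEBUG, EXC_HDEC };

struct cpu_regs {
    unsigned long volatiles[NR_VOLATILE];
    unsigned long nonvolatiles[NR_NONVOLATILE];
};

struct vcpu_save_area {
    struct cpu_regs regs;
    bool nonvolatiles_valid;   /* true only if the full state is in memory */
};

/* Debug exceptions need full state for the GDB stub; under the proposal,
 * HDEC always means a context switch, so it needs full state too. */
static bool exc_needs_full_state(enum exc_kind kind)
{
    return kind == EXC_DEBUG || kind == EXC_HDEC;
}

/* Models exception entry: volatiles are always saved, nonvolatiles only
 * when we already know they will be needed. Returns true if the save
 * area now holds the complete register state. */
static bool exception_entry(enum exc_kind kind,
                            const struct cpu_regs *live,
                            struct vcpu_save_area *save)
{
    assert(live != NULL && save != NULL);

    memcpy(save->regs.volatiles, live->volatiles, sizeof live->volatiles);

    save->nonvolatiles_valid = exc_needs_full_state(kind);
    if (save->nonvolatiles_valid)
        memcpy(save->regs.nonvolatiles, live->nonvolatiles,
               sizeof live->nonvolatiles);

    return save->nonvolatiles_valid;
}
```

In this model a syscall or external interrupt leaves nonvolatiles_valid false, which is exactly the state that cannot be read consistently from another CPU; only the HDEC and debug paths leave a complete copy behind.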

Would that also work for DOM0_GETVCPUCONTEXT? Let's assume the dom0 vcpu and the target vcpu are running on separate dedicated processors. In that case, dom0 could wait for the target vcpu to take an HDEC at some point in the future, but if it really is a dedicated vcpu then we would want the schedule interval to be the maximum, so that could be a long time. Another option is to have vcpu_pause() end up resetting the target vcpu's processor's HDEC via an IPI, which would cause a fake scheduler HDEC to go off, synchronizing the target vcpu's state.
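
The difference between the two options can be sketched with a single-threaded simulation of the target processor. Again, this is a toy model under my own assumptions: struct sim_cpu, the sim_* functions and the tick granularity are hypothetical, and a real implementation would need proper cross-CPU memory ordering rather than plain flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the target vcpu's processor. */
struct sim_cpu {
    unsigned long hdec;   /* ticks left until the decrementer fires */
    bool ipi_pending;     /* a remote CPU has asked for an early HDEC */
    bool state_synced;    /* full vcpu state has been written to memory */
};

/* Remote side: roughly what vcpu_pause() would do under this proposal. */
static void sim_vcpu_pause(struct sim_cpu *target)
{
    assert(target != NULL);
    target->state_synced = false;
    target->ipi_pending = true;
}

/* Target side: one tick of execution. Returns true once the HDEC handler
 * has run and the vcpu's state is consistent in memory. */
static bool sim_cpu_tick(struct sim_cpu *cpu)
{
    assert(cpu != NULL);

    if (cpu->ipi_pending) {
        /* IPI handler: reset HDEC so that it expires immediately. */
        cpu->ipi_pending = false;
        cpu->hdec = 0;
    }

    if (cpu->hdec == 0) {
        /* HDEC handler: always treated as a context switch, so the
         * full register state is saved here. */
        cpu->state_synced = true;
        return true;
    }

    cpu->hdec--;
    return false;
}

/* Counts how many ticks pass before the state becomes readable. */
static unsigned long sim_ticks_until_synced(struct sim_cpu *cpu)
{
    unsigned long ticks = 1;

    while (!sim_cpu_tick(cpu))
        ticks++;
    return ticks;
}
```

With hdec set to 1000 and no IPI, sim_ticks_until_synced() returns 1001; after sim_vcpu_pause() it returns 1. That is the whole argument for the IPI: the wait no longer depends on how long the schedule interval is.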

What do you think?

If you have a lot of register state, have you considered maintaining a Xen stack per VCPU? The context-switch interface already supports this, for ia64.

We have plenty of space on the per-CPU stack for the register state (we use it anyways on a debug exception for the GDB stub). And even if we had one stack per VCPU, we would still want to avoid unnecessarily saving/restoring the nonvolatiles...

Hollis Blanchard
IBM Linux Technology Center
