This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] x86's context switch ordering of operations

To: Jan Beulich <jbeulich@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] x86's context switch ordering of operations
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Tue, 29 Apr 2008 13:50:30 +0100
In-reply-to: <48173326.76E4.0078.0@xxxxxxxxxx>
On 29/4/08 13:39, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:

> To do so, I was considering using {un,}map_domain_page() from
> the context switch path, but there are two major problems with the
> ordering of operations:
> - for the outgoing task, 'current' is being changed before the
> ctxt_switch_from() hook is being called
> - for the incoming task, write_ptbase() happens only after the
> ctxt_switch_to() hook was already called
> I'm wondering whether there are hidden dependencies that require
> this particular (somewhat non-natural) ordering.
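
For readers following the thread, the ordering described above can be sketched as a toy model. This is not Xen source: the hook names come from the message, while set_current(), the event trace, and the bookkeeping variables are assumptions added only to make the order observable.

```c
#include <stddef.h>

/* Toy model only: the bodies below are stand-ins that record the order
 * of events; they do not reproduce what Xen actually does in each step. */
enum event { EV_SET_CURRENT, EV_SWITCH_FROM, EV_SWITCH_TO, EV_WRITE_PTBASE };

#define MAX_EVENTS 8
static enum event trace[MAX_EVENTS];
static size_t trace_len;

struct vcpu { int id; };

static struct vcpu *current_vcpu;        /* stands in for 'current' */
static int ptbase_owner = -1;            /* id of vcpu whose page tables are loaded */
static int current_at_switch_from = -1;  /* what 'current' was inside the from-hook */
static int ptbase_at_switch_to = -1;     /* whose page tables were live inside the to-hook */

static void record(enum event e)
{
    if (trace_len < MAX_EVENTS)
        trace[trace_len++] = e;
}

/* Hypothetical name for the step that changes 'current'. */
static void set_current(struct vcpu *v)
{
    current_vcpu = v;
    record(EV_SET_CURRENT);
}

static void ctxt_switch_from(struct vcpu *prev)
{
    (void)prev;
    current_at_switch_from = current_vcpu->id;  /* already the incoming vcpu */
    record(EV_SWITCH_FROM);
}

static void ctxt_switch_to(struct vcpu *next)
{
    (void)next;
    ptbase_at_switch_to = ptbase_owner;         /* still the outgoing vcpu */
    record(EV_SWITCH_TO);
}

static void write_ptbase(struct vcpu *next)
{
    ptbase_owner = next->id;
    record(EV_WRITE_PTBASE);
}

/* The order of the four steps as described in the quoted text. */
void context_switch_model(struct vcpu *prev, struct vcpu *next)
{
    trace_len = 0;
    set_current(next);
    ctxt_switch_from(prev);
    ctxt_switch_to(next);
    write_ptbase(next);
}
```

In this model, a mapping helper that depends on 'current' would see the incoming vcpu inside ctxt_switch_from(), and one that depends on the loaded page tables would still see the outgoing vcpu's inside ctxt_switch_to(), which are the two problems raised above.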

ctxt_switch_{from,to} exist only in x86 Xen and are called from a single
hook point out of the common scheduler. Thus either they both happen
before, or both happen after, current is changed by the common scheduler. It
took a while for the scheduler interfaces to settle down to something both
x86 and ia64 were happy with, so I'm not particularly excited about revisiting
them. I'm not sure why you'd want to map_domain_page() on context switch
anyway. The map_domain_page() 32-bit implementation is inherently per-domain

> 1) How does the storing of vcpu_info_mfn in the hypervisor survive
> migration or save/restore? The mainline Linux code, which uses this
> hypercall, doesn't appear to make any attempt to revert to using the
> default location during suspend or to re-setup the alternate location
> during resume (but of course I'm not sure that guest is save/restore/
> migrate ready in the first place). I would imagine it to be at least
> difficult for the guest to manage its state post resume without the
> hypervisor having restored the previously established alternative
> placement.

I don't see that it would be hard for the guest to do it itself before
bringing back all VCPUs (either by bringing them up or by exiting the
stopmachine state). Is save/restore even supported by pv_ops kernels yet?
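
A minimal sketch of what "the guest does it itself" might look like, assuming the guest kept a record of each VCPU's alternate placement and re-issues the registration on resume before the other VCPUs run again. The structure layout and the VCPUOP_register_vcpu_info value follow my reading of Xen's public vcpu.h and should be treated as assumptions; mock_vcpu_op() and reregister_vcpu_info() are hypothetical names, with the mock standing in for the real hypercall so the example is self-contained.

```c
#include <stdint.h>

/* Assumed to match Xen's public interface; verify against vcpu.h. */
#define VCPUOP_register_vcpu_info 10

struct vcpu_register_vcpu_info {
    uint64_t mfn;     /* machine frame holding the vcpu_info */
    uint32_t offset;  /* offset of vcpu_info within that frame */
    uint32_t rsvd;    /* reserved, written as zero here */
};

#define NR_VCPUS 4

/* Mock hypervisor state: what has been registered for each VCPU. */
static struct vcpu_register_vcpu_info hv_placement[NR_VCPUS];
static int hv_registered[NR_VCPUS];

/* Hypothetical stand-in for the real vcpu_op hypercall. */
static int mock_vcpu_op(int cmd, unsigned int vcpuid, void *arg)
{
    if (cmd != VCPUOP_register_vcpu_info || vcpuid >= NR_VCPUS)
        return -1;
    hv_placement[vcpuid] = *(struct vcpu_register_vcpu_info *)arg;
    hv_registered[vcpuid] = 1;
    return 0;
}

/* On resume, replay the saved placement for each VCPU before the
 * secondary VCPUs are brought back. Returns 0 on success. */
int reregister_vcpu_info(const struct vcpu_register_vcpu_info *saved,
                         unsigned int nr)
{
    for (unsigned int cpu = 0; cpu < nr; cpu++) {
        struct vcpu_register_vcpu_info info = saved[cpu];
        info.rsvd = 0;
        int rc = mock_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
        if (rc)
            return rc;
    }
    return 0;
}
```

A real guest would also have to recompute each mfn after migration, since machine frames generally change; the sketch leaves that step out.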

> 2) The implementation in the hypervisor seems to have added yet another
> scalability issue (on 32-bits), as this is being carried out using
> map_domain_page_global() - if there are sufficiently many guests with
> sufficiently many vCPU-s, there just won't be any space left at some
> point. This worries me especially in the context of seeing a call to
> sh_map_domain_page_global() that is followed by a BUG_ON() checking
> whether the call failed.

The hypervisor generally assumes that vcpu_info's are permanently and
globally mapped. That obviously places an unavoidable scalability limit for
32-bit Xen. I have no problem with telling people who are concerned about
the limit to use 64-bit Xen instead.
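
To make the limit concrete, here is a back-of-the-envelope sketch. It assumes each VCPU with an alternate vcpu_info placement consumes one page-sized slot of a fixed global mapping area; the area size is a free parameter because the real figure for 32-bit Xen is not given in this thread, and all names here are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* 4 KiB pages, as on x86 */

/* Number of page-sized slots in a global mapping area of the given size. */
uint64_t global_map_slots(uint64_t area_bytes)
{
    return area_bytes / SKETCH_PAGE_SIZE;
}

/* Nonzero if every VCPU of every guest can get its own slot. */
int vcpu_infos_fit(uint64_t area_bytes, uint64_t guests,
                   uint64_t vcpus_per_guest)
{
    return guests * vcpus_per_guest <= global_map_slots(area_bytes);
}
```

For example, a hypothetical 16 MiB area yields 4096 slots, so 128 guests with 32 VCPUs each would exactly exhaust it, and that is before counting any other users of the same area.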

 -- Keir
