This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] RE: One question to IST stack for PV guest

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: [Xen-devel] RE: One question to IST stack for PV guest
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Sat, 19 Dec 2009 22:24:22 +0800
Accept-language: en-US
Cc: "Kleen, Andi" <andi.kleen@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
Delivery-date: Sat, 19 Dec 2009 06:24:53 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4B2BF269.5040608@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C8EDE645B81E5141A8C6B2F73FD9265105AE092F76@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <4B2BF269.5040608@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcqAKB1N/2aLbABzS++6UEntC4/uSgAhQNEA
Thread-topic: One question to IST stack for PV guest
>-----Original Message-----
>From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
>Sent: Saturday, December 19, 2009 5:22 AM
>To: Jiang, Yunhong
>Cc: Keir Fraser; Jan Beulich; xen-devel@xxxxxxxxxxxxxxxxxxx; Kleen, Andi
>Subject: Re: One question to IST stack for PV guest
>On 12/18/2009 01:05 AM, Jiang, Yunhong wrote:
>> Jeremy/Keir, I'm trying to add vMCA injection to pv_ops dom0.  Because
>> currently we didn't have virtual IST stack support, so I plan to use the
>> kernel stack for vMCE. But Andi told me that this method should have issue
>> if MCE is injected before handler switches to kernel stack. After checking
>> the code, seems this apply in dom0, since undo_xen_syscall will switch to
>> user space stack firstly (see
>What are the requirements here?  Are these events delivered to dom0 to
>indicate that something needs attention on the machine, or are they
>delivered synchronously to whatever domain is currently running to say
>that something bad needs immediate attention?

They are delivered to whichever domain is impacted, as Andi Kleen pointed
out, and delivery can be synchronous, depending on the error type.

>> I'm not sure if we really need to switch to user space stack, or we can
>> simply place user stack to oldrsp and don't switch the stack at all, since
>> xen hypervisor has use the kernel stack already.
>> Another option is to add vIST stack, but that requires changes for dom0/xen
>> interface and is a bit complex.
>What about making the call a bit like the failsafe callback, which
>always uses the kernel stack, to deliver these exceptions?  That could
>reshape the kernel stack to conform to the normal stack frame and then
>call the usual arch/x86 handlers.

The issue comes from the syscall path, not from the vMCE/vNMI exception
itself. A vMCE can be injected into the guest at any time, which means it may
arrive while the guest is at the syscall entry point but before the stack has
been switched to the kernel stack.
Consider the following situation:
1) A syscall is made from a dom0 application to the dom0 kernel (in a 64-bit
environment).
2) The syscall is first trapped by the hypervisor, which creates a bounce
frame to re-inject the syscall into the kernel (please note that this frame
is on the kernel stack) and marks the guest as being in kernel mode.
3) In the current dom0, the syscall entry (i.e. xen_syscall_target) first
calls undo_xen_syscall(), which switches from the kernel stack to the user
stack; later, system_call_after_swapgs() switches the stack to the kernel
stack.
4) An MCE happens in hardware before system_call_after_swapgs() runs, and the
hypervisor is invoked. After handling the MCE, the hypervisor decides that it
needs to inject a virtual MCE into the guest immediately. (As noted, a vMCE
sometimes has to be injected synchronously.)
5) The hypervisor checks the guest state and finds it in kernel mode, so it
uses the guest's current stack to inject the vMCE. At this point, however,
the current stack is actually the user stack, which means the MCE handler in
dom0 will run on the user stack. This will cause a lot of issues.
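
To make the window in steps 2) to 5) concrete, here is a toy model in Python.
It is only a sketch of the reasoning above, not Xen or Linux code: the
GuestVcpu class, the simulate() helper and the stack labels are hypothetical
names made up for illustration, and the stack-selection rule is simply the
behaviour described in step 5), not something taken from the hypervisor
source.

```python
from dataclasses import dataclass

KERNEL_STACK = "kernel"
USER_STACK = "user"


@dataclass
class GuestVcpu:
    """Hypothetical, minimal view of the guest state relevant to the race."""
    in_kernel_mode: bool = False      # what the hypervisor believes
    current_stack: str = USER_STACK   # where the guest's stack pointer is


def hypervisor_bounce_syscall(vcpu):
    # Step 2: bounce frame is built on the kernel stack, guest marked as kernel.
    vcpu.in_kernel_mode = True
    vcpu.current_stack = KERNEL_STACK


def guest_undo_xen_syscall(vcpu):
    # Step 3, first half: entry code switches back to the user stack.
    vcpu.current_stack = USER_STACK


def guest_system_call_after_swapgs(vcpu):
    # Step 3, second half: the normal syscall path switches to the kernel stack.
    vcpu.current_stack = KERNEL_STACK


def hypervisor_pick_vmce_stack(vcpu):
    # Step 5 (assumed rule): in kernel mode, inject on whatever stack is current.
    return vcpu.current_stack if vcpu.in_kernel_mode else KERNEL_STACK


def simulate(inject_after_step):
    """Run the syscall entry sequence and inject a vMCE after the given step."""
    steps = [
        hypervisor_bounce_syscall,
        guest_undo_xen_syscall,
        guest_system_call_after_swapgs,
    ]
    vcpu = GuestVcpu()
    for number, step in enumerate(steps, start=1):
        step(vcpu)
        if number == inject_after_step:
            return hypervisor_pick_vmce_stack(vcpu)
    raise ValueError("inject_after_step must be 1, 2 or 3")
```

In this model, simulate(2) is the problematic case: the guest is flagged as
being in kernel mode while its stack pointer still refers to the user stack,
so the vMCE frame lands on the user stack. Injecting after step 1 or step 3
picks the kernel stack.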

>> I checked the 2.6.18 kernel and seems it have no such issue, because syscall
>> entry in arch/x86_64/kernel/entry-xen.S will use kernel stack directly. (But
>> vMCE may have issue still because it use zeroentry).
>> BTW, Jeremy, seems vNMI support is not included in pvops dom0, will it be
>> supported in future?
>There's been no call for it so far, so I hadn't worried about it much.
>I was thinking it might be useful as a debug tool, but I don't know what
>it gets used for normally.

I remember Jan stating that "Dom0 can get hardware generated NMIs, and any
domain can get software injected ones", but I do not have much background on
it. (see


>     J
