>From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
>Sent: Saturday, December 19, 2009 5:22 AM
>To: Jiang, Yunhong
>Cc: Keir Fraser; Jan Beulich; xen-devel@xxxxxxxxxxxxxxxxxxx; Kleen, Andi
>Subject: Re: One question to IST stack for PV guest
>On 12/18/2009 01:05 AM, Jiang, Yunhong wrote:
>> Jeremy/Keir, I'm trying to add vMCA injection to pv_ops dom0. Because
>> we didn't have virtual IST stack support, so I plan to use the kernel stack
>> But Andi told me that this method should have issue if MCE is injected before
>> handler switches to kernel stack. After checking the code, seems this apply in
>> dom0, since undo_xen_syscall will switch to user space stack firstly (see
>What are the requirements here? Are these events delivered to dom0 to
>indicate that something needs attention on the machine, or are they
>delivered synchronously to whatever domain is currently running to say
>that something bad needs immediate attention?
They are delivered to whichever domain is impacted, as Andi Kleen pointed
out, and delivery can be synchronous, depending on the error type.
>> I'm not sure if we really need to switch to user space stack, or we can
>> simply place
>> user stack to oldrsp and don't switch the stack at all, since xen hypervisor
>> the kernel stack already.
>> Another option is to add vIST stack, but that requires changes for dom0/xen
>> interface and is a bit complex.
>What about making the call a bit like the failsafe callback, which
>always uses the kernel stack, to deliver these exceptions? That could
>reshape the kernel stack to conform to the normal stack frame and then
>call the usual arch/x86 handlers.
The issue comes from the syscall path, not from the vMCE/vNMI exception
itself. A vMCE can be injected into the guest at any time, which means it may
be injected while the guest is at the syscall entry point, before the stack
has been switched to the kernel stack.
Consider the following situation:
1) A syscall is made from a dom0 application to the dom0 kernel (in a 64-bit
environment).
2) The syscall is first trapped by the hypervisor, which creates a bounce
frame to re-inject the syscall into the kernel (please note that this frame is
on the kernel stack) and marks the guest as being in kernel mode.
3) In the current dom0, the syscall entry (i.e. xen_syscall_target) first runs
undo_xen_syscall(), which switches from the kernel stack to the user stack;
later, system_call_after_swapgs() switches back to the kernel stack.
4) An MCE happens in hardware before system_call_after_swapgs() runs, and the
hypervisor is invoked. After handling the MCE, the hypervisor decides that it
needs to inject a virtual MCE into the guest immediately. (As said, sometimes
the vMCE must be injected synchronously.)
5) The hypervisor checks the guest state, finds it in kernel mode, and uses
the guest's current stack to inject the vMCE. However, the current stack is in
fact the user stack, so the MCE handler in dom0 will run on the user stack.
This will cause a lot of issues.
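To make the window in steps 2) to 5) concrete, here is a rough toy model in C.
It is only a sketch: none of the types or functions below are real Xen or
Linux code, all toy_* names are made up for illustration, and the rule that a
guest interrupted in user mode gets the event on its kernel stack is my
assumption about the injection policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: nothing here is real Xen or Linux code. */
enum stack_kind { STACK_USER, STACK_KERNEL };

struct toy_vcpu {
    bool in_kernel_mode;           /* what the hypervisor believes */
    enum stack_kind current_stack; /* where the guest rsp really points */
};

/* Step 2: the hypervisor bounces the syscall into the guest kernel. */
static void toy_bounce_syscall(struct toy_vcpu *v)
{
    v->in_kernel_mode = true;
    v->current_stack = STACK_KERNEL;
}

/* Step 3, first half: the entry stub goes back to the user stack. */
static void toy_undo_xen_syscall(struct toy_vcpu *v)
{
    v->current_stack = STACK_USER;
}

/* Step 3, second half: the common syscall path moves to the kernel stack. */
static void toy_system_call_after_swapgs(struct toy_vcpu *v)
{
    v->current_stack = STACK_KERNEL;
}

/*
 * Step 5: assumed injection policy. In kernel mode the event is pushed on
 * whatever stack the guest is currently using; in user mode it goes to the
 * kernel stack.
 */
static enum stack_kind toy_inject_vmce(const struct toy_vcpu *v)
{
    assert(v != NULL);
    return v->in_kernel_mode ? v->current_stack : STACK_KERNEL;
}

/*
 * Replay the syscall entry and inject a vMCE after `steps_done` of the three
 * entry steps above have completed (0 means still in user space).
 */
static enum stack_kind toy_vmce_stack_after(int steps_done)
{
    struct toy_vcpu v = { false, STACK_USER };

    if (steps_done >= 1)
        toy_bounce_syscall(&v);
    if (steps_done >= 2)
        toy_undo_xen_syscall(&v);
    if (steps_done >= 3)
        toy_system_call_after_swapgs(&v);
    return toy_inject_vmce(&v);
}
```

In this model toy_vmce_stack_after(2) is the only case that yields STACK_USER,
which is the window between undo_xen_syscall() and system_call_after_swapgs()
described in step 4).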
>> I checked the 2.6.18 kernel and seems it have no such issue, because syscall
>> in arch/x86_64/kernel/entry-xen.S will use kernel stack directly. (But vMCE
>> may have issue still because it use zeroentry).
>> BTW, Jeremy, seems vNMI support is not included in pvops dom0, will it be
>> supported in future?
>There's been no call for it so far, so I hadn't worried about it much.
>I was thinking it might be useful as a debug tool, but I don't know what
>it gets used for normally.
I remember Jan stated that "Dom0 can get hardware generated NMIs, and any
domain can get software injected ones", but I don't have much background on
it. (see
Xen-devel mailing list