WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

To: Jan Beulich <JBeulich@xxxxxxxxxx>
Subject: Re: xl/xm save -c fails - set_vcpucontext EOPNOTSUPP (was Re: [Xen-devel] xl save -c issues with Windows 7 Ultimate)
From: Shriram Rajagopalan <rshriram@xxxxxxxxx>
Date: Wed, 11 May 2011 14:50:46 -0500
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Delivery-date: Wed, 11 May 2011 12:52:28 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <BANLkTi=z=p1Hui=zEEUxk=B5cSoMYfu55w@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <BANLkTi=a4=uNLYSA+0FEX+oX=iBmStn3aA@xxxxxxxxxxxxxx> <1305016915.26692.261.camel@xxxxxxxxxxxxxxxxxxxxxx> <BANLkTinURoEKajLD54zQQVNf1=7uKaXb3A@xxxxxxxxxxxxxx> <4DC96FA50200007800040C69@xxxxxxxxxxxxxxxxxx> <BANLkTi=FUk7GMDaLu2zOhZ+1QTmOa0=dSg@xxxxxxxxxxxxxx> <4DC97E000200007800040CFF@xxxxxxxxxxxxxxxxxx> <BANLkTinjZnS2_vyvg_7Q7sHCj=zwx5anGQ@xxxxxxxxxxxxxx> <4DCA5B3A0200007800040EC4@xxxxxxxxxxxxxxxxxx> <BANLkTi=z=p1Hui=zEEUxk=B5cSoMYfu55w@xxxxxxxxxxxxxx>
Reply-to: rshriram@xxxxxxxxx
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Wed, May 11, 2011 at 1:37 PM, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:
On Wed, May 11, 2011 at 2:47 AM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
>>> On 11.05.11 at 04:30, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:
> I tried out a simple program that just gets and sets VCPU 0's context
> (no change whatsoever to anything). There is no intermediate code involved
> (except for the hypercall bounce buffer stuff). If all is well, then this
> should work. But it doesn't, even for a PV guest.
> I get the same "Operation not supported" error when I try to "set" the vcpu
> context with the same struct obtained via the get_vcpucontext hypercall!
>...
> and I get - setcontext: operation not supported!

Again, you'll want to add debugging code to the hypervisor to check
what really is inconsistent.

> now for the weirdness:
> Since the setcontext failed, I thought I should be able
> to run the above sample code again and again with no side effect
> (please correct my assumption if I am wrong).
>
> But when I run the above code for the second time, I get a Xen panic!
>
> (XEN) Xen BUG at domctl.c:1724
> (XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff82c48014dd57>] arch_get_info_guest+0x5f7/0x7b0
> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
> (XEN) rax: 0000000000000001   rbx: ffff8300228c4000   rcx: ffff8300228c4040
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff830450652210
> (XEN) rbp: ffff83082a357da8   rsp: ffff83082a357d68   r8:  0000000000000002
> (XEN) r9:  0000000000000002   r10: 0000000000000040   r11: 0000000000000000
> (XEN) r12: ffff830450652010   r13: 0000000000000001   r14: ffff830829db9000
> (XEN) r15: ffff830450652010   cr0: 0000000080050033   cr4: 00000000000026f0
> (XEN) cr3: 000000047beef000   cr2: 0000000000d44048
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83082a357d68:
> (XEN)    ffff830829db9000 ffff8300228c4000 ffff83082a357d98 fffffffffffffff4
> (XEN)    0000000000d40004 ffff8300228c4000 ffff830829db9000 ffff830450652010
> (XEN)    ffff83082a357ef8 ffff82c48010351f ffff83082a357e48 ffff82c48016af84
> (XEN)    0000000000000000 0000000000000070 ffff83082a357e28 000000000047beea
> (XEN)    0000000000000000 ffff83082a30b000 ffff830450652010 ffff830450652010
> (XEN)    ffff83082a357e48 0000000080164c7d aaaaaaaaaaaaaaaa ffff83082a30b000
> (XEN)    ffff83082a357ef8 ffff82c480113d73 000000070000000d 0000000000000001
> (XEN)    0000000000000000 0000000000d42004 0000000000000000 00007fef43c4a791
> (XEN)    0000000000000001 0000000000000000 00007fff27dc7db0 00007fef43a1bd58
> (XEN)    0000000000000024 0000000000000001 00007fff27dc9710 0000000000000001
> (XEN)    0000000000d3f050 00007fef43c51325 0000000000000011 00007fff27dc7dd0
> (XEN)    ffff83082a357ed8 ffff8300bf656000 0000000000000003 00007fff27dc7c60
> (XEN)    00007fff27dc7c60 0000000000000000 00007cf7d5ca80c7 ffff82c48020e1e8
> (XEN)    ffffffff8100948a 0000000000000024 0000000000000000 00007fff27dc7c60
> (XEN)    00007fff27dc7c60 0000000000000003 ffff8807a0f2fe68 ffffffff8148d700
> (XEN)    0000000000000282 0000000000000024 0000000000d3f050 0000000000d40004
> (XEN)    0000000000000024 ffffffff8100948a 0000000100000000 00007fff27dc7ce0
> (XEN)    0000000000d40004 0000010000000000 ffffffff8100948a 000000000000e033
> (XEN)    0000000000000282 ffff8807a0f2fe20 000000000000e02b 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000002
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48014dd57>] arch_get_info_guest+0x5f7/0x7b0
> (XEN)    [<ffff82c48010351f>] do_domctl+0x10ad/0x195e
> (XEN)    [<ffff82c48020e1e8>] syscall_enter+0xc8/0x122
>
> I would appreciate any pointers on how to go about this.

This now indeed looks like an inconsistency between
arch_get_info_guest() and the newly introduced error path in
arch_set_info_guest() - the code to put v->arch.user_eflags into
the necessary state now simply doesn't run anymore. It simply
needs to be pulled up in that function (and a few other adjustments
seem also necessary):

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -856,6 +856,15 @@ int arch_set_info_guest(
        goto out;
    }

+    init_int80_direct_trap(v);
+
+    /* IOPL privileges are virtualised. */
+    v->arch.pv_vcpu.iopl = (v->arch.user_regs.eflags >> 12) & 3;
+    v->arch.user_regs.eflags &= ~X86_EFLAGS_IOPL;
+
+    /* Ensure real hardware interrupts are enabled. */
+    v->arch.user_regs.eflags |= X86_EFLAGS_IF;
+
    if ( !v->is_initialised )
    {
        v->arch.pv_vcpu.ldt_base = c(ldt_base);
@@ -866,7 +875,11 @@ int arch_set_info_guest(
        bool_t fail = v->arch.pv_vcpu.ctrlreg[3] != c(ctrlreg[3]);

 #ifdef CONFIG_X86_64
-        fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
+        if ( !compat )
+        {
+            fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
+            fail |= !v->arch.pv_vcpu.ctrlreg[1] && !(flags & VGCF_in_kernel);
+        }
 #endif

        for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
@@ -907,15 +920,6 @@ int arch_set_info_guest(
    v->arch.pv_vcpu.ctrlreg[0] &= X86_CR0_TS;
    v->arch.pv_vcpu.ctrlreg[0] |= read_cr0() & ~X86_CR0_TS;

-    init_int80_direct_trap(v);
-
-    /* IOPL privileges are virtualised. */
-    v->arch.pv_vcpu.iopl = (v->arch.user_regs.eflags >> 12) & 3;
-    v->arch.user_regs.eflags &= ~X86_EFLAGS_IOPL;
-
-    /* Ensure real hardware interrupts are enabled. */
-    v->arch.user_regs.eflags |= X86_EFLAGS_IF;
-
    cr4 = v->arch.pv_vcpu.ctrlreg[4];
    v->arch.pv_vcpu.ctrlreg[4] = cr4 ? pv_guest_cr4_fixup(v, cr4) :
        real_cr4_to_pv_guest_cr4(mmu_cr4_features);

Can you give this a try?

Ok. This patch solves the Xen panic issue but not the EOPNOTSUPP
error. That is, I can use my sample program to "try" to get/set the same vcpu
context. As before, only get context succeeded; set context failed with the
same EOPNOTSUPP error, for 2.6.18 32-bit domU and 2.6.39 64-bit dom0.

And as you said, I added more debugging.

(XEN) domain.c:893:d0 incoming cr3 42b33e000, cur cr3 827ba5000, fail = 1
(XEN) domain.c:901:d0 incoming cr1 42ba6c000, cur cr1 00000000, !(flags & VGCF_in_kernel)=0,fail=1

Looking at arch_get_info_guest in domctl.c, I see that cr3 is first copied verbatim from the vcpu and
then modified in the if-else block:

        if ( !is_pv_32on64_domain(v->domain) )
        {
            c.nat->ctrlreg[3] = xen_pfn_to_cr3(
                pagetable_get_pfn(v->arch.guest_table));
#ifdef __x86_64__
            c.nat->ctrlreg[1] =
                pagetable_is_null(v->arch.guest_table_user) ? 0
                : xen_pfn_to_cr3(pagetable_get_pfn(v->arch.guest_table_user));
#endif
....
        }
        else
        {
            l4_pgentry_t *l4e = __va(pagetable_get_paddr(v->arch.guest_table));
            c.cmp->ctrlreg[3] = compat_pfn_to_cr3(l4e_get_pfn(*l4e));
        }

This seems to account for the difference between the values that libxc supplies (obtained from get context)
and the ones that arch_set_info_guest validates against.
arch_set_info_guest validates cr3 and cr1 against the wrong values (v->arch.pv_vcpu.ctrlreg[1] and ctrlreg[3]),
whereas they should be validated against the values that result from the if-else block in arch_get_info_guest.

I have verified this too, with both a 32-bit domU and a 64-bit domU.

64-bit PV domU (2.6.39..)
--------------------------------------
get_vcpu_context(): (debug output from arch_get_info_guest)
(XEN) domctl.c:1707:d0  copying cr1 00000000
(XEN) domctl.c:1707:d0  copying cr3 827bd5000
(XEN) domctl.c:1743:d0 not pv_32on64, outgoing cr3 42b85b000, cur cr3 827bd5000
(XEN) domctl.c:1746:d0 not pv_32on64, outgoing cr1 42b85c000, cur cr1 00000000

set_vcpu_context(): (debug output from arch_set_info_guest)
(XEN) domain.c:893:d0 incoming cr3 42b85b000, cur cr3 827bd5000, fail = 1
(XEN) domain.c:901:d0 incoming cr1 42b85c000, cur cr1 00000000, !(flags & VGCF_in_kernel)=0,fail=1

32-bit PV domU (2.6.18)
----------------------------------
get_vcpu_context()
(XEN) domctl.c:1707:d0 copying cr1 00000000
(XEN) domctl.c:1707:d0 copying cr3 2960e008
(XEN) domctl.c:1758:d0 is pv_32on64, outgoing cr3 4f0ac004, cur cr3 2960e008

set_vcpu_context()
(XEN) domain.c:893:d0 incoming cr3 4f0ac004, cur cr3 2960e008, fail = 1


shriram
corresponding code:

    bool_t fail = v->arch.pv_vcpu.ctrlreg[3] != c(ctrlreg[3]);
    gdprintk(XENLOG_WARNING,
             "incoming cr3 %08lx, cur cr3 %08lx, fail = %d\n",
             c(ctrlreg[3]), v->arch.pv_vcpu.ctrlreg[3], fail);

#ifdef CONFIG_X86_64
    if ( !compat )
    {
        fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
        gdprintk(XENLOG_WARNING,
                 "incoming cr1 %08lx, cur cr1 %08lx, !(flags & VGCF_in_kernel)=%d,fail=%d\n",
                 c(ctrlreg[1]), v->arch.pv_vcpu.ctrlreg[1], !(flags & VGCF_in_kernel), fail);

        fail |= !v->arch.pv_vcpu.ctrlreg[1] && !(flags & VGCF_in_kernel);
...

shriram

The question is whether there are other inconsistencies lurking, and
hence whether it wouldn't be better to mark a vCPU on which setting
the context failed, not allowing it to resume or have its context
obtained anymore. That appears quite drastic though - Keir, what's
your opinion here?

Jan


