Re: [Xen-devel] [PATCH 2 of 6] REDO: mem_access & mem_access 2:

Hi, 

Thanks for the patches.  As Keir points out, we'll need them in a form
that applies cleanly (maybe send as attachments next time).  It's also
useful when reviewing patches if they're in 'diff -p' format.  In
mercurial, you can arrabnge that by adding these lines to your ~/.hgrc
file:

 [diff]
 showfunc = True

Further comments inline below.  Nothing too hard to address, I think.

At 22:07 +0000 on 04 Jan (1294178833), Joe Epstein wrote:
> diff -r a3cec4b94150 -r 06b0916eb91d xen/include/public/domctl.h
> --- a/xen/include/public/domctl.h       Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/include/public/domctl.h       Tue Jan 04 11:59:48 2011 -0800
> @@ -714,7 +714,7 @@
>  /*
>   * Page memory in and out.
>   */
> -#define XEN_DOMCTL_MEM_EVENT_OP_PAGING (1 << 0)
> +#define XEN_DOMCTL_MEM_EVENT_OP_PAGING            1
> 
>  /* Domain memory paging */
>  #define XEN_DOMCTL_MEM_EVENT_OP_PAGING_NOMINATE   0
> @@ -722,6 +722,12 @@
>  #define XEN_DOMCTL_MEM_EVENT_OP_PAGING_PREP       2
>  #define XEN_DOMCTL_MEM_EVENT_OP_PAGING_RESUME     3
> 
> +/*
> + * Access permissions
> + */
> +#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS            2
> +#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_RESUME     0

These could do with a nice big block comment that describes what they
do. 

[...]

> diff -r a3cec4b94150 -r 06b0916eb91d xen/include/asm-x86/mem_event.h
> --- a/xen/include/asm-x86/mem_event.h   Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/include/asm-x86/mem_event.h   Tue Jan 04 11:59:48 2011 -0800
> @@ -24,6 +24,8 @@
>  #ifndef __MEM_EVENT_H__
>  #define __MEM_EVENT_H__
> 
> +/* Returns true if a listener exists, else pauses VCPU */
> +int mem_event_check_listener(struct domain *d);

That's a pretty odd thing for a function to do.  I see below that the
only caller already knows there's no listener and ignores the return
value.   I think you can just get rid of this function entirely.

[...] 

> +
> +    /*
> +     * If this GFN is emulated MMIO or marked as read-only (which old
> MMIO is),
> +     * pass the fault to the mmio handler first.
> +     */

Why did you change this comment?  Pages marked as p2m_ram_ro are not
"old MMIO", they're ROM images &c, and go through the emulator so the
write can be correctly discarded. 

> +    if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
>      {
> -        /*
> -         * Page log dirty is always done with order 0. If this mfn resides in
> -         * a large page, we do not change other pages type within that large
> -         * page.
> -         */
> -        paging_mark_dirty(v->domain, mfn_x(mfn));
> -        p2m_change_type(p2m, gfn, p2m_ram_logdirty, p2m_ram_rw);
> +        if ( !handle_mmio() )
> +        {
> +            guest_fault = 1;
> +            goto check_access_handler;
> +        }
>          return 1;
>      }
> 
> -    /* Shouldn't happen: Maybe the guest was writing to a r/o grant mapping? 
> */
> -    if ( p2mt == p2m_grant_map_ro )
> +    /* Was it a write access: log-dirty, etc... */
> +    if ( access_w ) {
> +        /* PoD and log-dirty also take this path. */

Moving the log-dirty check inside the test for access_w reintroduces a
race condition in multi-vcpu systems (where the same log-dirty fault
happens on two CPUs and the first CPU has changed the type back to 
p2m_ram_rw by the time the second one looks up the type).  Please put
this case back unconditionally.

> +        if ( p2mt != p2m_ram_rw && p2m_is_ram(p2mt) )
> +        {
> +            /*
> +             * Page log dirty is always done with order 0. If this
> mfn resides in
> +             * a large page, we do not change other pages type within
> that large
> +             * page.
> +             */
> +            paging_mark_dirty(v->domain, mfn_x(mfn));
> +            p2m_change_type(p2m, gfn, p2m_ram_logdirty, p2m_ram_rw);
> +            return 1;
> +        }
> +
> +        /* Shouldn't happen: Maybe the guest was writing to a r/o
> grant mapping? */
> +        if ( p2mt == p2m_grant_map_ro )
> +        {
> +            gdprintk(XENLOG_WARNING,
> +                     "trying to write to read-only grant mapping\n");
> +            guest_fault = 1;
> +            goto check_access_handler;
> +        }
> +    } /* end access_w */
> +
> + check_access_handler:
> +    /* Even if it is the guest's "fault", check with the mem_event
> interface instead if
> +     * one is there */
> +    if ( p2m_mem_access_check(gpa, gla_valid, gla, access_r,
> access_w, access_x) )
> +        return 1;

p2m_mem_access_check() appears to _always_ return 1.  Was that the
intention? 

> +    /* If there is no handler, then fault if guest_fault = 1 */
> +    if ( guest_fault )
>      {
> -        gdprintk(XENLOG_WARNING,
> -                 "trying to write to read-only grant mapping\n");
>          hvm_inject_exception(TRAP_gp_fault, 0, 0);
>          return 1;
>      }
> 
> +    /* Nothing handled it: it must be an access error with no memory
> handler, so fail */
>      return 0;
>  }
> 
> diff -r a3cec4b94150 -r 06b0916eb91d xen/arch/x86/hvm/svm/svm.c
> --- a/xen/arch/x86/hvm/svm/svm.c        Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/arch/x86/hvm/svm/svm.c        Tue Jan 04 11:59:48 2011 -0800
> @@ -979,7 +979,7 @@
>          __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
>      }
> 
> -    if ( hvm_hap_nested_page_fault(gfn) )
> +    if ( hvm_hap_nested_page_fault(gpa, 0, ~0ull, 0, 0, 0) )

Surely this totally breaks the AMD NPT case -- you need to get the
access-type arguments right or else not rely on them in
hvm_hap_nested_page_fault().

[...]

> diff -r a3cec4b94150 -r 06b0916eb91d xen/arch/x86/mm/mem_access.c
> --- /dev/null   Thu Jan 01 00:00:00 1970 +0000
> +++ b/xen/arch/x86/mm/mem_access.c      Tue Jan 04 11:59:48 2011 -0800
> @@ -0,0 +1,59 @@
> +/******************************************************************************
> + * arch/x86/mm/mem_access.c
> + *
> + * Memory access support.
> + *
> + * Copyright (c) 2009 Citrix Systems, Inc.

Eh, probably not. :)

[...]

> diff -r a3cec4b94150 -r 06b0916eb91d xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c     Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/arch/x86/mm/p2m.c     Tue Jan 04 11:59:48 2011 -0800
> @@ -2853,6 +2853,98 @@
>  }
>  #endif /* __x86_64__ */
> 
> +int p2m_mem_access_check(unsigned long gpa, bool_t gla_valid,
> unsigned long gla,
> +                         bool_t access_r, bool_t access_w, bool_t access_x)
> +{
> +    struct vcpu *v = current;
> +    mem_event_request_t req;
> +    unsigned long gfn = gpa >> PAGE_SHIFT;
> +    struct domain *d = v->domain;
> +    struct p2m_domain* p2m = p2m_get_hostp2m(d);
> +    int res;
> +    mfn_t mfn;
> +    p2m_type_t p2mt;
> +    p2m_access_t p2ma;
> +
> +    /* First, handle rx2rw conversion automatically */
> +    mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, p2m_query);
> +
> +    if ( access_w && p2ma == p2m_access_rx2rw ) {
> +        p2m_lock(p2m);
> +        p2m->set_entry(p2m, gfn, mfn, 0, p2mt, p2m_access_rw);

Might be better to use p2m_change_type here; it's atomic so avoids
potential races. 

> +        p2m_unlock(p2m);
> +
> +        return 1;  /* handled */
> +    }
> +
> +    /* Otherwise, check if there is a memory event listener, and send
> the message along */
> +    res = mem_event_check_ring(d);
> +    if ( res < 0 )
> +    {
> +        /* No listener */
> +        if ( p2m->access_required )
> +        {
> +            printk(XENLOG_INFO
> +                   "Memory access permissions failure, no mem_event
> listener: pausing VCPU %d, dom %d\n",
> +                   v->vcpu_id, d->domain_id);
> +
> +            /* Will pause the VCPU */
> +            (void) mem_event_check_listener(d);

Why does this need a special case?  Could you just post the violation 
to the ring and pause as normal, and then if a listener ever arrives it
can handle it?

In fact, I don't see why access_required needs to be a separate flag at
all - it's implied by setting the access permissions on a page or
setting the default to anything other than rwx.  That would address
Keir's concern about adding a separate domcrf flag.

> +        }
> +        else
> +        {
> +            /* A listener is not required, so clear the access restrictions 
> */
> +            p2m_lock(p2m);
> +            p2m->set_entry(p2m, gfn, mfn, 0, p2mt, p2m_access_rwx);
> +            p2m_unlock(p2m);
> +        }
> +
> +        return 1;
> +    }
> +    else if ( res > 0 )
> +        return 1;  /* No space in buffer */

DYM "return 0" here?  I think this function needs a comment describing
what the return value means. 

[...]
> diff -r a3cec4b94150 -r 06b0916eb91d xen/include/asm-x86/mem_access.h
> --- /dev/null   Thu Jan 01 00:00:00 1970 +0000
> +++ b/xen/include/asm-x86/mem_access.h  Tue Jan 04 11:59:48 2011 -0800
> @@ -0,0 +1,35 @@
> +/******************************************************************************
> + * include/asm-x86/mem_paging.h
> + *
> + * Memory access support.
> + *
> + * Copyright (c) 2009 Citrix Systems, Inc.

Again, no. 

[...]

> diff -r a3cec4b94150 -r 06b0916eb91d tools/xenpaging/xenpaging.c
> --- a/tools/xenpaging/xenpaging.c       Tue Jan 04 10:30:15 2011 -0800
> +++ b/tools/xenpaging/xenpaging.c       Tue Jan 04 11:59:48 2011 -0800
> @@ -658,7 +658,7 @@
>              {
>                  DPRINTF("page already populated (domain = %d; vcpu = %d;"
>                          " p2mt = %x;"
> -                        " gfn = %"PRIx64"; paused = %"PRId64")\n",
> +                        " gfn = %"PRIx64"; paused = %d)\n",
>                          paging->mem_event.domain_id, req.vcpu_id,
>                          req.p2mt,
>                          req.gfn, req.flags & MEM_EVENT_FLAG_VCPU_PAUSED);
> 

this belongs in a separate patch; it's not part of the change described
at the top.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
WARNING - OLD ARCHIVES

xen-devel

Re: [Xen-devel] [PATCH 2 of 6] REDO: mem_access & mem_access 2: mem even