WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH 2 of 6] REDO: mem_access & mem_access 2: mem even

To: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 2 of 6] REDO: mem_access & mem_access 2: mem event additions for access
From: Joe Epstein <jepstein@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 5 Jan 2011 06:47:37 -0800
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 05 Jan 2011 06:48:45 -0800
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:mime-version:sender:received :in-reply-to:references:from:date:x-google-sender-auth:message-id :subject:to:cc:content-type; bh=ny++GEiniQVVAowNwKBBcFTlMRKAztDrXv+rc9I+NBc=; b=P5QA4Y1oxX6i3gGicdAbUjUuiGPO5CXl+AUtMTBhLoEUbVSN3IWHCKZCctztpG7WeJ 5WKmcNVuqlO0mxzcdPqZwcxE5upDm+WChCfM6lTFT8Vpei3JO47EjPcLcKuqNVMyILNt jls9TwlaJO1pfpv72xWFgYiTvDWhztxy+lAOU=
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; b=crHUYKWuQnNO0r2D9TYzomeq2x7zUeT9AaXAI40Eqn9JQqkHdSkZVsTpA252R6vHTM KEUn5Mk2VJ6QgbEEvC35lvKDo+yahQozU/CRxjyMGTQgJV2kR4eox5AuUh56uqXbQpCF rxKDt1xUCrAhrFIz5BYUczWkINdzBbI+Ox9fQ=
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110105141519.GD21948@xxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <AANLkTikGw=9=2jM8H5mFk9-v1DLzj_UUwSfhwWEG2wvD@xxxxxxxxxxxxxx> <20110105141519.GD21948@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi Tim,

Thanks for the quick response.  I have some questions and comments, which I put in line.  As you mentioned, I think this should be fairly easy to knock out.

On Wed, Jan 5, 2011 at 6:15 AM, Tim Deegan <Tim.Deegan@xxxxxxxxxx> wrote:
Hi,

Thanks for the patches.  As Keir points out, we'll need them in a form
that applies cleanly (maybe send as attachments next time).  It's also
useful when reviewing patches if they're in 'diff -p' format.  In
mercurial, you can arrabnge that by adding these lines to your ~/.hgrc
file:

 [diff]
 showfunc = True

Will do
 

Further comments inline below.  Nothing too hard to address, I think.

At 22:07 +0000 on 04 Jan (1294178833), Joe Epstein wrote:
> diff -r a3cec4b94150 -r 06b0916eb91d xen/include/public/domctl.h
> --- a/xen/include/public/domctl.h       Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/include/public/domctl.h       Tue Jan 04 11:59:48 2011 -0800
> @@ -714,7 +714,7 @@
>  /*
>   * Page memory in and out.
>   */
> -#define XEN_DOMCTL_MEM_EVENT_OP_PAGING (1 << 0)
> +#define XEN_DOMCTL_MEM_EVENT_OP_PAGING            1
>
>  /* Domain memory paging */
>  #define XEN_DOMCTL_MEM_EVENT_OP_PAGING_NOMINATE   0
> @@ -722,6 +722,12 @@
>  #define XEN_DOMCTL_MEM_EVENT_OP_PAGING_PREP       2
>  #define XEN_DOMCTL_MEM_EVENT_OP_PAGING_RESUME     3
>
> +/*
> + * Access permissions
> + */
> +#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS            2
> +#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_RESUME     0

These could do with a nice big block comment that describes what they
do.

Will do
 

[...]

> diff -r a3cec4b94150 -r 06b0916eb91d xen/include/asm-x86/mem_event.h
> --- a/xen/include/asm-x86/mem_event.h   Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/include/asm-x86/mem_event.h   Tue Jan 04 11:59:48 2011 -0800
> @@ -24,6 +24,8 @@
>  #ifndef __MEM_EVENT_H__
>  #define __MEM_EVENT_H__
>
> +/* Returns true if a listener exists, else pauses VCPU */
> +int mem_event_check_listener(struct domain *d);

That's a pretty odd thing for a function to do.  I see below that the
only caller already knows there's no listener and ignores the return
value.   I think you can just get rid of this function entirely.


True.  I had this as a separate function only in case others found it useful in the future, and as a bit of a layer separation, where knowledge of the _VPF_mem_event bit would happen only in mem_event.c.  If you don't mind me breaking that, I can move it over.
 
[...]

> +
> +    /*
> +     * If this GFN is emulated MMIO or marked as read-only (which old
> MMIO is),
> +     * pass the fault to the mmio handler first.
> +     */

Why did you change this comment?  Pages marked as p2m_ram_ro are not
"old MMIO", they're ROM images &c, and go through the emulator so the
write can be correctly discarded.

No idea.  :)  I'll revert that comment.
 

> +    if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
>      {
> -        /*
> -         * Page log dirty is always done with order 0. If this mfn resides in
> -         * a large page, we do not change other pages type within that large
> -         * page.
> -         */
> -        paging_mark_dirty(v->domain, mfn_x(mfn));
> -        p2m_change_type(p2m, gfn, p2m_ram_logdirty, p2m_ram_rw);
> +        if ( !handle_mmio() )
> +        {
> +            guest_fault = 1;
> +            goto check_access_handler;
> +        }
>          return 1;
>      }
>
> -    /* Shouldn't happen: Maybe the guest was writing to a r/o grant mapping? */
> -    if ( p2mt == p2m_grant_map_ro )
> +    /* Was it a write access: log-dirty, etc... */
> +    if ( access_w ) {
> +        /* PoD and log-dirty also take this path. */

Moving the log-dirty check inside the test for access_w reintroduces a
race condition in multi-vcpu systems (where the same log-dirty fault
happens on two CPUs and the first CPU has changed the type back to
p2m_ram_rw by the time the second one looks up the type).  Please put
this case back unconditionally.

Question: the conditional I'm replacing just simply says   if ( p2m_is_ram(p2mt) ).  I have to better qualify that to know that this was an access to a page based on its type and not based on its access permissions.  Otherwise, we'll either take memory events on logdirty, etc., or we'll miss real ones that got sent to change the page type instead.  What is the right conditional then, besides access_w, that will direct the handling appropriately? 
 

> +        if ( p2mt != p2m_ram_rw && p2m_is_ram(p2mt) )
> +        {
> +            /*
> +             * Page log dirty is always done with order 0. If this
> mfn resides in
> +             * a large page, we do not change other pages type within
> that large
> +             * page.
> +             */
> +            paging_mark_dirty(v->domain, mfn_x(mfn));
> +            p2m_change_type(p2m, gfn, p2m_ram_logdirty, p2m_ram_rw);
> +            return 1;
> +        }
> +
> +        /* Shouldn't happen: Maybe the guest was writing to a r/o
> grant mapping? */
> +        if ( p2mt == p2m_grant_map_ro )
> +        {
> +            gdprintk(XENLOG_WARNING,
> +                     "trying to write to read-only grant mapping\n");
> +            guest_fault = 1;
> +            goto check_access_handler;
> +        }
> +    } /* end access_w */
> +
> + check_access_handler:
> +    /* Even if it is the guest's "fault", check with the mem_event
> interface instead if
> +     * one is there */
> +    if ( p2m_mem_access_check(gpa, gla_valid, gla, access_r,
> access_w, access_x) )
> +        return 1;

p2m_mem_access_check() appears to _always_ return 1.  Was that the
intention?

Will change.  Originally, the thought was that if there were no memory listener, then crash the guest.  But that was changed, and the interface wasn't changed as well.


> +    /* If there is no handler, then fault if guest_fault = 1 */
> +    if ( guest_fault )
>      {
> -        gdprintk(XENLOG_WARNING,
> -                 "trying to write to read-only grant mapping\n");
>          hvm_inject_exception(TRAP_gp_fault, 0, 0);
>          return 1;
>      }
>
> +    /* Nothing handled it: it must be an access error with no memory
> handler, so fail */
>      return 0;
>  }
>
> diff -r a3cec4b94150 -r 06b0916eb91d xen/arch/x86/hvm/svm/svm.c
> --- a/xen/arch/x86/hvm/svm/svm.c        Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/arch/x86/hvm/svm/svm.c        Tue Jan 04 11:59:48 2011 -0800
> @@ -979,7 +979,7 @@
>          __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
>      }
>
> -    if ( hvm_hap_nested_page_fault(gfn) )
> +    if ( hvm_hap_nested_page_fault(gpa, 0, ~0ull, 0, 0, 0) )

Surely this totally breaks the AMD NPT case -- you need to get the
access-type arguments right or else not rely on them in
hvm_hap_nested_page_fault().

Good pont! :)  That will get fixed.

[...]

> diff -r a3cec4b94150 -r 06b0916eb91d xen/arch/x86/mm/mem_access.c
> --- /dev/null   Thu Jan 01 00:00:00 1970 +0000
> +++ b/xen/arch/x86/mm/mem_access.c      Tue Jan 04 11:59:48 2011 -0800
> @@ -0,0 +1,59 @@
> +/******************************************************************************
> + * arch/x86/mm/mem_access.c
> + *
> + * Memory access support.
> + *
> + * Copyright (c) 2009 Citrix Systems, Inc.

Eh, probably not. :)

[...]

Hmm.  It's a direct copy of mem_paging.c, which is copyright Citrix.  Since I changed only two lines out of it, it seemed rather inappropriate to claim copyright on derivative work.  But, if you'd like, I can make that change to claim it for ourselves.


> diff -r a3cec4b94150 -r 06b0916eb91d xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c     Tue Jan 04 10:30:15 2011 -0800
> +++ b/xen/arch/x86/mm/p2m.c     Tue Jan 04 11:59:48 2011 -0800
> @@ -2853,6 +2853,98 @@
>  }
>  #endif /* __x86_64__ */
>
> +int p2m_mem_access_check(unsigned long gpa, bool_t gla_valid,
> unsigned long gla,
> +                         bool_t access_r, bool_t access_w, bool_t access_x)
> +{
> +    struct vcpu *v = current;
> +    mem_event_request_t req;
> +    unsigned long gfn = gpa >> PAGE_SHIFT;
> +    struct domain *d = v->domain;
> +    struct p2m_domain* p2m = p2m_get_hostp2m(d);
> +    int res;
> +    mfn_t mfn;
> +    p2m_type_t p2mt;
> +    p2m_access_t p2ma;
> +
> +    /* First, handle rx2rw conversion automatically */
> +    mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, p2m_query);
> +
> +    if ( access_w && p2ma == p2m_access_rx2rw ) {
> +        p2m_lock(p2m);
> +        p2m->set_entry(p2m, gfn, mfn, 0, p2mt, p2m_access_rw);

Might be better to use p2m_change_type here; it's atomic so avoids
potential races.

Question: p2m_change_type doesn't have the access permissions: only set_entry does, so I can't use it.  Should I rather just move p2m_lock above the get_entry?

> +        p2m_unlock(p2m);
> +
> +        return 1;  /* handled */
> +    }
> +
> +    /* Otherwise, check if there is a memory event listener, and send
> the message along */
> +    res = mem_event_check_ring(d);
> +    if ( res < 0 )
> +    {
> +        /* No listener */
> +        if ( p2m->access_required )
> +        {
> +            printk(XENLOG_INFO
> +                   "Memory access permissions failure, no mem_event
> listener: pausing VCPU %d, dom %d\n",
> +                   v->vcpu_id, d->domain_id);
> +
> +            /* Will pause the VCPU */
> +            (void) mem_event_check_listener(d);

Why does this need a special case?  Could you just post the violation
to the ring and pause as normal, and then if a listener ever arrives it
can handle it?

In fact, I don't see why access_required needs to be a separate flag at
all - it's implied by setting the access permissions on a page or
setting the default to anything other than rwx.  That would address
Keir's concern about adding a separate domcrf flag.
 
The idea is to have two modes: fail-closed, and fail-open.  If access_required is true, then the VCPU must pause.  But if it is false, I want to revert the access flags and let the VCPU chug along.  That's to make sure that life works if the memory event handler crashes, of course, and explains why the page type is set to rwx and not default_access in that case.

So I'll add a DOMCTL.


> +        }
> +        else
> +        {
> +            /* A listener is not required, so clear the access restrictions */
> +            p2m_lock(p2m);
> +            p2m->set_entry(p2m, gfn, mfn, 0, p2mt, p2m_access_rwx);
> +            p2m_unlock(p2m);
> +        }
> +
> +        return 1;
> +    }
> +    else if ( res > 0 )
> +        return 1;  /* No space in buffer */

DYM "return 0" here?  I think this function needs a comment describing
what the return value means.

Hmm... it should be return 1, because no space in buffer is satisfied (no error condition) by pausing the VCPU.  I'm just going to remove the return code entirely.

[...]
> diff -r a3cec4b94150 -r 06b0916eb91d xen/include/asm-x86/mem_access.h
> --- /dev/null   Thu Jan 01 00:00:00 1970 +0000
> +++ b/xen/include/asm-x86/mem_access.h  Tue Jan 04 11:59:48 2011 -0800
> @@ -0,0 +1,35 @@
> +/******************************************************************************
> + * include/asm-x86/mem_paging.h
> + *
> + * Memory access support.
> + *
> + * Copyright (c) 2009 Citrix Systems, Inc.

Again, no.

[...]

> diff -r a3cec4b94150 -r 06b0916eb91d tools/xenpaging/xenpaging.c
> --- a/tools/xenpaging/xenpaging.c       Tue Jan 04 10:30:15 2011 -0800
> +++ b/tools/xenpaging/xenpaging.c       Tue Jan 04 11:59:48 2011 -0800
> @@ -658,7 +658,7 @@
>              {
>                  DPRINTF("page already populated (domain = %d; vcpu = %d;"
>                          " p2mt = %x;"
> -                        " gfn = %"PRIx64"; paused = %"PRId64")\n",
> +                        " gfn = %"PRIx64"; paused = %d)\n",
>                          paging->mem_event.domain_id, req.vcpu_id,
>                          req.p2mt,
>                          req.gfn, req.flags & MEM_EVENT_FLAG_VCPU_PAUSED);
>

this belongs in a separate patch; it's not part of the change described
at the top.

Are you sure?  I changed the flag type from u64 to u16, so that breaks this DPRINTF. 


Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

Thanks!
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel