[Xen-devel] Re: [PATCH 1/3] libxc: add xc_gnttab_map_grant_ref_n

On 09/22/2011 04:32 AM, Ian Campbell wrote:
> On Wed, 2011-09-21 at 18:07 +0100, Daniel De Graaf wrote:
>> On 09/21/2011 11:25 AM, Ian Campbell wrote:
> 
>>> When I map a page how do I know what the offset of it is wrt the file
>>> descriptor? DO I just have to remember how many pages I mapped an *=
>>> 4096?
>>
>> You had the offset at the time you mapped it, because that's what you
>> passed in the offset parameter to mmap(). Just don't lose the number :)
> 
> So I guess my followup question is where does the number I pass to mmap
> come from...
> 
> /me scrobbles in the code.
> 
> Aha, so it is an output from the gntdev/gntalloc ioctls. So how about:
> 
> /* IN parameters */
> /*
>  * Offset in the file descriptor for a byte within the page. This offset
>  * is the result of the IOCTL_GNTDEV_MAP_GRANT_REF and is the same as
>  * is used with mmap().
>  */
> 

Sounds good.

>>>>>> +};
>>>>>> +
>>>>>> +/* Clear (set to zero) the byte specified by index */
>>>>>> +#define UNMAP_NOTIFY_CLEAR_BYTE 0x1
>>>>>> +/* Send an interrupt on the indicated event channel */
>>>>>> +#define UNMAP_NOTIFY_SEND_EVENT 0x2
>>>>>> +
>>>>>>  #endif /* __LINUX_PUBLIC_GNTDEV_H__ */
>>>>>> diff --git a/tools/libxc/xc_gnttab.c b/tools/libxc/xc_gnttab.c
>>>>>> index 4f55fce..3d3c63b 100644
>>>>>> --- a/tools/libxc/xc_gnttab.c
>>>>>> +++ b/tools/libxc/xc_gnttab.c
>>>>>> @@ -18,6 +18,7 @@
>>>>>>   */
>>>>>>  
>>>>>>  #include "xc_private.h"
>>>>>> +#include <errno.h>
>>>>>>  
>>>>>>  int xc_gnttab_op(xc_interface *xch, int cmd, void * op, int op_size, 
>>>>>> int count)
>>>>>>  {
>>>>>> @@ -174,6 +175,28 @@ void *xc_gnttab_map_domain_grant_refs(xc_gnttab 
>>>>>> *xcg,
>>>>>>                                                          count, domid, 
>>>>>> refs, prot);
>>>>>>  }
>>>>>>  
>>>>>> +void *xc_gnttab_map_grant_ref_notify(xc_gnttab *xcg,
>>>>>> +                                     uint32_t domid,
>>>>>> +                                     uint32_t ref,
>>>>>> +                                     uint32_t notify_offset,
>>>>>> +                                     evtchn_port_t notify_port,
>>>>>> +                                     int *notify_result)
>>>>>> +{
>>>>>> +        if (xcg->ops->u.gnttab.map_grant_ref_notify)
>>>>>> +                return xcg->ops->u.gnttab.map_grant_ref_notify(xcg, 
>>>>>> xcg->ops_handle,
>>>>>> +                        domid, ref, notify_offset, notify_port, 
>>>>>> notify_result);
>>>>>> +        else {
>>>>>> +                void* area = xcg->ops->u.gnttab.map_grant_ref(xcg, 
>>>>>> xcg->ops_handle,
>>>>>> +                        domid, ref, PROT_READ|PROT_WRITE);
>>>>>> +                if (area && notify_result) {
>>>>>> +                        *notify_result = -1;
>>>>>> +                        errno = ENOSYS;
>>>>>> +                }
>>>>>> +                return area;
>>>>>> +        }
>>>>>
>>>>> I think the new public interface is fine but do we really need a new
>>>>> internal interface here?
>>>>>
>>>>> I think you can just add the notify_* arguments to the existing OSDEP
>>>>> function and have those OS backends which don't implement that feature
>>>>> return ENOSYS if notify_offset != 0 (or ~0 or whatever invalid value
>>>>> works).
>>>>>
>>>>> Why doesn't the *_notify variant take a prot argument?
>>>>
>>>> At least for the byte-clear portion of the notify, the page must be 
>>>> writable
>>>> or the notify will not work. I suppose an event channel alone could be used
>>>> for a read-only mapping, but normally the unmapping of a read-only mapping 
>>>> is
>>>> not an important event - although I guess you could contrive a use for this
>>>> type of notificaiton, so there's no reason not to allow it.
>>>
>>> If you combine this the two functions then returning EINVAL for attempts
>>> to map without PROT_WRITE (or whatever perm is necessary) would be
>>> reasonable IMHO.
>>>
>>
>> The ioctl already prevents you from requesting the impossible, so this should
>> just work.
> 
> Even better. I see the check in the gntdev driver but not in the
> gntalloc one, is that right?
> 

Yes. The unmap notification will always work in gntalloc because the
shared page is owned locally, and is always writable there; read-only
refers to the mapping domain's ability to write.

> 
>>> I hadn't noticed that we have both map_gratn_ref and map_grant_refs,
>>> that seems like an area which could be cleaned up. (not saying you
>>> should, just noticing it)
>>
>> Since I'm already rewriting the osdep layer functions, I think I can replace
>> all 3-4 existing map functions with a single function.
> 
> Awesome, thanks!
> 
>> It looks like the current 2.6.32.x xen kernels also aren't returning EAGAIN,
>> so I'm unsure as to why this support was added. The commit in question is
>> 20689:fe42b16855aa by Grzegorz Milos (committed by Keir Fraser), but I don't
>> see any discussion on xen-devel for it.
> 
> IIRC it was related to the page sharing patches which can cause grant
> hypercalls to return EAGAIN if the granted page is swapped or shared. I
> think this can only happen to backend/dom0 type operations.
> 
> I think the reason it needs to return to the guest is that the paging
> daemon may be in the same domain and having the only vcpu block in the
> hypercall would deadlock.
> 
>>  It's also unclear why repeating the
>> request every millisecond in an infinite loop is better than letting the
>> caller handle an -EAGAIN.
> 
> Yeah, the millisecond thing is pretty gross, something like
> sched_yield() might be a bit more palatable (if it's portable enough,
> although sleep could be a fallback if not)
> 
> I suppose that hiding the EAGAIN handling in the library was just
> thought to be convenient, compared with changing all the existing users.
> 
> Ian.
> 

It sounds to me like this is best solved in the kernel, although it would
still have to invoke some kind of yield operation since I assume the kernel
can't tell when the hypercall will not block (ideally you would be able to
put the invoking process to sleep pending the cross-domain page fault).

For now, it sounds like the best solution is to keep the usleep-based loop
in for all gntdev invocations. Using sched_yield will cause the CPU to spin
while the page is pending from disk or possibly while waiting for dom0 to
be scheduled (assuming a domU-domU vchan).

-- 
Daniel De Graaf
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
WARNING - OLD ARCHIVES

xen-devel

[Xen-devel] Re: [PATCH 1/3] libxc: add xc_gnttab_map_grant_ref_notify