On 09/22/2011 04:32 AM, Ian Campbell wrote:
> On Wed, 2011-09-21 at 18:07 +0100, Daniel De Graaf wrote:
>> On 09/21/2011 11:25 AM, Ian Campbell wrote:
>
>>> When I map a page how do I know what the offset of it is wrt the file
>>> descriptor? DO I just have to remember how many pages I mapped an *=
>>> 4096?
>>
>> You had the offset at the time you mapped it, because that's what you
>> passed in the offset parameter to mmap(). Just don't lose the number :)
>
> So I guess my followup question is where does the number I pass to mmap
> come from...
>
> /me scrobbles in the code.
>
> Aha, so it is an output from the gntdev/gntalloc ioctls. So how about:
>
> /* IN parameters */
> /*
> * Offset in the file descriptor for a byte within the page. This offset
> * is the result of the IOCTL_GNTDEV_MAP_GRANT_REF and is the same as
> * is used with mmap().
> */
>
Sounds good.
>>>>>> +};
>>>>>> +
>>>>>> +/* Clear (set to zero) the byte specified by index */
>>>>>> +#define UNMAP_NOTIFY_CLEAR_BYTE 0x1
>>>>>> +/* Send an interrupt on the indicated event channel */
>>>>>> +#define UNMAP_NOTIFY_SEND_EVENT 0x2
>>>>>> +
>>>>>> #endif /* __LINUX_PUBLIC_GNTDEV_H__ */
>>>>>> diff --git a/tools/libxc/xc_gnttab.c b/tools/libxc/xc_gnttab.c
>>>>>> index 4f55fce..3d3c63b 100644
>>>>>> --- a/tools/libxc/xc_gnttab.c
>>>>>> +++ b/tools/libxc/xc_gnttab.c
>>>>>> @@ -18,6 +18,7 @@
>>>>>> */
>>>>>>
>>>>>> #include "xc_private.h"
>>>>>> +#include <errno.h>
>>>>>>
>>>>>> int xc_gnttab_op(xc_interface *xch, int cmd, void * op, int op_size,
>>>>>> int count)
>>>>>> {
>>>>>> @@ -174,6 +175,28 @@ void *xc_gnttab_map_domain_grant_refs(xc_gnttab
>>>>>> *xcg,
>>>>>> count, domid,
>>>>>> refs, prot);
>>>>>> }
>>>>>>
>>>>>> +void *xc_gnttab_map_grant_ref_notify(xc_gnttab *xcg,
>>>>>> + uint32_t domid,
>>>>>> + uint32_t ref,
>>>>>> + uint32_t notify_offset,
>>>>>> + evtchn_port_t notify_port,
>>>>>> + int *notify_result)
>>>>>> +{
>>>>>> + if (xcg->ops->u.gnttab.map_grant_ref_notify)
>>>>>> + return xcg->ops->u.gnttab.map_grant_ref_notify(xcg,
>>>>>> xcg->ops_handle,
>>>>>> + domid, ref, notify_offset, notify_port,
>>>>>> notify_result);
>>>>>> + else {
>>>>>> + void* area = xcg->ops->u.gnttab.map_grant_ref(xcg,
>>>>>> xcg->ops_handle,
>>>>>> + domid, ref, PROT_READ|PROT_WRITE);
>>>>>> + if (area && notify_result) {
>>>>>> + *notify_result = -1;
>>>>>> + errno = ENOSYS;
>>>>>> + }
>>>>>> + return area;
>>>>>> + }
>>>>>
>>>>> I think the new public interface is fine but do we really need a new
>>>>> internal interface here?
>>>>>
>>>>> I think you can just add the notify_* arguments to the existing OSDEP
>>>>> function and have those OS backends which don't implement that feature
>>>>> return ENOSYS if notify_offset != 0 (or ~0 or whatever invalid value
>>>>> works).
>>>>>
>>>>> Why doesn't the *_notify variant take a prot argument?
>>>>
>>>> At least for the byte-clear portion of the notify, the page must be
>>>> writable
>>>> or the notify will not work. I suppose an event channel alone could be used
>>>> for a read-only mapping, but normally the unmapping of a read-only mapping
>>>> is
>>>> not an important event - although I guess you could contrive a use for this
>>>> type of notificaiton, so there's no reason not to allow it.
>>>
>>> If you combine this the two functions then returning EINVAL for attempts
>>> to map without PROT_WRITE (or whatever perm is necessary) would be
>>> reasonable IMHO.
>>>
>>
>> The ioctl already prevents you from requesting the impossible, so this should
>> just work.
>
> Even better. I see the check in the gntdev driver but not in the
> gntalloc one, is that right?
>
Yes. The unmap notification will always work in gntalloc because the
shared page is owned locally, and is always writable there; read-only
refers to the mapping domain's ability to write.
>
>>> I hadn't noticed that we have both map_gratn_ref and map_grant_refs,
>>> that seems like an area which could be cleaned up. (not saying you
>>> should, just noticing it)
>>
>> Since I'm already rewriting the osdep layer functions, I think I can replace
>> all 3-4 existing map functions with a single function.
>
> Awesome, thanks!
>
>> It looks like the current 2.6.32.x xen kernels also aren't returning EAGAIN,
>> so I'm unsure as to why this support was added. The commit in question is
>> 20689:fe42b16855aa by Grzegorz Milos (committed by Keir Fraser), but I don't
>> see any discussion on xen-devel for it.
>
> IIRC it was related to the page sharing patches which can cause grant
> hypercalls to return EAGAIN if the granted page is swapped or shared. I
> think this can only happen to backend/dom0 type operations.
>
> I think the reason it needs to return to the guest is that the paging
> daemon may be in the same domain and having the only vcpu block in the
> hypercall would deadlock.
>
>> It's also unclear why repeating the
>> request every millisecond in an infinite loop is better than letting the
>> caller handle an -EAGAIN.
>
> Yeah, the millisecond thing is pretty gross, something like
> sched_yield() might be a bit more palatable (if it's portable enough,
> although sleep could be a fallback if not)
>
> I suppose that hiding the EAGAIN handling in the library was just
> thought to be convenient, compared with changing all the existing users.
>
> Ian.
>
It sounds to me like this is best solved in the kernel, although it would
still have to invoke some kind of yield operation since I assume the kernel
can't tell when the hypercall will not block (ideally you would be able to
put the invoking process to sleep pending the cross-domain page fault).
For now, it sounds like the best solution is to keep the usleep-based loop
in for all gntdev invocations. Using sched_yield will cause the CPU to spin
while the page is pending from disk or possibly while waiting for dom0 to
be scheduled (assuming a domU-domU vchan).
--
Daniel De Graaf
National Security Agency
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|