This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/



To: Rafal Wojtczuk <rafal@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Mon, 19 Apr 2010 09:48:47 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, qubes-devel@xxxxxxxxxxxxxxxx
Delivery-date: Mon, 19 Apr 2010 09:49:27 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100419112520.GA18767@xxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20100412185454.GC3671@xxxxxxxxxxxxxxxxxxx> <4BC37C32.1060805@xxxxxxxx> <4BC380E2.8060605@xxxxxxxxxxxxxxxxxxxxxx> <4BC3850F.7070108@xxxxxxxx> <20100419112520.GA18767@xxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100330 Fedora/3.0.4-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.4
On 04/19/2010 04:25 AM, Rafal Wojtczuk wrote:
> On Mon, Apr 12, 2010 at 01:39:43PM -0700, Jeremy Fitzhardinge wrote:
>> But I assume you have other code which wants to grant through the
>> Xorg-allocated framebuffer.  That complicates things a bit, but you could
>> still add a device (no /proc files, please) with an ioctl which:
>>    1. takes a range of usermode addresses
>>    2. increments the page refcount for those pages
> Or, do not decrement the count, by not calling put_page() ?
>>    3. returns the mfns for those pages
>> That will prevent the pages from being migrated while you're referring
>> to their mfns.
> After removing the call to put_page() in u2mfn_ioctl(), see once again
> http://gitweb.qubes-os.org/gitweb/?p=mainstream/gui.git;a=blob;f=vchan/u2mfn/u2mfn.c;h=6ff113c07c50ef078ab04d9e61d2faab338357e7;hb=HEAD#l35
> the page's mfn changed again.
> Even commenting out the kunmap() call in this function did not help, either.
> Am I missing something ?

It definitely shouldn't be possible to move a page with a non-zero
refcount.  So it looks like something else is going on there.  Even if
the process exits, those pages should remain in unusable limbo rather
than being freed and reallocated.
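For reference, the pinning ioctl sketched in steps 1-3 above might look roughly like this in a 2.6.3x-era kernel module (the function name is hypothetical and error handling is trimmed; this is a sketch of the idea, not the actual u2mfn code):

```
/* Hypothetical ioctl helper: pin one user page and return its MFN.
 * get_user_pages() takes a reference on the page, which is what
 * prevents it from being migrated or freed behind our back. */
static long u2mfn_pin_and_get_mfn(unsigned long uaddr, unsigned long *mfn)
{
        struct page *page;
        int ret;

        down_read(&current->mm->mmap_sem);
        ret = get_user_pages(current, current->mm,
                             uaddr & PAGE_MASK, 1 /* one page */,
                             1 /* write */, 0 /* no force */,
                             &page, NULL);
        up_read(&current->mm->mmap_sem);
        if (ret != 1)
                return -EFAULT;

        /* Deliberately no put_page() here: the elevated refcount is the
         * pin.  It must be dropped when the dom0 mapping is torn down,
         * or the page leaks. */
        *mfn = pfn_to_mfn(page_to_pfn(page));
        return 0;
}
```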

> The only working way (for the ring buffer case) is to acquire memory via 
> kmalloc and pass it to userspace via remap_pfn_range. But this is unsuitable 
> for the case of X composition buffers, because we don't want to alter the
> way X allocates memory (it calls plain malloc). We could hijack X's malloc()
> via LD_PRELOAD, but then we cannot distinguish which calls are made because
> of composition buffer allocation.

Yes.  Unfortunately that has its own set of problems.  For example, if
the X server wants to fork for some reason then you become subject to
the whims of COW as to what page is being used in which process.

But it seems to me you're operating at the wrong architectural level
here.  I fully understand your short-term goal is "get it working", but
I think you're going to want to revise this for v2.0.  Your architecture
is not very different from a standard CPU+GPU compositing setup, except
your "GPU" is actually dom0 (which of course may be really using the
GPU).  X should already have all the interfaces you need to efficiently
pass an application's compositing buffer to the "GPU" for rendering.

(Maybe you need to do a "Xen DRI" driver to implement this?)

>>  You need to add something to explicitly decrement the
>> refcount to prevent a memory leak, presumably at the time you tear down
>> the mapping in dom0.  Ideally you'd arrange to do that triggered off
>> unmap of the memory range (by isolating the pages in their own new vma)
>> so that it all gets cleaned up on process exit.
> By "triggered off unmap" do you mean setting the vm_ops field in struct 
> vm_area_struct to a custom struct vm_operations_struct (particularly, with a 
> custom close() method), or is there something simpler ?

Yes, that's what I had in mind.  You'd need to chop the VMA up to
isolate the virtual address range you want to apply the close to.  But
that assumes your range doesn't already have a close method of course;
it gets awkward if it does.
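A minimal sketch of the custom close() approach, again in 2.6.3x-era terms (names are hypothetical; it assumes the pinned struct page pointers were stashed in vm_private_data when the range was isolated into its own VMA):

```
/* Hypothetical close handler: drop the pins taken at grant time, so
 * the pages are released on munmap() or process exit even without an
 * explicit "unpin" ioctl. */
static void u2mfn_vma_close(struct vm_area_struct *vma)
{
        struct page **pinned = vma->vm_private_data;
        unsigned long i, npages =
                (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;

        for (i = 0; i < npages; i++)
                if (pinned[i])
                        put_page(pinned[i]);
        kfree(pinned);
}

static const struct vm_operations_struct u2mfn_vm_ops = {
        .close = u2mfn_vma_close,
};
/* After isolating the range into its own VMA:
 *         vma->vm_ops = &u2mfn_vm_ops;                              */
```

One caveat: .close also runs when the VMA is split or duplicated on fork, so a real implementation needs to be careful not to drop the references more than once.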

>> I'm not at all familiar with how X manages composition buffers, but it
>> seems to me that in normal use, one would want to be able to either
>> allocate that buffer in texture memory (so it can be used as a texture
>> source), or at least copy updates into texture memory.  Couldn't you
>> hook into that transfer to the composition hardware (ie, dom0)?
> We are talking about X running in domU; there is no related hardware.
> We can determine where the composition buffer is only after it has
> been allocated. 

(See above.)

>> No, kernel allocations are not movable by default.
> Could you mention a few details more on the related migration mechanism ?
> E.g. which PG_ flag (set by kmalloc) makes a page unmovable ? Preferably, 
> with pointers to relevant code ? 

__GFP_MOVABLE is the key thing to look at.  It causes page allocation to
allocate the page in a movable zone.  All user memory is allocated with
GFP_HIGHUSER_MOVABLE (in do_wp_page(), for example), which means that
the memory needn't be directly addressable by the kernel (HIGHUSER), and
can be moved or reclaimed when necessary (MOVABLE).

> I guess it is in linux/mm/migrate.c, but I am getting lost
> trying to figure out which parts are NUMA specific and which are not; and
> particularly, what triggers the migration.

TBH I've never really looked into the mechanisms of how it works.  But I
think mm/migrate.c is actually something else, relating to moving pages
around between NUMA nodes.

I had a quick look at it just now, and migration definitely seems to
happen on demand in the buddy allocator (mm/page_alloc.c), if it can't
satisfy a memory request.  I don't know whether it tries to actively
move pages around to decrease fragmentation.

> Interestingly, Xorg guys claim X server does nothing special with the memory
> acquired by malloc() for the composition buffer. Yet, so far no corruption
> of the displayed images has been observed. Maybe a single-page vma (that
> stores the ring buffer) is particularly attractive for the
> migration/defragmentation algorithm, and that is why it is easy to trigger
> its relocation (but not so with the composition buffer case) ? 

Hm, that doesn't ring true.  AFAIK all migration happens at the page
level with no reference to VMAs (though it's possible that being mapped
into a process address space makes a page temporarily unmigratable, and
it needs to wait for something to shoot down/age out the ptes before
migrating the page).  Again, I'm not well versed in the details.

It's quite possible that the problem you're seeing has nothing to do with
page migration at all, and this is a wild-goose chase.

