WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

[Xen-devel] Re: [PATCH 00/17] Netchannel2 for a modern git kernel

To: Steven Smith <steven.smith@xxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH 00/17] Netchannel2 for a modern git kernel
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Fri, 23 Oct 2009 15:06:15 -0700
Cc: Steven Smith <Steven.Smith@xxxxxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>, "joserenato.santos@xxxxxx" <joserenato.santos@xxxxxx>
Delivery-date: Fri, 23 Oct 2009 15:06:45 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20091020094049.GA23358@xxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <cover.1254667618.git.ssmith@xxxxxxxxxxxxxxxxxxxxxxxxxx> <4AC92B44.5020208@xxxxxxxx> <20091005092937.GA1036@xxxxxxxxxxxxxxxxxxxxxxxxxx> <4ACA638A.5060209@xxxxxxxx> <20091006090616.GA21511@xxxxxxxxxxxxxxxxxxxxxxxxxx> <4ACB7909.7000804@xxxxxxxx> <20091007081510.GA14268@xxxxxxxxxxxxxxxxxxxxxxxxxx> <4ADD51E4.2060901@xxxxxxxx> <20091020094049.GA23358@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4pre) Gecko/20091014 Fedora/3.0-2.8.b4.fc11 Lightning/1.0pre Thunderbird/3.0b4
On 10/20/09 02:40, Steven Smith wrote:
> It might make sense to send an initial version which doesn't support
> receiver-map mode first, because that avoids the whole PG_foreign
> issue.  It'd be a bit slow, but it would work, and it'd be properly
> cross-compatible with a receiver-map capable version.
>   

I don't think there's any particular rush to get it upstream.  Getting
the core stuff up is more important; if there's a set of relatively
self-contained driver-like patches out of tree, then that's fairly easy
to manage (though getting everything upstream is the ultimate goal).

>>> The NC2 approach is basically similar to the NC1 approach, but
>>> generalised so that NC1 and NC2 can cooperate in a reasonably sane
>>> way.  It still uses the PG_foreign bit to identify foreign pages, and
>>> the page->private and page->mapping fields for various bits of
>>> information.
>>>       
>> Unfortunately the PG_foreign approach is a non-starter for upstream,
>> mainly because adding new page flags is strongly frowned upon unless
>> there's a very compelling reason.  Unless we can find some other kernel
>> subsystems which can make use of a page destructor, we probably won't
>> make the cut.  (It doesn't help that there are no more page flags left
>> on 32-bit.)
>>     
> Yeah, I didn't think that was going to go very far.
>
> It might be possible to do something like:
>
> 1) Create a special struct address_space somewhere.  This wouldn't
>    really do anything, but would just act as a placeholder.
> 2) Whenever we would normally set PG_foreign, set page->mapping to
>    point at the placeholder address_space.
> 3) Rather than testing PG_foreign, test page->mapping == &placeholder.
> 4) Somehow move all of the Xen-specific bits which currently use
>    ->mapping to use ->private instead.
>
> Then we wouldn't need the page bit.  It's not even that much of an
> abuse; foreign memory is arguably a special kind of address space, so
> creating a struct address_space for it isn't insane.
>   

Yes, that's an interesting idea.  There are a few instances of
non-usermode-visible filesystems for things like this, so there's
precedent.  (And maybe making it user-visible would be a clean way to
map foreign pages into usermode...).

But it still doesn't give us a callback to reclaim the page once it's
done.  We could elevate the refcount to prevent the page from ever being
released, but we'd still have to go out and manually search for them
rather than getting proper notifications.
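
As a rough illustration of the four steps quoted above, here is a
minimal user-space sketch.  The struct definitions are cut-down
stand-ins for the kernel types, and foreign_space, mark_foreign(),
page_is_foreign() and clear_foreign() are hypothetical names invented
for this example, not existing kernel interfaces.  Note that nothing in
it notifies anyone when the page is released, which is the gap
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct address_space { int unused; };

struct page {
    struct address_space *mapping;
    unsigned long private;
};

/* Step 1: hypothetical placeholder.  It does nothing; only its address matters. */
static struct address_space foreign_space;

/*
 * Step 2: where PG_foreign would have been set, point ->mapping at the
 * placeholder.  Step 4: the Xen-specific data moves into ->private.
 */
void mark_foreign(struct page *pg, unsigned long xen_info)
{
    assert(pg != NULL);
    pg->mapping = &foreign_space;
    pg->private = xen_info;
}

/* Step 3: this comparison replaces the PG_foreign test. */
bool page_is_foreign(const struct page *pg)
{
    return pg->mapping == &foreign_space;
}

/* Undo mark_foreign() once the page is back in purely local hands. */
void clear_foreign(struct page *pg)
{
    pg->mapping = NULL;
    pg->private = 0;
}
```

The test is a plain pointer comparison, so it costs about the same as a
flag test and uses no page flag at all.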

>> The approach I'm trying at the moment is to use the skb destructor
>> mechanism to grab the pages out of the skb as it's freed.  To deal with
>> skb_clone, I'm adding a flag to the skb to force a clone to do a
>> complete copy so there are no more aliases to the pages (skb_clone
>> should be rare in the common case).
>>     
> Yeah, that would work.  There needs to be some way for netback to get
> grant references and so forth related to netchannel2-mapped pages, and
> vice versa, but that shouldn't be too hard.
>   

Large numbers of the struct page fields would be available for borrowing.
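
Going back to the skb destructor approach quoted above, here is a rough
user-space model of the intended behaviour.  It is a sketch under
simplifying assumptions, not kernel code: struct model_skb carries a
single fragment, and force_copy_on_clone, model_skb_clone(),
model_skb_free() and reclaim_foreign_pages() are all hypothetical names.
The only thing borrowed from the real struct sk_buff is the notion of a
destructor callback run when the skb is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define FRAG_SIZE 64

struct model_skb {
    unsigned char *frag;        /* single payload fragment */
    bool owns_frag;             /* true if frag was allocated by a copy */
    bool force_copy_on_clone;   /* hypothetical "no aliases" flag */
    void (*destructor)(struct model_skb *);
};

int pages_reclaimed;            /* number of destructor-driven reclaims */

/* Destructor: take the foreign page back out of the skb as it is freed. */
void reclaim_foreign_pages(struct model_skb *skb)
{
    skb->frag = NULL;
    pages_reclaimed++;
}

/* Clone that does a complete copy whenever the flag is set. */
struct model_skb *model_skb_clone(const struct model_skb *skb)
{
    struct model_skb *n = malloc(sizeof(*n));

    if (n == NULL)
        return NULL;
    n->destructor = NULL;               /* the copy holds no foreign pages */
    n->force_copy_on_clone = false;
    if (skb->force_copy_on_clone) {
        n->frag = malloc(FRAG_SIZE);
        if (n->frag == NULL) {
            free(n);
            return NULL;
        }
        memcpy(n->frag, skb->frag, FRAG_SIZE);
        n->owns_frag = true;
    } else {
        n->frag = skb->frag;            /* ordinary clone: aliases the data */
        n->owns_frag = false;
    }
    return n;
}

void model_skb_free(struct model_skb *skb)
{
    if (skb->destructor != NULL)
        skb->destructor(skb);
    if (skb->owns_frag)
        free(skb->frag);
    free(skb);
}
```

With the flag set, freeing the original runs the destructor exactly
once, and the clone stays valid afterwards because it never aliased the
foreign page.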

> Yes, that's true, the cleanup bit is much easier for block requests,
> but you still potentially have a forwarding issue.  There are a couple
> of potentially problematic scenarios:
>
> 1) You might have nested block devices.  Suppose you have three
> domains (domA, domB, and domC), and a physical block device sdX in
> domA.  DomA could then be configured to run a blkback exposing sdX to
> domB as xvdY.  DomB might then itself run a blkback exposing xvdY to
> domC as xvdZ.  This won't work.  Requests issued by domC will be
> mapped by domB's blkback and injected into its local storage stack,
> and will eventually reach domB's xvdY blkfront.  This will try to
> grant domA access to the relevant memory, but, because it doesn't know
> about foreign mappings, it'll grant as if the memory was owned by
> domB.  Xen will then reject domA's attempts to map these domB grants,
> and every request on xvdZ will fail.
>
> Admittedly, that'd be a rather stupid configuration, but it's not
> currently blocked by the tools (and it'd be rather difficult to block,
> even if we wanted to).
>   

It's not completely outlandish, but it's a bit unfortunate that the
control-plane operations also require the page-data to be mapped in the
intermediate domain.  It would be nice to have some kind of bypass mode
so that the data can be mapped directly between A and C.
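
To make the failure in the nested case concrete, here is a toy model of
the ownership check.  It is heavily simplified and the names
(model_page, model_grant, model_grant_access(), model_map_grant()) are
made up for illustration; real grant tables work on frame numbers and
grant references, not on structures like these.  The only rule modelled
is the one described above: a map succeeds only if the granting domain
really owns the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DOM_A = 1, DOM_B = 2, DOM_C = 3 };

struct model_page {
    int owner;                      /* domain that really owns the memory */
};

struct model_grant {
    int granter;                    /* domain that issued the grant */
    int grantee;                    /* domain allowed to map it */
    const struct model_page *page;
};

/* A domain grants access without checking whether it owns the page. */
struct model_grant model_grant_access(int granter, int grantee,
                                      const struct model_page *pg)
{
    struct model_grant g;

    assert(pg != NULL);
    g.granter = granter;
    g.grantee = grantee;
    g.page = pg;
    return g;
}

/* The hypervisor-side check: reject grants of memory the granter doesn't own. */
bool model_map_grant(int mapper, const struct model_grant *g)
{
    return g->grantee == mapper && g->page->owner == g->granter;
}
```

In this model, a domC page granted by domC to domB maps fine, but the
same page re-granted by domB's blkfront to domA is rejected, because
domB is not the owner.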

> 2) I've not actually checked this, but I suspect we have a problem if
> you're running an iSCSI initiator in dom0 against a target running in
> a domU, and then try to expose the SCSI device in dom0 as a block
> device in some other domU.  When requests come in from the blkfront,
> the dom0 blkback will map them as foreign pages, and then pass them
> off to the iSCSI initiator.  It would make sense for the pages in the
> block request to get attached to the skb as fragment pages, rather
> than copied.  When the skb eventually reaches netback, netback will
> try to do a grant copy into the receiving netfront's buffers (because
> PG_foreign isn't set), which will fail, because dom0 doesn't actually
> own the pages.
>
> As I say, I've not actually checked whether that's how the initiators
> work, but it would be a sane implementation if you're talking to a NIC
> with jumbogram support.
>
>
>
> Thinking some more, there's another variant of this bug which doesn't
> involve block devices at all: bridging between a netfront and a
> netback.  If you have a single bridge with both netfront and netback
> devices attached to it, and you're not in ALWAYS_COPY_SKB mode,
> forwarding packets from the netback interface to the netfront one
> won't work.  Packets received by netback will be foreign mappings, but
> netfront doesn't know that, so when it sends packets to the backend
> it'll set up grants as if they were in local memory, which won't work.
> I'm not sure what the right fix for that is; probably just copying
> the packet in netfront.
>   

Yeah.  More instances of getting the data mixed up in control
operations.  If the domain is acting as an intermediate between two
other domains, it doesn't need to see or touch the data at all.  Net is
a bit more complex than block because the header is more closely mixed
up with the payload, but just copying the headers and passing the pages
through as mapping references should do the trick.
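
Roughly, and only as a user-space sketch with made-up types
(model_pkt, page_ref and forward_packet() are hypothetical, and the
sizes are arbitrary), the forwarding step would look like this: the
intermediate domain takes a private copy of the header bytes and hands
the payload references on unchanged, still naming the original owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HDR_MAX   64
#define MAX_FRAGS 4

/* Reference to a payload page: who owns it and which grant names it. */
struct page_ref {
    int owner_domid;
    unsigned int gref;
};

struct model_pkt {
    unsigned char hdr[HDR_MAX];     /* header bytes, held locally */
    size_t hdr_len;
    struct page_ref frags[MAX_FRAGS];
    size_t nr_frags;
};

/*
 * Forward a packet through an intermediate domain: copy the header,
 * pass the payload references through without mapping the pages.
 */
bool forward_packet(const struct model_pkt *in, struct model_pkt *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    if (in->hdr_len > HDR_MAX || in->nr_frags > MAX_FRAGS)
        return false;

    memcpy(out->hdr, in->hdr, in->hdr_len);
    out->hdr_len = in->hdr_len;

    for (i = 0; i < in->nr_frags; i++)
        out->frags[i] = in->frags[i];   /* owner and gref unchanged */
    out->nr_frags = in->nr_frags;
    return true;
}
```

Because the references keep the original owner, the receiving end can
tell the pages are not local to the forwarding domain.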

And I think your address_space (AS) idea works quite well for
distinguishing transient-and-perhaps-not-present pages.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel