RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ack@xxxxxxxxxxxxx]
> Sent: Wednesday, December 06, 2006 2:42 AM
> To: Santos, Jose Renato G
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Turner, Yoshio; Jose Renato Santos; G John Janakiraman
> Subject: Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
>
> I also spotted find_domain_by_id() showing up rather high in
> network-intensive workloads. The CPU overhead of our network
> I/O path is pretty large, so it's worth trying to address, and
> if I remember correctly, that one was oddly high on the list of
> low-hanging fruit.
>
> find_domain_by_id() is called from __gnttab_map_grant_ref(),
> which is typically called N times on an array of grant ops
> from gnttab_map_grant_ref(). Perhaps we could find a way to
> optimize the common case here and only look up and hold the
> domain once per op array instead of once per op in the multi-op?
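A minimal sketch of the batching idea, assuming a simplified op array: the target domain is looked up (and a reference held) only when `ops[i].dom` differs from the previous op, and the reference is dropped once at the end. `struct map_op`, `map_grant_refs_batched()`, `lookup_count` and the stub `find_domain_by_id()`/`put_domain()` are hypothetical stand-ins, not Xen's grant-table code, which also has per-op validation and error handling that this ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local stand-ins for Xen's definitions. */
#define GNTST_okay        0
#define GNTST_bad_domain (-2)

struct domain { int domain_id; int refcnt; };
struct map_op { int dom; int status; };   /* simplified grant-map op */

static struct domain domains[] = { { 1, 0 }, { 2, 0 } };
static int lookup_count;                  /* counts find_domain_by_id() calls */

/* Stub lookup: takes a reference on success, as the real function does. */
static struct domain *find_domain_by_id(int id)
{
    lookup_count++;
    for (size_t i = 0; i < sizeof(domains) / sizeof(domains[0]); i++)
        if (domains[i].domain_id == id) {
            domains[i].refcnt++;
            return &domains[i];
        }
    return NULL;
}

static void put_domain(struct domain *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* Map a batch of ops, re-doing the lookup only when the target domain changes. */
static void map_grant_refs_batched(struct map_op *ops, size_t n)
{
    struct domain *rd = NULL;

    for (size_t i = 0; i < n; i++) {
        if (rd == NULL || rd->domain_id != ops[i].dom) {
            if (rd != NULL)
                put_domain(rd);           /* drop the previous domain's reference */
            rd = find_domain_by_id(ops[i].dom);
        }
        if (rd == NULL) {
            ops[i].status = GNTST_bad_domain;
            continue;
        }
        /* ... per-op mapping work against rd would go here ... */
        ops[i].status = GNTST_okay;
    }
    if (rd != NULL)
        put_domain(rd);                   /* one put for the whole batch */
}
```

In the common case where every op targets the same domain, the lock and refcount traffic drop from N lookups to one.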
>
> We could also clean up some code while we're there:
>
>     if ( unlikely((rd = find_domain_by_id(op->dom)) == NULL) )
>     {
>         vvvvvvvvvvvvvvvvvvvvvvvv
>         if ( rd != NULL )
>             put_domain(rd);
>         ^^^^^^^^^^^^^^^^^^^^^^^^ WTF???
>         DPRINTK("Could not find domain %d\n", op->dom);
>         op->status = GNTST_bad_domain;
>         return;
>     }
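The marked lines are dead code: the branch is only entered when `rd == NULL`, so there is never a reference to drop. A minimal sketch of the cleaned-up branch follows, with hypothetical stubs for `find_domain_by_id()`/`put_domain()`, local stand-ins for the `GNTST_*` values, `unlikely()` defined via the GCC/Clang `__builtin_expect` builtin, and `fprintf` standing in for Xen's `DPRINTK`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical local stand-ins for the Xen definitions used in the snippet. */
#define unlikely(x) __builtin_expect(!!(x), 0)
#define GNTST_okay        0
#define GNTST_bad_domain (-2)

struct domain { int domain_id; int refcnt; };
struct map_op { int dom; int status; };

static struct domain dom1 = { 1, 0 };   /* the only "existing" domain */

/* Stub: returns a referenced domain, or NULL if the id is unknown. */
static struct domain *find_domain_by_id(int id)
{
    if (id != dom1.domain_id)
        return NULL;
    dom1.refcnt++;
    return &dom1;
}

static void put_domain(struct domain *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

static void map_one(struct map_op *op)
{
    struct domain *rd;

    if ( unlikely((rd = find_domain_by_id(op->dom)) == NULL) )
    {
        /* rd is always NULL here, so no put_domain() is needed. */
        fprintf(stderr, "Could not find domain %d\n", op->dom);
        op->status = GNTST_bad_domain;
        return;
    }

    /* ... the actual mapping work would use rd here ... */
    op->status = GNTST_okay;
    put_domain(rd);
}
```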
>
> It's a bit puzzling to me that grabbing the lock adds so much
> overhead. Is this purely lock operation overhead, or is
> there contention on the lock's cache line (we could find this out
> by profiling for data cache line misses)?
>
Yes, this is due to cache contention on the lock.
There is also cache contention on the domain reference count
(refcnt) used by get_domain().
I just implemented a per-CPU version of the reference count
that avoids this contention, and with it the cost of
find_domain_by_id() is reduced even further.
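A minimal sketch of the general per-CPU reference count idea, not the patch described here: each CPU increments and decrements only its own counter, aligned to an assumed 64-byte cache line, so get/put no longer bounce a shared line between CPUs. `NR_CPUS_SKETCH`, `CACHE_LINE`, `struct pcpu_ref` and the `pcpu_*` functions are hypothetical; the CPU id is passed explicitly, and the hard part of a real implementation (safely detecting when the summed count reaches zero) is deliberately left out.

```c
#include <assert.h>
#include <stdalign.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count for this sketch */
#define CACHE_LINE     64  /* assumed cache-line size in bytes */

/* One counter per CPU, each on its own cache line, so a get/put on one
 * CPU never dirties a line another CPU is using. */
struct pcpu_ref {
    struct {
        alignas(CACHE_LINE) long count;
    } slot[NR_CPUS_SKETCH];
};

/* Take a reference on the given CPU (real code would use the current
 * CPU with preemption disabled). */
static void pcpu_get(struct pcpu_ref *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    r->slot[cpu].count++;
}

/* Drop a reference on the given CPU; it need not be the CPU that took
 * the reference, so an individual slot may go negative. */
static void pcpu_put(struct pcpu_ref *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    r->slot[cpu].count--;
}

/* The true count is the sum over all CPUs; it is only a stable answer
 * once gets and puts have stopped. */
static long pcpu_total(const struct pcpu_ref *r)
{
    long sum = 0;
    for (int c = 0; c < NR_CPUS_SKETCH; c++)
        sum += r->slot[c].count;
    return sum;
}
```

The trade-off is that the cheap, contention-free get/put moves the cost to reading the total, which has to visit every CPU's slot.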
Currently, find_domain_by_id() consumes approximately 3.05% of
the total CPU cycles for a TCP TX micro-benchmark. With the RCU
scheme this drops to 1.16%, and with a per-CPU reference
count mechanism it drops to 0.31%.
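To put these figures in relative terms, a small helper computes the fraction of the baseline cost removed, 1 - after/before. It treats the percentages as directly comparable across runs, which is an assumption since the total cycle count also shifts slightly; `reduction()` is a hypothetical name.

```c
#include <assert.h>

/* Fraction of the baseline cost removed when a function's share of CPU
 * cycles drops from before_pct to after_pct: 1 - after/before. */
static double reduction(double before_pct, double after_pct)
{
    assert(before_pct > 0.0);
    return 1.0 - after_pct / before_pct;
}
```

With the numbers above, `reduction(3.05, 1.16)` is about 0.62 (the RCU step, close to the roughly 2/3 reported with the original patches), `reduction(1.16, 0.31)` is about 0.73 (the per-CPU refcount step on its own), and `reduction(3.05, 0.31)` is about 0.90 overall.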
I can submit a patch for the per-CPU reference count after
I clean up the code a bit.
Regards
Renato
> Cheers,
> Emmanuel.
>
> On Tue, Dec 05, 2006 at 07:35:37PM -0600, Santos, Jose Renato G wrote:
> >
> > This is a set of patches to improve the performance of
> > find_domain_by_id().
> > find_domain_by_id() shows up high in profiles for network
> > I/O-intensive workloads.
> > Most of the cost of this function comes from three calls of
> > approximately equal cost: 1) read_lock(), 2) read_unlock() and
> > 3) get_domain().
> > These patches replace the lock used for accessing domain_list and
> > domain_hash with a lock-free RCU scheme. Experiments confirm that
> > the cost of find_domain_by_id() is in fact reduced by 2/3.
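As a rough illustration of why an RCU scheme removes read_lock()/read_unlock() from the lookup path, here is a minimal C11 sketch of the read-copy-update publication pattern. It is not the imported Linux RCU code: `struct domlist`, `domain_exists()`, `domain_add()` and `reclaim_retired()` are hypothetical, writers are assumed to be serialized by some other lock, and the grace period (what `synchronize_rcu()`/`call_rcu()` provide in Linux) is replaced by the caller promising that no readers are active. It also does nothing about the get_domain() refcount cost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Immutable snapshot of the domain list; readers never see it change. */
struct domlist {
    size_t n;
    struct domlist *retired_next;   /* link used only after replacement */
    int ids[];                      /* domain ids */
};

static _Atomic(struct domlist *) cur_list;  /* published snapshot */
static struct domlist *retired;             /* old snapshots awaiting a grace period */

/* Reader side: no lock, just an acquire load of the published pointer. */
static int domain_exists(int id)
{
    struct domlist *l = atomic_load_explicit(&cur_list, memory_order_acquire);
    if (l == NULL)
        return 0;
    for (size_t i = 0; i < l->n; i++)
        if (l->ids[i] == id)
            return 1;
    return 0;
}

/* Writer side (writers assumed serialized): copy, update, publish. */
static void domain_add(int id)
{
    struct domlist *old = atomic_load_explicit(&cur_list, memory_order_relaxed);
    size_t n = old ? old->n : 0;
    struct domlist *fresh = malloc(sizeof(*fresh) + (n + 1) * sizeof(int));
    assert(fresh != NULL);
    if (old)
        memcpy(fresh->ids, old->ids, n * sizeof(int));
    fresh->ids[n] = id;
    fresh->n = n + 1;
    fresh->retired_next = NULL;
    /* Release store: readers that see 'fresh' also see its contents. */
    atomic_store_explicit(&cur_list, fresh, memory_order_release);
    if (old) {                      /* readers may still hold 'old': defer free */
        old->retired_next = retired;
        retired = old;
    }
}

/* Stand-in for waiting out a grace period: caller guarantees no readers. */
static size_t reclaim_retired(void)
{
    size_t freed = 0;
    while (retired) {
        struct domlist *next = retired->retired_next;
        free(retired);
        retired = next;
        freed++;
    }
    return freed;
}
```

Lookups touch only the published snapshot, so they no longer write to a shared lock word; the cost moves to writers, which copy the list and defer freeing the old one.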
> > The patches apply cleanly to changeset 12732.
> >
> > Renato
> >
> > Patches:
> > 1/2 - Import Linux RCU code into Xen
> > 2/2 - replace domlist_lock operations by RCU operations
> >
> > Signed-off-by: Jose Renato Santos <jsantos@xxxxxxxxxx>
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
>