

[XenPPC] Re: copy_4K_page() doesn't use dcbtst?

To: Hollis Blanchard <hollisb@xxxxxxxxxx>
Subject: [XenPPC] Re: copy_4K_page() doesn't use dcbtst?
From: Paul Mackerras <paulus@xxxxxxxxx>
Date: Tue, 29 Aug 2006 10:16:05 +1000
Cc: linuxppc-dev <linuxppc-dev@xxxxxxxxxx>, xen-ppc-devel <xen-ppc-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1156786523.28490.52.camel@xxxxxxxxxxxxxxxxxxxxx>
References: <1156786523.28490.52.camel@xxxxxxxxxxxxxxxxxxxxx>
Hollis Blanchard writes:

> Hi Paul, some Xen people were just noticing that copy_4K_page
> (arch/powerpc/lib/copypage_64.S) doesn't use the dcbtst instruction. Why
> doesn't it help there?

Why would we want to read the cache lines for the destination from
memory when we're only going to overwrite them completely anyway?
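For illustration only, here is a minimal portable C sketch of the same policy. It is not the hand-written assembly in copypage_64.S. It prefetches each source line for reading and deliberately issues no write prefetch for the destination. It uses the GCC/Clang builtin __builtin_prefetch(addr, rw, locality). On PowerPC, GCC typically lowers a read prefetch (rw = 0) to dcbt and a write prefetch (rw = 1) to dcbtst. The 128-byte line size and the names copy_page_sketch, SKETCH_LINE_SIZE and SKETCH_PAGE_SIZE are hypothetical assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed cache line size (128 bytes, as on POWER4); hypothetical constant. */
#define SKETCH_LINE_SIZE 128
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical sketch of a 4K page copy, NOT the kernel's copy_4K_page.
 * Source lines are prefetched for reading; destination lines are
 * deliberately not prefetched, because every byte of them is about
 * to be overwritten.
 */
static void copy_page_sketch(void *dst, const void *src)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    for (size_t off = 0; off < SKETCH_PAGE_SIZE; off += SKETCH_LINE_SIZE) {
        /* Read prefetch (rw = 0) of the next source line, if any. */
        if (off + SKETCH_LINE_SIZE < SKETCH_PAGE_SIZE)
            __builtin_prefetch(s + off + SKETCH_LINE_SIZE, 0, 3);

        /* Intentionally no __builtin_prefetch(d + off, 1, 3): a write
           prefetch (dcbtst on PowerPC) would pull the old destination
           contents into the cache only for them to be overwritten. */
        memcpy(d + off, s + off, SKETCH_LINE_SIZE);
    }
}
```

Whether the source prefetch helps at all depends on the hardware prefetcher. As the measurements below suggest, the only way to know is to time it on the target machine.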

A stronger argument would be for using dcbz (which zeroes a whole cache
line in the cache without first reading it from memory), but IIRC it
actually made things slower (on POWER4 at least). I suspect the hardware
is gathering the stores for the whole of each cache line automatically,
so using dcbz doesn't provide any benefit.

I did a lot of measurements of memory copy speed on POWER4 (using
different copy loops, copy sizes, alignments, cache hot/cold cases)
and the copy_4K_page loop is the fastest I could come up with for
POWER4.  If anyone can come up with a routine that is measurably
faster on current machines, I'm happy to look at it, of course.


