This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[XenPPC] Re: copy_4K_page() doesn't use dcbtst?

To: Paul Mackerras <paulus@xxxxxxxxx>
Subject: [XenPPC] Re: copy_4K_page() doesn't use dcbtst?
From: Hollis Blanchard <hollisb@xxxxxxxxxx>
Date: Mon, 28 Aug 2006 21:11:53 -0500
Cc: linuxppc-dev <linuxppc-dev@xxxxxxxxxx>, xen-ppc-devel <xen-ppc-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 28 Aug 2006 19:12:53 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <17651.34629.132793.190742@xxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-ppc-devel-request@lists.xensource.com?subject=help>
List-id: Xen PPC development <xen-ppc-devel.lists.xensource.com>
List-post: <mailto:xen-ppc-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ppc-devel>, <mailto:xen-ppc-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ppc-devel>, <mailto:xen-ppc-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: IBM Linux Technology Center
References: <1156786523.28490.52.camel@xxxxxxxxxxxxxxxxxxxxx> <17651.34629.132793.190742@xxxxxxxxxxxxxxxxxxxx>
Sender: xen-ppc-devel-bounces@xxxxxxxxxxxxxxxxxxx

On Tue, 2006-08-29 at 10:16 +1000, Paul Mackerras wrote:
> Hollis Blanchard writes:
>
> > Hi Paul, some Xen people were just noticing that copy_4K_page
> > (arch/powerpc/lib/copypage_64.S) doesn't use the dcbtst instruction. Why
> > doesn't it help there?
>
> Why would we want to read the cache lines for the destination from
> memory when we're only going to overwrite them completely anyway?
>
> A stronger argument would be for using dcbz, but IIRC it actually made
> things slower (on POWER4 at least).  I suspect the hardware is
> gathering the stores for the whole of each cache line automatically,
> so using dcbz doesn't provide any benefit.

Yes, dcbz makes more sense.

> I did a lot of measurements of memory copy speed on POWER4 (using
> different copy loops, copy sizes, alignments, cache hot/cold cases)
> and the copy_4K_page loop is the fastest I could come up with for
> POWER4.  If anyone can come up with a routine that is measurably
> faster on current machines, I'm happy to look at it, of course.

I figured you had done measurements; we were just curious about the
unexpected results. Thanks!

Hollis Blanchard
IBM Linux Technology Center

