WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-ppc-devel

Re: [XenPPC] copy_page speedup using dcbz on target

If you really want to explore mem/page copy for XenPPC, you have to understand that we run without an MMU (in real mode). Profiling code with the MMU on, _including_ the RMA, is therefore not helpful, because in real mode the access is guarded (G=1, I=0). For more information see 970FX UM Sections:
  6.3.8.4 Loads in Real Mode
  6.3.9.4 Stores in Real Mode

You will probably find that grouping by cache line (as Hollis suggests) will be much better, but you should also prefetch the next line somehow.

Please run your experiments _in_ Xen, and use the timebase (ticks) or NOW() (nanosecs) to model it.
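
As a rough illustration of the kind of measurement meant here, the following is a minimal sketch of a tick-based timing harness. All names in it (sketch_read_ticks, sketch_time_copies, sketch_memcpy_page, SKETCH_BENCH_PAGE_SIZE) are hypothetical and not part of Xen. Inside Xen you would read the timebase or call NOW() instead of the clock() fallback, which is far too coarse for real numbers and is only there so the sketch compiles off-target. The 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define SKETCH_BENCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical tick reader: the timebase on 64-bit PowerPC, otherwise
 * clock() as a coarse stand-in so the sketch builds elsewhere. */
static inline uint64_t sketch_read_ticks(void)
{
#if defined(__powerpc64__)
    uint64_t tb;
    __asm__ __volatile__("mftb %0" : "=r" (tb));
    return tb;
#else
    return (uint64_t)clock();
#endif
}

typedef void (*sketch_copy_fn)(void *dp, const void *sp);

/* Baseline to compare against: a plain memcpy of one page. */
static void sketch_memcpy_page(void *dp, const void *sp)
{
    memcpy(dp, sp, SKETCH_BENCH_PAGE_SIZE);
}

/* Time `runs` back-to-back page copies and return the elapsed ticks.
 * Timing a batch keeps the cost of reading the counter small relative
 * to the work being measured. */
static uint64_t sketch_time_copies(sketch_copy_fn fn, void *dp,
                                   const void *sp, unsigned runs)
{
    uint64_t start, end;
    unsigned i;

    assert(fn != NULL && runs > 0);

    start = sketch_read_ticks();
    for (i = 0; i < runs; i++)
        fn(dp, sp);
    end = sketch_read_ticks();

    return end - start;
}
```

Dividing the result by the number of runs gives an approximate per-page cost in ticks. The numbers only mean something when the harness runs in the same translation mode as the code you care about, which is the point above.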

On Dec 15, 2006, at 6:31 PM, Hollis Blanchard wrote:

On Fri, 2006-12-15 at 17:50 -0500, poff wrote:
3) Useful when PPC must do page copies in place of 'page flipping'.

So you're saying we should worry about it later?


For the future, copy_page using dcbz:

diff -r 7669fca80bfc xen/arch/powerpc/mm.c
--- a/xen/arch/powerpc/mm.c     Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/arch/powerpc/mm.c     Fri Dec 15 17:52:58 2006 -0500
@@ -280,7 +280,8 @@ extern void copy_page(void *dp, void *sp
     if (on_systemsim()) {
         systemsim_memcpy(dp, sp, PAGE_SIZE);
     } else {
-        memcpy(dp, sp, PAGE_SIZE);
+       clear_page(dp);
+       __copy_page(dp, sp);
     }
 }

diff -r 7669fca80bfc xen/include/asm-powerpc/page.h
--- a/xen/include/asm-powerpc/page.h    Mon Dec 04 11:46:53 2006 -0500
+++ b/xen/include/asm-powerpc/page.h    Fri Dec 15 17:52:58 2006 -0500
@@ -90,6 +90,25 @@ 1:  dcbz    0,%0\n\

 extern void copy_page(void *dp, void *sp);

+static __inline__ void __copy_page(void *dp, void *sp)
+{
+       ulong dwords, dword_size;
+
+       dword_size = 8;
+       dwords = (PAGE_SIZE / dword_size) - 1;
+
+       __asm__ __volatile__(
+       "mtctr     %2      # copy_page\n\
+       ld      %2,0(%1)\n\
+       std     %2,0(%0)\n\
+1:     ldu     %2,8(%1)\n\
+       stdu    %2,8(%0)\n\
+       bdnz    1b"
+       : /* no result */
+       : "r" (dp), "r" (sp), "r" (dwords)
+       : "%ctr", "memory");
+}
+

I'd rather have copy_page() dcbz; stdu; stdu; stdu; ... stdu; in each
loop iteration.
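
To make the loop shape concrete, here is a minimal C sketch of a per-cache-line copy: one dcbz of the destination line, then the stores for that line, repeated for every line in the page. It is not the patch above and not tested in Xen. The names (sketch_copy_page, sketch_zero_line, SKETCH_USE_DCBZ) are hypothetical, and the 4096-byte page and 128-byte dcbz block size are assumptions that need checking against the target CPU. Off-target, memset stands in for dcbz so the sketch still compiles and runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */
#define SKETCH_LINE_SIZE 128u    /* assumed dcbz block / cache-line size */

/* Establish a zeroed destination line so the stores that follow do not
 * have to fetch the old contents from memory first. */
static inline void sketch_zero_line(void *line)
{
#if defined(__powerpc64__) && defined(SKETCH_USE_DCBZ)
    __asm__ __volatile__("dcbz 0,%0" : : "r" (line) : "memory");
#else
    memset(line, 0, SKETCH_LINE_SIZE);   /* portable stand-in for dcbz */
#endif
}

/* Copy one page, one cache line per outer-loop iteration. */
static void sketch_copy_page(void *dp, const void *sp)
{
    uint64_t *d = dp;
    const uint64_t *s = sp;
    const size_t dwords_per_line = SKETCH_LINE_SIZE / sizeof(uint64_t);
    size_t line, i;

    /* dcbz works on whole lines, so the destination must be line-aligned;
     * the source only needs doubleword alignment for the 64-bit loads. */
    assert(((uintptr_t)dp % SKETCH_LINE_SIZE) == 0);
    assert(((uintptr_t)sp % sizeof(uint64_t)) == 0);

    for (line = 0; line < SKETCH_PAGE_SIZE / SKETCH_LINE_SIZE; line++) {
        sketch_zero_line(d);
        for (i = 0; i < dwords_per_line; i++)
            d[i] = s[i];             /* the ld/std pairs for this line */
        d += dwords_per_line;
        s += dwords_per_line;
    }
}
```

Compared with clearing the whole page and then copying it, this touches each destination line once while it is still in the cache. A hand-written assembly version would unroll the inner loop into the stdu sequence described above.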

It would also be nice to improve memcpy, though that one is certainly
more difficult due to alignment, varying lengths, etc.

Our current memcpy() comes from memcpy.S, which is straight from Linux. It's not the best, but probably good enough.


Perhaps we can
borrow code from
http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html

This tunes for usermode. I don't think its performance is relevant.
-JX


_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel