   
 
 
 
   
 

xen-ppc-devel

Re: [XenPPC] copy_page speedup using dcbz on target

On Sat, 16 Dec 2006 11:34, Jimi Xenidis wrote:

> If you really want to explore mem/page copy for XenPPC then you have  
> to understand that since we run without an MMU, profiling code with  
> MMU on, _including_ RMA, is not helpful because the access is guarded ... 

> Please run your experiments _in_ Xen ...

I have added the timing code to Xen (in setup.c);
however, the results match the earlier timings taken in userspace:

JS20:
elapsed time: 0x000000000000a8f5
elapsed time using dcbz: 0x0000000000005410

elapsed time: 0x000000000000a987
elapsed time using dcbz: 0x0000000000005361


JS21:
elapsed time: 0x0000000000000862
elapsed time using dcbz: 0x0000000000000420

elapsed time: 0x0000000000000859
elapsed time using dcbz: 0x0000000000000424
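
To put the hex counts above in perspective, here is a small sketch that turns each pair into a speedup ratio. The helper name speedup() and the struct layout are made up for illustration; the post does not state the unit of the counts, so only the ratios are meaningful here.

```c
#include <assert.h>
#include <stdint.h>

/* One measurement pair, copied from the timings above (unit not stated). */
struct timing {
    const char *machine;
    uint64_t plain;      /* elapsed time, plain copy      */
    uint64_t with_dcbz;  /* elapsed time, copy using dcbz */
};

static const struct timing timings[] = {
    { "JS20", 0xa8f5, 0x5410 },
    { "JS20", 0xa987, 0x5361 },
    { "JS21", 0x0862, 0x0420 },
    { "JS21", 0x0859, 0x0424 },
};

/* Hypothetical helper: how many times faster the dcbz variant ran. */
static double speedup(uint64_t plain, uint64_t with_dcbz)
{
    assert(with_dcbz != 0);
    return (double)plain / (double)with_dcbz;
}
```

For all four pairs the ratio comes out at roughly 2, i.e. the dcbz variant takes about half the time on both the JS20 and the JS21.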

...............................................

> You will probably find that grouping (as Hollis suggests) by cache  
> line will be much better. but also prefetch the next line somehow.

Somewhat better... (the following observations were made running in user space)
Unrolling the copy loop (by cache line) improves performance by a few percent.
(I did not record the time; also, the unrolled loop still used the same number of
registers and did no touching.)

However, including dcbz at the beginning of the loop slowed things down. Perhaps we
need to dcbz a couple of lines ahead of usage?
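
For discussion, here is a rough sketch of what "dcbz a couple of lines ahead" could look like. It is not the code that was timed. The function name, the 128-byte line size, the 4096-byte page size and the look-ahead distance of 2 are all assumptions, and on non-PowerPC builds memset stands in for dcbz so the logic can still be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u  /* assumption: 4 KiB pages                 */
#define CACHE_LINE  128u   /* assumption: 128-byte cache lines        */
#define DCBZ_AHEAD  2u     /* assumption: lines zeroed ahead of copy  */

/* Zero one destination cache line without fetching it from memory. */
static inline void zero_line(void *p)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ __volatile__("dcbz 0,%0" : : "r"(p) : "memory");
#else
    memset(p, 0, CACHE_LINE);  /* portable stand-in, not equivalent in cost */
#endif
}

/* Hypothetical copy_page variant; dst and src must not overlap. */
static void copy_page_dcbz_ahead(void *dst, const void *src)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    const size_t lines = PAGE_SIZE / CACHE_LINE;
    size_t i;

    /* dcbz clears the whole line containing the address, so require alignment. */
    assert(((uintptr_t)dst % CACHE_LINE) == 0);

    /* Prime the pipeline: establish the first DCBZ_AHEAD target lines. */
    for (i = 0; i < DCBZ_AHEAD && i < lines; i++)
        zero_line(d + i * CACHE_LINE);

    for (i = 0; i < lines; i++) {
        /* Stay inside the page: never zero past the last line. */
        if (i + DCBZ_AHEAD < lines)
            zero_line(d + (i + DCBZ_AHEAD) * CACHE_LINE);

        /* Copy one full line; a fixed-size memcpy is typically expanded inline. */
        memcpy(d + i * CACHE_LINE, s + i * CACHE_LINE, CACHE_LINE);
    }
}
```

Whether a distance of 2 actually helps would have to be measured; the bounds check in the loop is there so the look-ahead never clears memory beyond the target page.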
