Applying Data Copy to Improve Memory Performance of General Array Computations

Yi, Qing

doi:10.1007/978-3-540-69330-7_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4339)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Data copy is an important compiler optimization which dynamically rearranges the layout of arrays by copying their elements into local buffers. Traditionally, array copy is considered expensive and has been applied only to the working sets of fully blocked computations. This paper presents an algorithm which automatically applies data copy to optimize the performance of general computations independent of blocking. The algorithm automatically decides where to insert copy operations and which regions of arrays to copy. In addition, when specialized, it is equivalent to a general scalar replacement algorithm on arbitrary array computations. The algorithm is fully implemented and has been applied to optimize several scientific kernels. The results show that the algorithm is highly effective and that data copy can significantly improve the performance of scientific computations, both when combined with blocking and when applied alone without blocking.
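As a rough illustration of what data copy means here (a minimal sketch, not the algorithm the paper presents), the Python functions below multiply two n-by-n matrices stored row-major in flat lists. In the plain version, the inner loop reads a column of B with stride n. In the copied version, that column is first gathered into a contiguous local buffer, so the inner loop reads unit-stride data, and no blocking is involved. The function names, the buffer `buf`, and the choice of matrix multiplication as the kernel are assumptions made for this example. Python is used only for readability; any cache benefit would show up in a compiled language.

```python
def matmul_plain(a, b, n):
    """C = A * B for n-by-n matrices stored row-major in flat lists."""
    c = [0.0] * (n * n)
    for j in range(n):
        for i in range(n):
            total = 0.0
            for k in range(n):
                # b[k * n + j] walks down a column: stride-n accesses.
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c


def matmul_with_copy(a, b, n):
    """Same computation, with column j of B copied into a local buffer."""
    c = [0.0] * (n * n)
    buf = [0.0] * n  # hypothetical local buffer, reused for every column
    for j in range(n):
        # Copy step: gather the strided column into unit-stride storage.
        for k in range(n):
            buf[k] = b[k * n + j]
        # Compute step: every row of A reuses the contiguous copy.
        for i in range(n):
            total = 0.0
            for k in range(n):
                total += a[i * n + k] * buf[k]
            c[i * n + j] = total
    return c
```

The copy costs n extra reads and writes per column and is then reused across n rows of A, which is the kind of trade-off a compiler has to weigh when deciding where to insert copy operations and which array regions to copy.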

This work was done while the author was employed by Lawrence Livermore National Laboratory, Livermore, CA 94550.

Author information

Authors and Affiliations

Department of Computer Science, University of Texas at San Antonio,
Qing Yi

Editor information

Editors and Affiliations

BSC-UPC,
Eduard Ayguadé
Department of Computer Science, Louisiana State University, 70803, Baton Rouge, LA, USA
Gerald Baumgartner
Dept. of Electrical and Computer Engg., Louisiana State University, Baton Rouge, LA, USA
J. Ramanujam
Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Avenue, 43210, Columbus, OH, USA
P. Sadayappan

About this paper

Cite this paper

Yi, Q. (2006). Applying Data Copy to Improve Memory Performance of General Array Computations. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_7

DOI: https://doi.org/10.1007/978-3-540-69330-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69329-1
Online ISBN: 978-3-540-69330-7
eBook Packages: Computer Science, Computer Science (R0)
