Dependence-Based Code Generation for a CELL Processor

Zhao, Yuan; Kennedy, Ken

doi:10.1007/978-3-540-72521-3_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4382)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Obtaining high performance on the STI CELL processor requires substantial programming effort because its architectural features must be explicitly managed, with separate codes required for two different types of cores (PPE and SPE). Research at IBM has developed a single source-image compiler for CELL that performs vectorization but uses OpenMP to specify cross-core parallelism. In this paper, we present and evaluate an alternative dependence-based compiler approach that automatically generates parallel and vector code for CELL from a single source program with no parallelism directives. In contrast to OpenMP, our approach can also handle loop nests that carry dependences. To preserve correct program semantics, we employ on-chip communication mechanisms to implement barrier and unidirectional synchronization primitives. We also implement strategies to boost performance by managing DMA data movement, improving data alignment, and exploiting memory reuse in the innermost loop.
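To make the idea of parallelizing a loop nest that carries dependences more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple recurrence in which each element depends on its upper and left neighbours, uses Python threads as stand-ins for SPEs, and uses counting semaphores as stand-ins for a one-way (unidirectional) on-chip signal. The names `sequential`, `pipelined`, `ready`, and `bounds` are hypothetical and chosen only for illustration.

```python
import threading


def sequential(n, m):
    """Reference version: both loops carry a dependence."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def pipelined(n, m, workers):
    """Same recurrence, with columns split into blocks, one block per worker.

    Worker w may compute row i of its block only after worker w-1 has
    finished row i of the block to its left. That one-way handoff is the
    only synchronization needed (an assumption of this sketch).
    """
    a = [[1] * m for _ in range(n)]
    # ready[w] counts rows the left neighbour of worker w has completed.
    ready = [threading.Semaphore(0) for _ in range(workers)]
    # Contiguous column blocks covering columns 1 .. m-1.
    bounds = [1 + (m - 1) * w // workers for w in range(workers + 1)]

    def run(w):
        lo, hi = bounds[w], bounds[w + 1]
        for i in range(1, n):
            if w > 0:
                ready[w].acquire()      # wait for the left neighbour's row i
            for j in range(lo, hi):
                a[i][j] = a[i - 1][j] + a[i][j - 1]
            if w + 1 < workers:
                ready[w + 1].release()  # signal the right neighbour

    threads = [threading.Thread(target=run, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a


if __name__ == "__main__":
    print(pipelined(6, 9, 3) == sequential(6, 9))
```

The workers form a pipeline: each signals only its right-hand neighbour, so no global barrier is needed inside the loop nest. A directive that asserts loop iterations are independent could not be applied to this nest as written.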

Author information

Authors and Affiliations

Computer Science Department, Rice University, Houston, TX, USA
Yuan Zhao & Ken Kennedy

Editor information

George Almási Călin Caşcaval Peng Wu

About this paper

Cite this paper

Zhao, Y., Kennedy, K. (2007). Dependence-Based Code Generation for a CELL Processor. In: Almási, G., Caşcaval, C., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2006. Lecture Notes in Computer Science, vol 4382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72521-3_6

DOI: https://doi.org/10.1007/978-3-540-72521-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72520-6
Online ISBN: 978-3-540-72521-3
eBook Packages: Computer Science, Computer Science (R0)