Abstract
We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000. The implementation uses OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix–matrix multiplication part, which keeps modification of the existing CPU code to a minimum while extending the simulation capability of the code to GPU architectures. Our discussion includes GPU results for OpenACC interoperating with CUDA Fortran and for the gather–scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.





Acknowledgments
This material is based upon work supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357, and partially supported by the Swedish e-Science Research Centre (SeRC). This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. The research also used computing resources of the French Alternative Energies and Atomic Energy Commission (CEA) in France via the Partnership for Advanced Computing in Europe (PRACE).
Additional information
COST IC1305.
Cite this article
Gong, J., Markidis, S., Laure, E. et al. Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations. J Supercomput 72, 4160–4180 (2016). https://doi.org/10.1007/s11227-016-1744-5
DOI: https://doi.org/10.1007/s11227-016-1744-5