OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms

Meng, Zhaoyi; Koniges, Alice; He, Yun (Helen); Williams, Samuel; Kurth, Thorsten; Cook, Brandon; Deslippe, Jack; Bertozzi, Andrea L.

doi:10.1007/978-3-319-45550-1_2

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Included in the following conference series:

International Workshop on OpenMP

Abstract

We investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and, in serial mode, provide significant accuracy and performance advantages over traditional data classification algorithms. The methods leverage the Nyström extension to calculate eigenvalues and eigenvectors of the graph Laplacian; this is a self-contained module that can also be used with other graph-Laplacian-based methods such as spectral clustering. We use performance tools to identify the hotspots and memory-access behavior of the serial codes, and use OpenMP to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations, detail their performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Knights Landing processors. We show both performance improvement and strong scaling behavior. A large number of optimization techniques and analyses are necessary before the algorithm reaches nearly ideal scaling.
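As background for the Nyström step mentioned above, the generic Nyström extension (the textbook form used for spectral grouping) is summarized below. The notation is illustrative and not taken from the chapter: X is a set of m sampled nodes, Y holds the remaining nodes, W is the graph weight matrix, and the abstract does not state the chapter's sampling, normalization, or orthogonalization choices.

$$
W = \begin{pmatrix} W_{XX} & W_{XY} \\ W_{YX} & W_{YY} \end{pmatrix},
\qquad W_{XX} = U \Lambda U^{\mathsf{T}},
\qquad \bar{U} = \begin{pmatrix} U \\ W_{YX} U \Lambda^{-1} \end{pmatrix},
$$
$$
W \approx \bar{U} \Lambda \bar{U}^{\mathsf{T}}
  = \begin{pmatrix} W_{XX} & W_{XY} \\ W_{YX} & W_{YX} W_{XX}^{-1} W_{XY} \end{pmatrix}.
$$

Assuming W_XX is nonsingular, only the blocks W_XX and W_XY have to be formed (about m·N kernel evaluations for N nodes instead of N²), and only an m × m eigenproblem is solved. For the symmetric normalized Laplacian I − D^{-1/2} W D^{-1/2}, the same extension is typically applied to D^{-1/2} W D^{-1/2}, because an eigenvector of that matrix with eigenvalue μ is an eigenvector of the Laplacian with eigenvalue 1 − μ. The extended columns of Ū are not orthonormal in general, so an extra orthogonalization step is usually added.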

Notes

1.
We have explored various sub-chunk sizes but found that twice the optimal Haswell value, i.e. 128 vectors, yields the best performance.

Acknowledgments

This work was supported by NSF grants DMS-1417674 and DMS-1045536 and AFOSR MURI grant FA9550-10-1-0569. We would like to thank Dr. Da Kuang for his suggestions on optimizing the serial codes. This work was also supported by U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Author information

Authors and Affiliations

University of California, Los Angeles, USA
Zhaoyi Meng & Andrea L. Bertozzi
Lawrence Berkeley National Laboratory, Berkeley, USA
Zhaoyi Meng, Alice Koniges, Yun (Helen) He, Samuel Williams, Thorsten Kurth, Brandon Cook & Jack Deslippe

Corresponding authors

Correspondence to Zhaoyi Meng or Alice Koniges.

Editor information

Editors and Affiliations

RIKEN AICS, Kobe, Japan
Naoya Maruyama
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RIKEN AICS, Kobe, Japan
Mohamed Wahib

About this paper

Cite this paper

Meng, Z. et al. (2016). OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_2

DOI: https://doi.org/10.1007/978-3-319-45550-1_2
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)
