Skip to main content

OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms

  • Conference paper
  • First Online:
OpenMP: Memory, Devices, and Tasks (IWOMP 2016)

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 9903))

Included in the following conference series:

Abstract

We investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalue/eigenvectors of the graph Laplacian and this is a self-contained module that can be used in conjunction with other graph-Laplacian based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access of the serial codes and use OpenMP as the parallelization language to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. A large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    We have explored various sub-chunk sizes but found that twice the optimal Haswell value, i.e. 128 vectors, yield the best performance.

References

  1. Meng, Z., Merkurjev, E., Koniges, A., Bertozzi, A.L.: Hyperspectral Video Analysis Using Graph Clustering Methods. Image Processing On Line, submitted

    Google Scholar 

  2. Stoer, M., Wagner, F.: A simple min-cut algorithm. J. ACM (JACM) 44(4), 585–591 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  3. Szlam, A., Bresson, X.: A total variation-based graph clustering algorithm for cheeger ratio cuts. UCLA CAM Report, pp. 09–68 (2009)

    Google Scholar 

  4. Bertozzi, A.L., Flenner, A.: Diffuse interface models on graphs for classification of high dimensional data. SIAM Rev. 58(2), 293–328 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  5. Chung, F.: Spectral Graph Theory, vol. 92. American Mathematical Society, Providence (1997)

    MATH  Google Scholar 

  6. Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

    Article  MathSciNet  Google Scholar 

  7. Van Gennip, Y., Bertozzi, A.L.: \( Gamma \)-convergence of graph Ginzburg-Landau functionals. Adv. Differ. Equ. 17(11/12), 1115–1180 (2012)

    MathSciNet  MATH  Google Scholar 

  8. Bertozzi, A.L., Flenner, A.: Diffuse interface models on graphs for classification of high dimensional data. Multiscale Model. Simul. 10(3), 1090–1118 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  9. Luo, X., Bertozzi, A.L.: Convergence analysis of the graph Allen-Cahn scheme. Preprint

    Google Scholar 

  10. Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 214–225 (2004)

    Article  Google Scholar 

  11. Merkurjev, E., Kostic, T., Bertozzi, A.L.: An MBO scheme on graphs for classification and image processing. SIAM J. Imaging Sci. 6(4), 1903–1930 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  12. Merkurjev, E., Bae, E., Bertozzi, A.L., Tai, X.C.: Global binary optimization on graphs for classification of high-dimensional data. J. Math. Imaging Vis. 52(3), 414–435

    Google Scholar 

  13. Hu, H., Sunu, J., Bertozzi, A.L.: Multi-class graph Mumford-Shah model for plume detection using the MBO scheme. In: Tai, X.-C., Bae, E., Chan, T.F., Lysaker, M. (eds.) EMMCVPR 2015. LNCS, vol. 8932, pp. 209–222. Springer, Heidelberg (2015)

    Google Scholar 

  14. Kuang, D., Gittens, A., Hamid, R.: Hardware compliant approximate image codes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

    Google Scholar 

  15. Demmel, J.W.: Applied Numerical Linear Algebra. Siam, Philadelphia (1997)

    Book  MATH  Google Scholar 

  16. Broadwater, J.B., Limsui, D., Carr, A.K.: A primer for chemical plume detection using LWIR sensors. Technical Paper, National Security Technology Department, Las Vegas, NV (2011)

    Google Scholar 

  17. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)

    Article  Google Scholar 

  18. Rooine Toolkit: https://bitbucket.org/berkeleylab/cs-roofline-toolkit

  19. Intel Software Development Emulator. https://software.intel.com/en-us/articles/intel-software-development-emulator

  20. Doerfler, D.: Understanding Application Data Movement Characteristics using Intel VTune Amplifier and Software Development Emulator tools, Intel Xeon Phi User Group (IXPUG) (2015)

    Google Scholar 

  21. Intel VTune Official Website. https://software.intel.com/en-us/intel-vtune-amplifier-xe

  22. Cori Website: https://www.nersc.gov/users/computational-systems/cori

Download references

Acknowledgments

This work was supported by NSF grants DMS-1417674 and DMS-1045536 and AFOSR MURI grant FA9550-10-1-0569. We would like to thank Dr. Da Kuang for his suggestions on optimizing the serial codes. This work was also supported by U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Zhaoyi Meng or Alice Koniges .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Meng, Z. et al. (2016). OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science(), vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-45550-1_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45549-5

  • Online ISBN: 978-3-319-45550-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics