Abstract
Non-uniform sampling two-dimensional convolution (NUSC) is a practical method in 2D image processing. NUSC maps non-uniformly distributed sample data onto a regular output grid through convolution. As the volume of such data keeps growing, the computational performance of NUSC has become a key issue. Heterogeneous computing platforms offer the computing power needed to accelerate NUSC, but heterogeneous programming and performance tuning are complex. A simple, efficient, dedicated programming model with a corresponding runtime framework can address this problem.
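The mapping from scattered samples to a regular grid can be illustrated with a minimal serial sketch. It is not taken from the paper: the Gaussian kernel, the cutoff `radius`, the weighted-average normalization, and the name `grid_samples` are all assumptions chosen for illustration.

```python
import math


def grid_samples(samples, width, height, sigma=1.0, radius=2):
    """Map scattered (x, y, value) samples onto a width x height grid.

    Each output cell becomes the kernel-weighted average of the samples
    within `radius` cells of it. The Gaussian kernel is an assumption.
    """
    acc = [[0.0] * width for _ in range(height)]   # weighted value sums
    wsum = [[0.0] * width for _ in range(height)]  # kernel weight sums

    for x, y, value in samples:
        # Visit only the grid cells inside the kernel support of this sample.
        x_lo = max(0, math.ceil(x - radius))
        x_hi = min(width - 1, math.floor(x + radius))
        y_lo = max(0, math.ceil(y - radius))
        y_hi = min(height - 1, math.floor(y + radius))
        for j in range(y_lo, y_hi + 1):
            for i in range(x_lo, x_hi + 1):
                d2 = (i - x) ** 2 + (j - y) ** 2
                if d2 > radius * radius:
                    continue  # outside the circular support
                w = math.exp(-d2 / (2.0 * sigma * sigma))
                acc[j][i] += w * value
                wsum[j][i] += w

    # Normalize; cells that no sample reaches stay at 0.0.
    return [
        [acc[j][i] / wsum[j][i] if wsum[j][i] > 0.0 else 0.0
         for i in range(width)]
        for j in range(height)
    ]
```

In this scatter form each sample updates several output cells, so a parallel version has to resolve write conflicts or be restructured so that each output cell gathers its own contributions.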
This paper proposes EasyNUSC, a parallel programming model and framework for developing NUSC applications in heterogeneous computing environments. EasyNUSC automatically parallelizes NUSC applications and handles the tedious work, so developers no longer need to attend to the details of algorithm parallelization and task scheduling. For performance optimization, this paper proposes a series of strategies covering vectorization, memory access, and data reuse. Experimental results show that EasyNUSC achieves up to 339 times the performance of a serial program within a single node while providing excellent scalability.
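The separation between user code and scheduling can be sketched as follows. This is a hypothetical illustration, not the EasyNUSC API: the names `run_nusc`, `gaussian`, and `_grid_rows`, the row-block decomposition, and the use of a thread pool are all assumptions. The user supplies only a kernel function, and the driver splits the output grid into row blocks and schedules them.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gaussian(d2, sigma=1.0):
    """User-supplied kernel: weight as a function of squared distance."""
    return math.exp(-d2 / (2.0 * sigma * sigma))


def _grid_rows(samples, kernel, width, rows, radius):
    """Compute one block of output rows in gather form.

    Each block produces only its own rows, so blocks never conflict.
    """
    block = []
    for j in rows:
        row = []
        for i in range(width):
            acc = wsum = 0.0
            for x, y, value in samples:
                d2 = (i - x) ** 2 + (j - y) ** 2
                if d2 <= radius * radius:
                    w = kernel(d2)
                    acc += w * value
                    wsum += w
            row.append(acc / wsum if wsum > 0.0 else 0.0)
        block.append(row)
    return block


def run_nusc(samples, kernel, width, height, radius=2, workers=4):
    """Hypothetical driver: partitions output rows and schedules the blocks."""
    step = max(1, math.ceil(height / workers))
    blocks = [range(s, min(s + step, height)) for s in range(0, height, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so rows stay sorted.
        parts = pool.map(
            lambda rows: _grid_rows(samples, kernel, width, rows, radius),
            blocks,
        )
        return [row for part in parts for row in part]
```

A call such as `run_nusc(samples, gaussian, 64, 64)` involves no threading code on the caller's side. CPython threads will not speed up this pure-Python loop; the sketch shows only the decomposition, which a real heterogeneous runtime would map onto CPU vector units and accelerators.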
Acknowledgments
The authors would like to thank all those who have helped to improve this paper. This work is supported by the National Natural Science Foundation of China (61972277).
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, Y. et al. (2023). EasyNUSC: An Efficient Heterogeneous Computing Framework for Non-uniform Sampling Two-Dimensional Convolution Applications. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_38
DOI: https://doi.org/10.1007/978-3-031-22677-9_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)