Abstract
This paper presents a tool for repairing data races and barrier divergence errors in GPU kernels written in CUDA or OpenCL. As a novel extension to prior work, the tool can also remove barriers that are unnecessary for correctness. We implement these ideas in GPURepair, which uses GPUVerify as the verification oracle for GPU kernels. We also extend GPUVerify to support CUDA Cooperative Groups, which allows GPURepair to insert inter-block synchronization in CUDA kernels. To the best of our knowledge, GPURepair is the only tool that can propose fixes for intra-block data races and barrier divergence errors in both CUDA and OpenCL kernels, and the only tool that fixes inter-block data races in CUDA kernels. We evaluate GPURepair on about 750 kernels and compare it with prior work. The results show that GPURepair fixes more kernels than prior work and is unique in removing redundant barriers and handling inter-block data races.
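To make the class of error concrete, the following is a minimal sketch of an intra-block data race and of the effect of inserting a barrier. It is an illustration only, not GPURepair's algorithm and not code from the paper: the kernel body `temp = A[tid + 1]; A[tid] = temp;`, the function `run_block`, and the `schedule` parameter are all hypothetical. A thread block is modeled in plain Python, with the order in which threads run passed in explicitly, so that the dependence of the result on scheduling becomes visible.

```python
from typing import List, Sequence


def run_block(a: Sequence[int], schedule: Sequence[int], barrier: bool) -> List[int]:
    """Model one thread block running `temp = A[tid + 1]; A[tid] = temp;`.

    `schedule` is the (hypothetical) order in which the simulated threads run.
    With `barrier=True`, a block-wide barrier sits between the read and the
    write, so every read finishes before any write starts.
    """
    mem = list(a)
    # Guard `tid < N - 1`: the last thread performs no memory access.
    active = [tid for tid in schedule if tid < len(mem) - 1]
    if barrier:
        temps = {tid: mem[tid + 1] for tid in active}  # phase 1: all reads
        for tid in active:                              # phase 2: all writes
            mem[tid] = temps[tid]
    else:
        for tid in active:  # each thread reads and writes back to back
            temp = mem[tid + 1]
            mem[tid] = temp
    return mem


if __name__ == "__main__":
    data = [0, 1, 2, 3]
    ascending = [0, 1, 2, 3]
    descending = [3, 2, 1, 0]
    # Without a barrier, the result depends on the thread schedule.
    print(run_block(data, ascending, barrier=False))   # [1, 2, 3, 3]
    print(run_block(data, descending, barrier=False))  # [3, 3, 3, 3]
    # With a barrier between read and write, both schedules agree.
    print(run_block(data, ascending, barrier=True))    # [1, 2, 3, 3]
    print(run_block(data, descending, barrier=True))   # [1, 2, 3, 3]
```

In the model, the barrier separates the two phases for every thread in the block, including the one excluded by the guard. In a real kernel this corresponds to placing the barrier outside the conditional: a barrier reached by only some threads of a block is a barrier divergence error, the second class of error named in the abstract.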
The author names are in alphabetical order.
Acknowledgements
We thank the anonymous reviewers for their helpful comments and the authors of AutoSync for providing the source code under a public license. We also thank the Ministry of Education, India, for financial support.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Joshi, S., Muduganti, G. (2021). GPURepair: Automated Repair of GPU Kernels. In: Henglein, F., Shoham, S., Vizel, Y. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2021. Lecture Notes in Computer Science, vol 12597. Springer, Cham. https://doi.org/10.1007/978-3-030-67067-2_18
DOI: https://doi.org/10.1007/978-3-030-67067-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67066-5
Online ISBN: 978-3-030-67067-2
eBook Packages: Computer Science (R0)