Abstract
The join operator is one of the key operators in an RDBMS, and estimating its evaluation time is a fundamental task in query optimization, scheduling, and related problems. However, a precise estimate is hard to obtain: the running time depends not only on the physical join implementation (hash, sort-merge, nested-loop) but also on the corresponding parameters, such as the size of the data, the number of partitions, and the number of threads in a modern hash join. Existing works either rely on time-complexity analysis, which yields only rough results, or employ machine learning techniques to build a predictive model, which requires many training instances. In this paper, we propose a method, named JG2Time, that estimates the running time using join-graphs constructed from the source code. Specifically, we construct a heterogeneous join-graph by annotating parameter nodes onto a call-graph generated by running-time analysis tools, and we propose ReGAT, a heterogeneous graph neural network, to fully exploit the edge weights (the numbers of function calls) in the join-graph. The embeddings learned by ReGAT are then used to predict the running time. In addition, we optimize JG2Time with a multi-task model that also predicts the numbers of times functions are called, and with an unsupervised code-learning method that improves its generalization. Experimental results demonstrate the effectiveness of JG2Time and its optimization strategies.
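To make the core idea concrete, the following is a minimal sketch of attention-based aggregation over a call-graph whose edges carry call counts. It is an illustrative toy in the spirit of the edge-weighted attention the abstract describes, not the authors' ReGAT implementation: the function name, the log-scaling of call counts, and all parameters are hypothetical choices for this sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def edge_weighted_attention(h, edges, weights, W, a):
    """One toy attention layer over a call-graph.

    h       : (N, d) node features (e.g. function/parameter embeddings)
    edges   : list of (src, dst) function-call edges
    weights : call count per edge, used to scale attention logits
    W, a    : projection matrix (d, d') and attention vector (2*d',)
    """
    z = h @ W                      # project node features
    N, dp = z.shape
    out = np.zeros((N, dp))
    for dst in range(N):
        nbrs = [(s, w) for (s, d_), w in zip(edges, weights) if d_ == dst]
        if not nbrs:
            out[dst] = z[dst]      # no callers: keep own projection
            continue
        # attention logits combine node-pair features with log call counts,
        # so heavily-called neighbors receive more weight
        logits = np.array([
            np.concatenate([z[dst], z[s]]) @ a + np.log1p(w)
            for s, w in nbrs
        ])
        alpha = softmax(logits)
        out[dst] = sum(al * z[s] for al, (s, _) in zip(alpha, nbrs))
    return out
```

Aggregated embeddings like these would then feed a regressor that predicts the running time; the real model additionally handles heterogeneous node types (function vs. parameter nodes).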
Acknowledgement
This work was partially supported by NSFC under Grant No. 62272008 and 61832001, and ZTE-PKU Joint Program.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Miao, H., Chen, J., Lin, Y., Xu, M., Han, Y., Gao, J. (2023). JG2Time: A Learned Time Estimator for Join Operators Based on Heterogeneous Join-Graphs. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30636-5
Online ISBN: 978-3-031-30637-2
eBook Packages: Computer Science, Computer Science (R0)