
JG2Time: A Learned Time Estimator for Join Operators Based on Heterogeneous Join-Graphs

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13943))


Abstract

The join operator is one of the key operators in an RDBMS, and estimating its evaluation time is a fundamental task in query optimization, scheduling, etc. However, a precise estimation is hard to make, as the running time depends not only on the physical join implementation (hash, sort, loop) but also on the corresponding parameters, such as the size of the data, the number of partitions, and the number of threads in a modern hash join. Existing works either rely on time complexity analysis, which yields only rough results, or employ machine learning techniques to build a predictive model, which requires many training instances. In this paper, we propose a method, named JG2Time, to estimate the running time using join-graphs constructed from the source code. Specifically, we construct a heterogeneous join-graph by annotating parameter nodes onto a call-graph generated by running-time analysis tools, and propose ReGAT, a heterogeneous graph neural network, to fully capture the edge weights (the numbers of function calls) in the join-graph. The embeddings learned by ReGAT are then used to predict the running time. In addition, we optimize JG2Time with a multi-task model that also predicts the numbers of function calls, and with an unsupervised code learning method that enhances its generalization. The experimental results illustrate the effectiveness of JG2Time and its optimization strategies.
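To make the core idea concrete, the following is a minimal numpy sketch of one relational-attention layer over a tiny heterogeneous join-graph, in the spirit of what the abstract describes: function nodes and annotated parameter nodes, with edge weights giving call counts that bias the attention. The node/relation layout, the `log1p` weight bias, and all names here are illustrative assumptions, not the paper's actual ReGAT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # embedding size

# Node features: functions 0-2, parameters 3-4 (a tiny heterogeneous join-graph).
h = rng.normal(size=(5, D))

# Edges (src, dst, relation, weight); relation 0 = "calls" (weight = call count),
# relation 1 = "configures" (a parameter node annotated onto a function node).
edges = [(0, 1, 0, 120.0), (0, 2, 0, 8.0), (3, 0, 1, 1.0), (4, 2, 1, 1.0)]

# One projection matrix and attention vector per relation, as in relational GAT.
W = {r: rng.normal(size=(D, D)) for r in (0, 1)}
a = {r: rng.normal(size=2 * D) for r in (0, 1)}

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

# Attention logits per edge, biased by log(1 + edge weight) so heavily
# called functions contribute more; messages are projected source states.
scores, msgs = {}, {}
for s, t, r, w in edges:
    z_s, z_t = W[r] @ h[s], W[r] @ h[t]
    e = leaky_relu(a[r] @ np.concatenate([z_s, z_t])) + np.log1p(w)
    scores.setdefault(t, []).append(float(e))
    msgs.setdefault(t, []).append(z_s)

# Softmax over each target's in-neighbours, then a weighted sum of messages.
h_new = h.copy()
for t, es in scores.items():
    es = np.asarray(es)
    alpha = np.exp(es - es.max())
    alpha /= alpha.sum()
    h_new[t] = alpha @ np.stack(msgs[t])

print(h_new.shape)  # (5, 4)
```

The updated embeddings (`h_new`) would then feed a regression head that predicts the running time; nodes with no in-edges keep their original state in this single-layer sketch.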


Notes

  1. https://valgrind.org/.


Acknowledgement

This work was partially supported by NSFC under Grant Nos. 62272008 and 61832001, and by the ZTE-PKU Joint Program.

Author information

Correspondence to Jun Gao.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Miao, H., Chen, J., Lin, Y., Xu, M., Han, Y., Gao, J. (2023). JG2Time: A Learned Time Estimator for Join Operators Based on Heterogeneous Join-Graphs. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_9


  • DOI: https://doi.org/10.1007/978-3-031-30637-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30636-5

  • Online ISBN: 978-3-031-30637-2

  • eBook Packages: Computer Science; Computer Science (R0)
