
JG2Time: A Learned Time Estimator for Join Operators Based on Heterogeneous Join-Graphs

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13943))


Abstract

The join operator is one of the key operators in an RDBMS, and estimating its evaluation time is a fundamental task in query optimization, scheduling, etc. However, a precise estimation is hard to make, as the running time depends not only on the physical join implementation (hash, sort, loop) but also on the corresponding parameters, such as the size of the data, the number of partitions, and the number of threads in a modern hash join. Existing works either rely on time complexity analysis, which yields only rough results, or employ machine learning techniques to build a predictive model, which requires many training instances. In this paper, we propose a method, named JG2Time, to estimate the running time using join-graphs constructed from the source code. Specifically, we construct a heterogeneous join-graph by annotating parameter nodes onto a call-graph generated by running-time analysis tools, and propose ReGAT, a heterogeneous graph neural network, to fully capture the edge weights (the numbers of function calls) in the join-graph. The embeddings learned by ReGAT are then used to predict the running time. In addition, we optimize JG2Time with a multi-task model that also predicts the numbers of function calls, and with an unsupervised code learning method that enhances its generalization. The experimental results illustrate the effectiveness of JG2Time and its optimization strategies.
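To make the core idea concrete, the following is a minimal numpy sketch of one relational-attention layer over a tiny heterogeneous join-graph, in the spirit of what the abstract describes: function nodes and annotated parameter nodes, with edge weights giving call counts that bias the attention. The node/relation layout, the `log1p` weight bias, and all names here are illustrative assumptions, not the paper's actual ReGAT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # embedding size

# Node features: functions 0-2, parameters 3-4 (a tiny heterogeneous join-graph).
h = rng.normal(size=(5, D))

# Edges (src, dst, relation, weight); relation 0 = "calls" (weight = call count),
# relation 1 = "configures" (a parameter node annotated onto a function node).
edges = [(0, 1, 0, 120.0), (0, 2, 0, 8.0), (3, 0, 1, 1.0), (4, 2, 1, 1.0)]

# One projection matrix and attention vector per relation, as in relational GAT.
W = {r: rng.normal(size=(D, D)) for r in (0, 1)}
a = {r: rng.normal(size=2 * D) for r in (0, 1)}

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

# Attention logits per edge, biased by log(1 + edge weight) so heavily
# called functions contribute more; messages are projected source states.
scores, msgs = {}, {}
for s, t, r, w in edges:
    z_s, z_t = W[r] @ h[s], W[r] @ h[t]
    e = leaky_relu(a[r] @ np.concatenate([z_s, z_t])) + np.log1p(w)
    scores.setdefault(t, []).append(float(e))
    msgs.setdefault(t, []).append(z_s)

# Softmax over each target's in-neighbours, then a weighted sum of messages.
h_new = h.copy()
for t, es in scores.items():
    es = np.asarray(es)
    alpha = np.exp(es - es.max())
    alpha /= alpha.sum()
    h_new[t] = alpha @ np.stack(msgs[t])

print(h_new.shape)  # (5, 4)
```

The updated embeddings (`h_new`) would then feed a regression head that predicts the running time; nodes with no in-edges keep their original state in this single-layer sketch.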


Notes

  1. https://valgrind.org/.


Acknowledgement

This work was partially supported by NSFC under Grant Nos. 62272008 and 61832001, and by the ZTE-PKU Joint Program.

Author information

Correspondence to Jun Gao.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Miao, H., Chen, J., Lin, Y., Xu, M., Han, Y., Gao, J. (2023). JG2Time: A Learned Time Estimator for Join Operators Based on Heterogeneous Join-Graphs. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_9


  • DOI: https://doi.org/10.1007/978-3-031-30637-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30636-5

  • Online ISBN: 978-3-031-30637-2

  • eBook Packages: Computer Science; Computer Science (R0)
