Abstract
Natural language querying offers an intuitive and user-friendly interface. A natural language interface to databases, often termed “Text-to-SQL”, translates a query posed in natural language into a corresponding SQL query over a structured database. Many recent methods, built on pre-trained language models and the encoder-decoder paradigm, have been developed to address this task. Yet existing approaches often struggle to generate accurate SQL queries, especially when a query involves multiple values or intricate column calculations.
In this study, we present a task-driven Text-to-SQL model. The model breaks the SQL prediction process into sub-tasks according to the task requirements of each query. Specifically, we combine structure prediction, value extraction, and column relationship prediction into a single workflow. The model constructs the target SQL query incrementally, with each sub-task building on the outputs of its predecessors. Additionally, we introduce a novel filtering mechanism to refine and re-order the candidates produced during beam search. We evaluate the model on public datasets and show that it is effective in both English and Chinese settings.
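The abstract does not specify how the filtering mechanism works, so the following is only a minimal sketch of one plausible way to filter and re-order beam candidates, not the authors' method. It assumes each candidate is a pair of SQL text and a beam score, and it uses a validity check that is purely an assumption here: a candidate is kept only if SQLite can compile it against the schema. The names `Candidate`, `is_valid`, and `filter_and_rerank`, and the example table `singer`, are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical candidate type: (SQL text, beam log-probability).
Candidate = Tuple[str, float]


def is_valid(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if SQLite can compile the query against the schema.

    Assumption: compilability is used as the filter criterion; the paper's
    actual criterion is not described in the abstract.
    """
    try:
        # EXPLAIN forces the statement to be parsed and resolved against
        # the schema without returning the query's result rows.
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False


def filter_and_rerank(
    conn: sqlite3.Connection, candidates: List[Candidate]
) -> List[Candidate]:
    """Drop invalid candidates, then order the rest by descending score."""
    valid = [c for c in candidates if is_valid(conn, c[0])]
    # Fall back to the unfiltered beam if every candidate was rejected.
    pool = valid if valid else candidates
    return sorted(pool, key=lambda c: c[1], reverse=True)


# Toy schema and beam output, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
beam = [
    ("SELECT nme FROM singer WHERE age > 30", -0.4),   # unknown column
    ("SELECT name FROM singer WHERE age > 30", -0.7),  # valid
    ("SELECT name FROM singer WHERE", -0.9),           # incomplete
]
ranked = filter_and_rerank(conn, beam)
print(ranked[0][0])
```

In this toy beam the highest-scoring candidate refers to a column that does not exist, so the filter promotes the second candidate. A real system would likely combine such a check with learned scores or the outputs of the earlier sub-tasks.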
Acknowledgement
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61972151. We thank the anonymous reviewers for their valuable comments and suggestions.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, Y., Zhang, Q., Yao, J. (2023). Task-Driven Neural Natural Language Interface to Database. In: Zhang, F., Wang, H., Barhamgi, M., Chen, L., Zhou, R. (eds) Web Information Systems Engineering – WISE 2023. WISE 2023. Lecture Notes in Computer Science, vol 14306. Springer, Singapore. https://doi.org/10.1007/978-981-99-7254-8_51
DOI: https://doi.org/10.1007/978-981-99-7254-8_51
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7253-1
Online ISBN: 978-981-99-7254-8
eBook Packages: Computer Science, Computer Science (R0)