Research Article

SpeakQL: SQL Generation from Natural Language

Published in 2025 by Madhu Damani, Gaurav Kamdar, Laksh Jethani, Mukesh Israni
International Conference on “Large Language Models and Use cases” 2023
Control System labs
LLMUC2023 - Number 2
2025
Authors: Madhu Damani, Gaurav Kamdar, Laksh Jethani, Mukesh Israni

Madhu Damani, Gaurav Kamdar, Laksh Jethani, Mukesh Israni. SpeakQL: SQL Generation from Natural Language. International Conference on “Large Language Models and Use cases” 2023. LLMUC2023, 2 (2025), 43-48.

@article{
author = { Madhu Damani and Gaurav Kamdar and Laksh Jethani and Mukesh Israni },
title = { SpeakQL: SQL Generation from Natural Language },
journal = { International Conference on “Large Language Models and Use cases” 2023 },
issue_date = { 2025 },
volume = { LLMUC2023 },
number = { 2 },
year = { 2025 },
issn = { 0975-8887 },
pages = { 43-48 },
numpages = 6,
url = { /proceedings/llmuc2023/number2/speakql-sql-generation-from-natural-language/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on “Large Language Models and Use cases” 2023
%A Madhu Damani
%A Gaurav Kamdar
%A Laksh Jethani
%A Mukesh Israni
%T SpeakQL: SQL Generation from Natural Language
%J International Conference on “Large Language Models and Use cases” 2023
%@ 0975-8887
%V LLMUC2023
%N 2
%P 43-48
%D 2025
%I International Journal of Computer Applications
Abstract

In recent years, there has been growing interest in the complex task of converting natural language into SQL queries. This task is typically approached with sequence-to-sequence models, which require SQL queries to be serialized. However, a single SQL query can have multiple valid serializations, which leads to the ‘order matters’ problem and makes such models difficult to train effectively. Existing state-of-the-art methods turn to reinforcement learning to address this issue, but with limited success. This paper presents SpeakQL, a novel approach tailored to scenarios where query order is not critical. SpeakQL adopts a sketch-based strategy, incorporating a dependency graph into its model architecture so that each prediction can take prior predictions into account. Furthermore, SpeakQL uses GloVe embeddings and a column attention mechanism to enhance contextual comprehension, ultimately improving the query generation and result retrieval process.
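
The ‘order matters’ problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the table name `games`, the column names, and the helpers `serialize` and `to_sketch` are hypothetical, and the slot layout (an aggregator, a selected column, and an unordered set of WHERE conditions) is only an assumed example of what a sketch-based representation might look like.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

# A WHERE condition as (column, operator, value). All names are hypothetical.
Condition = Tuple[str, str, str]


class SketchQuery(NamedTuple):
    """Assumed slot-filled sketch: SELECT $AGG($COL) WHERE {set of conditions}."""
    agg: str
    select_col: str
    conditions: FrozenSet[Condition]


def serialize(table: str, agg: str, col: str, conds: Iterable[Condition]) -> str:
    """Serialize to a SQL string; the result depends on the order of conds."""
    target = f"{agg}({col})" if agg else col
    where = " AND ".join(f"{c} {op} {v!r}" for c, op, v in conds)
    return f"SELECT {target} FROM {table}" + (f" WHERE {where}" if where else "")


def to_sketch(agg: str, col: str, conds: Iterable[Condition]) -> SketchQuery:
    """Fill the sketch slots; conditions are stored as an unordered set."""
    return SketchQuery(agg, col, frozenset(conds))


# The same two conditions, listed in two different orders.
conds_a = [("year", "=", "2020"), ("team", "=", "Lions")]
conds_b = list(reversed(conds_a))

sql_a = serialize("games", "COUNT", "id", conds_a)
sql_b = serialize("games", "COUNT", "id", conds_b)

# As token sequences the two queries differ, so a sequence-to-sequence model
# trained on sql_a would be penalized for producing the equivalent sql_b.
strings_differ = sql_a != sql_b

# As filled sketches they are identical, so condition order no longer matters.
sketches_match = to_sketch("COUNT", "id", conds_a) == to_sketch("COUNT", "id", conds_b)

print(sql_a)
print(sql_b)
print(strings_differ, sketches_match)
```

Comparing predictions slot by slot, rather than token by token, is one way a sketch-based model can avoid penalizing equivalent queries that list their conditions in a different order.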

Index Terms

Computer Science
Information Sciences

Keywords

Machine Learning, Deep Learning, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Global Vector Embeddings (GloVe), Word2Vec