SpeakQL: SQL Generation from Natural Language

Research Article

Published in 2025 by Madhu Damani, Gaurav Kamdar, Laksh Jethani, Mukesh Israni

International Conference on “Large Language Models and Use cases” 2023

Control System labs

LLMUC2023 - Number 2

2025

Authors: Madhu Damani, Gaurav Kamdar, Laksh Jethani, Mukesh Israni

Madhu Damani, Gaurav Kamdar, Laksh Jethani, Mukesh Israni. SpeakQL: SQL Generation from Natural Language. International Conference on “Large Language Models and Use cases” 2023. LLMUC2023, 2 (2025), 43-48.

@article{

author = { Madhu Damani, Gaurav Kamdar, Laksh Jethani, Mukesh Israni },

title = { SpeakQL: SQL Generation from Natural Language },

journal = { International Conference on “Large Language Models and Use cases” 2023 },

issue_date = { 2025 },

volume = { LLMUC2023 },

number = { 2 },

year = { 2025 },

issn = { 0975-8887 },

pages = { 43-48 },

numpages = 6,

url = { /proceedings/llmuc2023/number2/speakql-sql-generation-from-natural-language/ },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Proceeding Article

%1 International Conference on “Large Language Models and Use cases” 2023

%A Madhu Damani

%A Gaurav Kamdar

%A Laksh Jethani

%A Mukesh Israni

%T SpeakQL: SQL Generation from Natural Language

%J International Conference on “Large Language Models and Use cases” 2023

%@ 0975-8887

%V LLMUC2023

%N 2

%P 43-48

%D 2025

%I International Journal of Computer Applications

Abstract

In recent years, there has been growing interest in the complex task of converting natural language into SQL queries. Approaches to this task typically rely on sequence-to-sequence models, which require each SQL query to be serialized into a token sequence. However, a single SQL query can have multiple valid serializations (for example, the conditions in a WHERE clause can appear in any order), which leads to the ‘order matters’ problem and makes such models difficult to train effectively. Existing state-of-the-art methods turn to reinforcement learning to address this issue, but with limited success. This paper presents SpeakQL, a novel approach tailored to scenarios where query order is not critical. SpeakQL adopts a sketch-based strategy and incorporates a dependency graph into its model architecture so that each prediction takes prior predictions into account. SpeakQL also uses GloVe embeddings and a column attention mechanism to improve contextual understanding, ultimately improving query generation and result retrieval.
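The abstract states that SpeakQL sidesteps the ‘order matters’ problem with a sketch-based strategy and uses column attention, but it gives no formulas. As a rough illustration only, here is a minimal pure-Python sketch of one way these two ideas can be combined: each candidate column attends over the question tokens (score column^T W h_i, softmax-normalized), and an independent per-column sigmoid decides whether that column fills a WHERE slot, so the prediction is a set rather than an ordered sequence. This is not the authors' implementation. The parameter names (`W`, `u_col`, `u_q`), the helper functions, the toy 2-dimensional vectors standing in for GloVe embeddings, and the fixed 0.5 threshold are all hypothetical, and the dependency graph between sketch slots is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnAttentionParams:
    """Hypothetical learned parameters (assumption: not published in the paper)."""
    W: list      # d x d bilinear matrix scoring (column, question token) pairs
    u_col: list  # d-dim weight applied to the column embedding
    u_q: list    # d-dim weight applied to the column-conditioned question summary


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attend(question, column, W):
    """Column attention: weight each question token h_i by softmax(column^T W h_i)
    and return (weights, weighted summary of the question for this column)."""
    weights = softmax([dot(column, matvec(W, h)) for h in question])
    dim = len(question[0])
    summary = [sum(w * h[k] for w, h in zip(weights, question)) for k in range(dim)]
    return weights, summary


def column_prob(question, column, p):
    """Probability that this column fills a WHERE-column slot of the sketch."""
    _, summary = attend(question, column, p.W)
    return sigmoid(dot(p.u_col, column) + dot(p.u_q, summary))


def predict_where_columns(question, columns, p, threshold=0.5):
    """Score every column independently and return a set, so the order in
    which conditions would be serialized never enters the prediction."""
    return frozenset(name for name, vec in columns.items()
                     if column_prob(question, vec, p) > threshold)


if __name__ == "__main__":
    # Two serializations of one WHERE clause differ as sequences but not as sets.
    seq_a = [("age", "=", "30"), ("city", "=", "Paris")]
    seq_b = [("city", "=", "Paris"), ("age", "=", "30")]
    print(seq_a == seq_b, set(seq_a) == set(seq_b))  # → False True

    # Toy 2-d vectors stand in for GloVe embeddings / encoder states.
    p = ColumnAttentionParams(W=[[1.0, 0.0], [0.0, 1.0]], u_col=[1.0, 1.0], u_q=[1.0, 1.0])
    question = [[1.0, 0.0], [0.0, 1.0]]
    columns = {"age": [2.0, 0.0], "city": [0.0, 2.0], "id": [-3.0, -3.0]}
    print(sorted(predict_where_columns(question, columns, p)))  # → ['age', 'city']
```

Scoring columns independently is what makes the output order-insensitive: reordering the candidate columns or the serialized conditions cannot change the predicted set. A trained model would learn `W`, `u_col`, and `u_q` from data, and might predict the number of conditions instead of relying on a fixed threshold.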

Index Terms

Computer Science

Information Sciences

Keywords

Machine Learning, Deep Learning, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Global Vector Embeddings (GloVe), Word2Vec