Literature Review on Automatic Speech Recognition

Wiqas Ghai; Navdeep Singh

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 41 - Number 8

Year of Publication: 2012

Authors: Wiqas Ghai, Navdeep Singh

10.5120/5565-7646

Wiqas Ghai, Navdeep Singh. Literature Review on Automatic Speech Recognition. International Journal of Computer Applications. 41, 8 (March 2012), 42-50. DOI=10.5120/5565-7646

@article{ 10.5120/5565-7646,
  author = { Wiqas Ghai and Navdeep Singh },
  title = { Literature Review on Automatic Speech Recognition },
  journal = { International Journal of Computer Applications },
  issue_date = { March 2012 },
  volume = { 41 },
  number = { 8 },
  month = { March },
  year = { 2012 },
  issn = { 0975-8887 },
  pages = { 42-50 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume41/number8/5565-7646/ },
  doi = { 10.5120/5565-7646 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:29:07.192320+05:30
%A Wiqas Ghai
%A Navdeep Singh
%T Literature Review on Automatic Speech Recognition
%J International Journal of Computer Applications
%@ 0975-8887
%V 41
%N 8
%P 42-50
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Automatic speech recognition (ASR), once considered a concept of science fiction and long hampered by a number of performance-degrading factors, is now an important part of information and communication technology. Improvements to fundamental approaches and the development of new ones have led ASR systems to advance from merely responding to a set of sounds to sophisticated systems that respond to fluently spoken natural language. Using artificial neural networks (ANNs), mathematical models of the low-level circuits in the human brain, to improve speech-recognition performance through a model known as the ANN-Hidden Markov Model (ANN-HMM) has shown promise for large-vocabulary speech recognition systems. Achieving higher recognition accuracy and a low word error rate, developing speech corpora suited to the nature of the language, and addressing sources of variability through approaches such as Missing Data Techniques and Convolutive Non-Negative Matrix Factorization are the major considerations in developing an efficient ASR system. This paper highlights the progress made so far on ASR for different languages and the technological perspective of automatic speech recognition in countries such as China, Russia, Portugal, Spain, Saudi Arabia, Vietnam, Japan, the UK, Sri Lanka, the Philippines, Algeria and India.
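
The abstract names word error rate as a key measure of ASR quality but does not define it. As a minimal sketch, assuming the conventional definition (substitutions, deletions and insertions from a word-level edit-distance alignment, divided by the number of reference words), it could be computed as follows. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not something taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance.

    Assumes whitespace tokenization and case-sensitive matching
    (both are simplifications chosen for this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[-1] / len(ref)
```

Under this definition the value can exceed 1.0 when the hypothesis contains many inserted words, which is one reason some evaluations report recognition accuracy alongside it.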

Index Terms

Computer Science

Information Sciences

Keywords

Language Model, Hidden Markov Model, Vector Quantization, Dynamic Time Warping, Missing Data Techniques, Convolutive Non-negative Matrix Factorization