Advances in Voice Enabled Human Machine Communication

Shweta Sinha; S. S Agrawal; Aruna Jain


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 60 - Number 17

Year of Publication: 2012

Authors: Shweta Sinha, S. S Agrawal, Aruna Jain

DOI: 10.5120/9785-4356

Shweta Sinha, S. S Agrawal, Aruna Jain. Advances in Voice Enabled Human Machine Communication. International Journal of Computer Applications. 60, 17 (December 2012), 27-33. DOI=10.5120/9785-4356

@article{10.5120/9785-4356,
  author = {Shweta Sinha and S. S Agrawal and Aruna Jain},
  title = {Advances in Voice Enabled Human Machine Communication},
  journal = {International Journal of Computer Applications},
  issue_date = {December 2012},
  volume = {60},
  number = {17},
  month = {December},
  year = {2012},
  issn = {0975-8887},
  pages = {27--33},
  numpages = {7},
  url = {https://ijcaonline.org/archives/volume60/number17/9785-4356/},
  doi = {10.5120/9785-4356},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:07:13.505430+05:30
%A Shweta Sinha
%A S. S Agrawal
%A Aruna Jain
%T Advances in Voice Enabled Human Machine Communication
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 17
%P 27-33
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The inherent advantages of speech communication, namely its variability, convenience and speed, together with our growing need to communicate with machines, have drawn researchers' attention towards the mechanical recognition of speech. Technological advances and improvements in the fundamental approaches have enabled a successful transition from small-vocabulary isolated word recognition to large-vocabulary continuous speech recognition. Even after years of research and development, the accuracy of automatic speech recognition remains one of the major challenges. Designing a speech recognition system requires careful selection of the feature extraction technique and the modeling approach in order to cope with variability in speech and speaker characteristics, as well as with storage space and processing speed requirements. This paper highlights the progress made so far in mechanizing speech recognition, along with the major challenges in the field. The authors also give a brief description of a voice-enabled service for the general public. The objective of this paper is to summarize some of the well-known methods used at the various stages of a speech recognition system, along with their benefits and limitations.
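The abstract refers to small-vocabulary isolated word recognition and to the choice of a modeling approach. Dynamic time warping (DTW), one of this paper's keywords, is a classic template-matching approach for that task. It aligns two utterances spoken at different speeds and scores how similar they are. The following is a minimal illustrative sketch, not the authors' implementation. It assumes each utterance has already been converted into a sequence of feature vectors (for example, cepstral coefficients), uses Euclidean frame distance and the basic symmetric step pattern, and uses hypothetical helper names (`dtw_distance`, `recognize`) and toy two-dimensional "feature" templates.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector (assumed already extracted)


def euclidean(a: Frame, b: Frame) -> float:
    """Local distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Accumulated DTW distance using the basic symmetric step pattern."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # template frame reused (query is slower)
                cost[i][j - 1],      # query frame reused (template is slower)
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def recognize(query: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the label of the reference template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical toy templates: real ones would be feature sequences of recorded words.
templates = {
    "yes": [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]],
    "no": [[0.0, 3.0], [0.0, 2.0], [1.0, 0.0]],
}
query = [[1.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]  # "yes" spoken more slowly
print(recognize(query, templates))  # -> yes
```

The slower query aligns with the "yes" template at zero cost because DTW lets one template frame absorb two query frames. A practical recognizer would usually also apply a global path constraint (a warping window) to bound the O(nm) computation. That refinement is omitted here for brevity.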


Index Terms

Computer Science

Information Sciences

Keywords

Dynamic Time Warping, Feature Extraction, Hidden Markov Model, Neural Network, Speech Recognition