Research Article

Advances in Voice Enabled Human Machine Communication

by Shweta Sinha, S. S Agrawal, Aruna Jain
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 60 - Number 17
Year of Publication: 2012
DOI: 10.5120/9785-4356

Shweta Sinha, S. S Agrawal, Aruna Jain. Advances in Voice Enabled Human Machine Communication. International Journal of Computer Applications 60, 17 (December 2012), 27-33. DOI=10.5120/9785-4356

@article{ 10.5120/9785-4356,
author = { Shweta Sinha, S. S Agrawal, Aruna Jain },
title = { Advances in Voice Enabled Human Machine Communication },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 60 },
number = { 17 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 27-33 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume60/number17/9785-4356/ },
doi = { 10.5120/9785-4356 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:07:13.505430+05:30
%A Shweta Sinha
%A S. S Agrawal
%A Aruna Jain
%T Advances in Voice Enabled Human Machine Communication
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 17
%P 27-33
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The inherent advantages of speech communication, namely its variability, convenience and speed, together with our growing need to communicate with machines, have drawn researchers' attention towards the automatic recognition of speech. Technological advances and improvements in the fundamental approaches have enabled a successful transition from small-vocabulary isolated word recognition to large-vocabulary continuous speech recognition. Even after years of research and development, the accuracy of automatic speech recognition remains one of the major challenges. Designing a speech recognition system requires careful selection of the feature extraction technique and the modeling approach in order to cope with the variability of speech and speaker characteristics as well as with storage space and processing speed requirements. This paper highlights the progress made so far in automating speech recognition, along with the major challenges in the field. The authors also give a brief description of voice enabled services for common people. The objective of this paper is to summarize some of the well-known methods used at the various stages of a speech recognition system, together with their benefits and limitations.
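The abstract describes a recognizer as a feature extraction stage followed by a modeling or matching stage. As a rough illustration of how those two stages fit together for isolated words, here is a minimal sketch under simplifying assumptions: each frame is described only by log energy and zero-crossing rate (a deliberately crude stand-in for the MFCC, LPC or PLP features the paper surveys), each word has a single stored template, and matching uses dynamic time warping (DTW, one of the paper's keywords). The names `frame_features`, `dtw_distance`, `recognize` and `tone`, the 160-sample frame, the 80-sample hop and the 8 kHz rate are hypothetical choices made for this example, not values taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Frame = List[float]


def frame_features(signal: Sequence[float], frame_len: int = 160, hop: int = 80) -> List[Frame]:
    """Split a signal into overlapping frames; return [log energy, zero-crossing rate] per frame.

    Assumption: these two features stand in for richer representations (MFCC, LPC, PLP).
    """
    feats: List[Frame] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silent frames
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append([log_energy, crossings / (frame_len - 1)])
    return feats


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """DTW with match/insertion/deletion steps and a Euclidean local cost between frames."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Best accumulated cost of reaching (i, j) from any of the three predecessors.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(signal: Sequence[float], templates: Dict[str, List[Frame]]) -> str:
    """Return the template word whose feature sequence is closest under DTW."""
    feats = frame_features(signal)
    return min(templates, key=lambda word: dtw_distance(feats, templates[word]))


def tone(freq_hz: float, seconds: float, rate: int = 8000) -> List[float]:
    """Hypothetical toy 'utterance': a pure sine tone sampled at `rate` Hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate + 0.3) for n in range(round(seconds * rate))]


if __name__ == "__main__":
    templates = {
        "low": frame_features(tone(200, 0.30)),
        "high": frame_features(tone(2000, 0.30)),
    }
    # The test utterance is longer than its template; DTW absorbs the difference in duration.
    print(recognize(tone(2000, 0.45), templates))  # high
```

The DTW table costs O(nm) time and memory for sequences of n and m frames, which hints at why the storage and processing-speed requirements mentioned above matter once vocabularies grow; template matching of this kind suits small-vocabulary isolated word recognition, whereas large-vocabulary continuous recognition typically relies on statistical models such as HMMs.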

Index Terms

Computer Science
Information Sciences

Keywords

Dynamic Time Warping, Feature Extraction, Hidden Markov Model, Neural Network, Speech Recognition