Research Article

A Novel Speech to Text Converter System for Mobile Applications

by R. Sandanalakshmi, V. Martina Monfort, G. Nandhini
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 73 - Number 19
Year of Publication: 2013
Authors: R. Sandanalakshmi, V. Martina Monfort, G. Nandhini
10.5120/12991-9886

R. Sandanalakshmi, V. Martina Monfort, G. Nandhini. A Novel Speech to Text Converter System for Mobile Applications. International Journal of Computer Applications. 73, 19 (July 2013), 7-13. DOI=10.5120/12991-9886

@article{ 10.5120/12991-9886,
author = { R. Sandanalakshmi and V. Martina Monfort and G. Nandhini },
title = { A Novel Speech to Text Converter System for Mobile Applications },
journal = { International Journal of Computer Applications },
issue_date = { July 2013 },
volume = { 73 },
number = { 19 },
month = { July },
year = { 2013 },
issn = { 0975-8887 },
pages = { 7-13 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume73/number19/12991-9886/ },
doi = { 10.5120/12991-9886 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A R. Sandanalakshmi
%A V. Martina Monfort
%A G. Nandhini
%T A Novel Speech to Text Converter System for Mobile Applications
%J International Journal of Computer Applications
%@ 0975-8887
%V 73
%N 19
%P 7-13
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents an efficient implementation of a speech-to-text converter for mobile applications. The primary aim of this work is to formulate a system that gives optimum performance in terms of complexity, accuracy, delay and memory requirements in a mobile environment. The speech-to-text converter consists of two stages: front-end analysis and pattern recognition. The proposed method uses effective techniques for voice activity detection in preprocessing, for feature extraction and for the recognizer. The energy of the high-frequency part is separately considered as the zero crossing rate to differentiate noise from speech. The RASTA-PLP feature extraction method is used, in which the RASTA filter suppresses spectral components that change more slowly or more quickly than the typical range of change of speech, thus avoiding unnecessary information in the extracted features. A Generalized Regression Neural Network is used as the recognizer, with recognition performed at the syllable level, which reduces the memory requirement and complexity for mobile applications. Thus a small database containing all possible syllable pronunciations of the user is sufficient to give recognition accuracy close to 100%. The proposed system is shown to reduce delay and memory requirement by 50%. The proposed technique thus enables real-time speaker-dependent applications on devices such as mobile phones and PDAs.
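To make the recognizer stage more concrete, the following is a minimal sketch of a Generalized Regression Neural Network in its standard kernel-regression form, used here to classify feature vectors by label. It is not the authors' implementation: the function names `grnn_predict` and `classify_syllables`, the smoothing parameter `sigma`, and the toy 2-D feature vectors are all assumptions made for illustration, and real inputs would be RASTA-PLP features extracted from speech frames.

```python
import numpy as np


def grnn_predict(train_x, train_y, query_x, sigma=0.5):
    """Standard GRNN output: a Gaussian-weighted average of stored targets.

    train_x: (n, d) stored pattern vectors (hypothetical features).
    train_y: (n,) or (n, c) targets attached to each stored pattern.
    query_x: (q, d) or (d,) vectors to evaluate.
    sigma:   kernel width (smoothing parameter), an assumed value.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    if train_y.ndim == 1:
        train_y = train_y[:, None]
    query_x = np.atleast_2d(np.asarray(query_x, dtype=float))

    # Pattern layer: squared Euclidean distance from every query
    # to every stored training pattern, shape (q, n).
    diff = query_x[:, None, :] - train_x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)

    # Gaussian kernel activation of each pattern unit.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Summation layer (numerator and denominator) and output division.
    numer = weights @ train_y
    denom = weights.sum(axis=1, keepdims=True)
    return numer / np.maximum(denom, 1e-12)


def classify_syllables(train_x, train_labels, query_x, n_classes, sigma=0.5):
    """Classify queries by running the GRNN on one-hot label targets."""
    one_hot = np.eye(n_classes)[np.asarray(train_labels)]
    scores = grnn_predict(train_x, one_hot, query_x, sigma)
    return scores.argmax(axis=1)


# Toy example: two hypothetical syllable classes in a 2-D feature space.
toy_x = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9]]
toy_labels = [0, 0, 1, 1]
predicted = classify_syllables(toy_x, toy_labels,
                               [[0.05, 0.0], [0.95, 1.0]], n_classes=2)
```

Because a GRNN stores its training patterns directly and needs no iterative training, its memory footprint grows with the number of stored patterns, which is consistent with the abstract's point that a small syllable database keeps the recognizer compact.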

Index Terms

Computer Science
Information Sciences

Keywords

Preprocessing, Voice activity detection, RASTA-PLP, Neural network, Syllable-based recognition