Research Article

Issues in developing LVCSR System for Dravidian Languages: An Exhaustive Case Study for Tamil

by Bharadwaja Kumar G, Melvin Jose Johnson Premkumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 70 - Number 19
Year of Publication: 2013
Authors: Bharadwaja Kumar G, Melvin Jose Johnson Premkumar
10.5120/12172-8180

Bharadwaja Kumar G, Melvin Jose Johnson Premkumar. Issues in developing LVCSR System for Dravidian Languages: An Exhaustive Case Study for Tamil. International Journal of Computer Applications. 70, 19 (May 2013), 1-7. DOI=10.5120/12172-8180

@article{ 10.5120/12172-8180,
author = { Bharadwaja Kumar G and Melvin Jose Johnson Premkumar },
title = { Issues in developing LVCSR System for Dravidian Languages: An Exhaustive Case Study for Tamil },
journal = { International Journal of Computer Applications },
issue_date = { May 2013 },
volume = { 70 },
number = { 19 },
month = { May },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume70/number19/12172-8180/ },
doi = { 10.5120/12172-8180 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:33:14.829054+05:30
%A Bharadwaja Kumar G
%A Melvin Jose Johnson Premkumar
%T Issues in developing LVCSR System for Dravidian Languages: An Exhaustive Case Study for Tamil
%J International Journal of Computer Applications
%@ 0975-8887
%V 70
%N 19
%P 1-7
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Research on Large Vocabulary Continuous Speech Recognition (LVCSR) for Indian languages has not advanced as far as it has for English, because large-scale speech and language corpora remain scarce even today. Tamil is one of the four major Dravidian languages spoken in southern India. It is morphologically very rich, which poses a great challenge for developing LVCSR systems. In this paper, we analyze a Tamil corpus of 10 million words and present the results of a type-token analysis that indicates the morphological richness of Tamil. We demonstrate a grapheme-to-phoneme (G2P) mapping system for Tamil that achieves an accuracy of 99.56%. We show the impact of important parameters, namely absolute beam width, language weight, number of Gaussians and number of senones, on speech recognition accuracy for a limited vocabulary (3k). We present the results of a large open-vocabulary speech recognition task for vocabulary sizes of 30k, 60k and 100k on the speaker-independent task. The Out Of Vocabulary (OOV) rates are 20.2%, 15.8% and 12.8% respectively. The accuracies are 43.59%, 47.11% and 43.52% respectively.
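The type-token analysis and the OOV rates mentioned in the abstract can be illustrated with a minimal sketch. It assumes whitespace-tokenized text and a vocabulary built from the k most frequent training words; the function names and the toy tokens are hypothetical and are not taken from the paper, whose exact procedure is not described here.

```python
from collections import Counter


def type_token_stats(tokens):
    """Return (number of types, number of tokens, type-token ratio)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    ratio = n_types / n_tokens if n_tokens else 0.0
    return n_types, n_tokens, ratio


def top_k_vocabulary(train_tokens, k):
    """Build a vocabulary from the k most frequent training words (an assumption)."""
    return {word for word, _ in Counter(train_tokens).most_common(k)}


def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that are not covered by the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for word in test_tokens if word not in vocabulary)
    return missing / len(test_tokens)


if __name__ == "__main__":
    # Toy placeholder tokens, not Tamil data.
    train = "a b a c a b d".split()
    test = "a b e f".split()

    types, tokens, ratio = type_token_stats(train)
    print(f"types={types} tokens={tokens} ratio={ratio:.3f}")

    vocab = top_k_vocabulary(train, k=2)
    print(f"OOV rate with k=2: {oov_rate(test, vocab):.2%}")
```

In a morphologically rich language the number of types keeps growing as the corpus grows, so a fixed-size vocabulary leaves a larger share of test tokens uncovered. This is consistent with the OOV rates the abstract reports even at 100k words.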

Index Terms

Computer Science
Information Sciences

Keywords

Speech Recognition, Tamil, Sphinx, Large Vocabulary