Research Article

Phonotactic Model for Spoken Language Identification in Indian Language Perspective

by Sanghamitra Mohanty
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 19 - Number 9
Year of Publication: 2011
Authors: Sanghamitra Mohanty
10.5120/2389-3164

Sanghamitra Mohanty. Phonotactic Model for Spoken Language Identification in Indian Language Perspective. International Journal of Computer Applications. 19, 9 (April 2011), 18-24. DOI=10.5120/2389-3164

@article{ 10.5120/2389-3164,
author = { Sanghamitra Mohanty },
title = { Phonotactic Model for Spoken Language Identification in Indian Language Perspective },
journal = { International Journal of Computer Applications },
issue_date = { April 2011 },
volume = { 19 },
number = { 9 },
month = { April },
year = { 2011 },
issn = { 0975-8887 },
pages = { 18-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume19/number9/2389-3164/ },
doi = { 10.5120/2389-3164 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:06:32.285349+05:30
%A Sanghamitra Mohanty
%T Phonotactic Model for Spoken Language Identification in Indian Language Perspective
%J International Journal of Computer Applications
%@ 0975-8887
%V 19
%N 9
%P 18-24
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Indian languages are either Indo-Aryan, influenced by Sanskrit, or Dravidian, influenced by Tamil; the Dravidian languages also show the influence of Sanskrit. All Indian languages are also influenced by Pali, and their graphemes are influenced by Brahmi. All Indian languages are phonetic in nature, and each has its own distinctive phone set. North Indian languages are Indo-Aryan and South Indian languages are Dravidian. Considering the phonetic properties of these languages in speech, we examine the characteristic consonant-vowel (CV) behaviour of each language's syllables and identify the language with an SVM classifier trained on the limited data set available. In the process we analyse the PPR language modelling concept for four major Indian languages, Hindi, Bengali, Oriya, and Telugu, and the results are encouraging.
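
As a rough illustration of the idea of describing an utterance by the CV structure of its syllables, the sketch below maps romanized syllables to consonant/vowel patterns and builds a fixed-length frequency vector of the kind that could be passed to an SVM classifier. It is not the paper's feature extraction: the vowel inventory, the romanized input, the pattern list, and the function names `cv_pattern`, `cv_histogram`, and `to_vector` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical vowel inventory for a romanized transcription.
# The paper's actual phone sets are not given in the abstract.
VOWELS = set("aeiou")


def cv_pattern(syllable: str) -> str:
    """Map each letter of a romanized syllable to 'C' or 'V'."""
    return "".join(
        "V" if ch in VOWELS else "C"
        for ch in syllable.lower()
        if ch.isalpha()
    )


def cv_histogram(syllables):
    """Relative frequency of each CV pattern in one utterance."""
    counts = Counter(cv_pattern(s) for s in syllables)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def to_vector(hist, patterns):
    """Fixed-order feature vector; patterns not seen get 0.0."""
    return [hist.get(p, 0.0) for p in patterns]


# Toy utterance, already split into syllables (assumed input format).
utterance = ["na", "ma", "ste"]
# Assumed pattern inventory that fixes the vector layout.
patterns = ["V", "CV", "CVC", "CCV"]

hist = cv_histogram(utterance)
features = to_vector(hist, patterns)
print(features)
```

One such vector per utterance, labelled with its language, would form the training set for any standard SVM implementation; which kernel and parameters suit this task is not stated in the abstract.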

Index Terms

Computer Science
Information Sciences

Keywords

LID, Indian Language, Support Vector Machine, Phonotactic