Research Article

GMM based Language Identification using MFCC and SDC Features

by Kshirod Sarmah, Utpal Bhattacharjee
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 85 - Number 5
Year of Publication: 2014
DOI: 10.5120/14840-3103

Kshirod Sarmah, Utpal Bhattacharjee. GMM based Language Identification using MFCC and SDC Features. International Journal of Computer Applications. 85, 5 (January 2014), 36-42. DOI=10.5120/14840-3103

@article{ 10.5120/14840-3103,
author = { Kshirod Sarmah and Utpal Bhattacharjee },
title = { GMM based Language Identification using MFCC and SDC Features },
journal = { International Journal of Computer Applications },
issue_date = { January 2014 },
volume = { 85 },
number = { 5 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 36-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume85/number5/14840-3103/ },
doi = { 10.5120/14840-3103 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Language Identification (LID) is one of the most popular areas of research in speech signal processing. Many approaches have been used to improve the performance of LID systems, including Parallel Phone Recognition Language Modeling (PPRLM), Support Vector Machines (SVM) and Gaussian Mixture Models (GMM). State-of-the-art LID systems have used a variety of feature vectors, such as LPCC, MFCC, SDC and prosodic features. Although fusing prosodic features with MFCC features yields some improvement in LID performance, the gain is still not sufficient. In this paper, a baseline LID system for multilingual environments is developed using a GMM as the classifier and MFCC combined with Shifted Delta Cepstral (SDC) features as the front-end feature vectors. In this work, we use the Arunachali Language Speech Database (ALS-DB), a multilingual and multichannel speech corpus recently collected from four local languages of Arunachal Pradesh, namely Adi, Apatani, Galo and Nyishi, with Hindi and English included as secondary languages. Combining MFCC and SDC features improves LID performance over either feature set used alone. The minimum equal error rates (EER) for MFCC and SDC individually are 19.70% and 11.83% respectively, while the minimum EER for the combined MFCC and SDC features is 6.40%. The combined features thus improve LID performance by approximately 15.00% and 6.00% over the baseline systems that use MFCC and SDC features individually.
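
To make the front end described in the abstract concrete, the following is a minimal Python sketch of how SDC features can be stacked from a matrix of cepstral frames and concatenated with the MFCCs. It is not the authors' implementation. The function names, the edge-padding strategy, and the default parameters d=1, P=3, k=7 (a configuration often seen in the LID literature) are assumptions for illustration; the abstract does not state the values used in the paper.

```python
import numpy as np


def shifted_delta_cepstra(cep, d=1, p=3, k=7):
    """Stack k shifted delta blocks for each frame.

    cep : array of shape (T, N), one N-dimensional cepstral vector per frame.
    d   : delta spread (assumed default, not taken from the paper).
    p   : shift between consecutive delta blocks (assumed default).
    k   : number of stacked blocks (assumed default).
    Returns an array of shape (T, N * k) where block i of frame t is
    cep[t + i*p + d] - cep[t + i*p - d].
    """
    cep = np.asarray(cep, dtype=float)
    n_frames = cep.shape[0]
    # Repeat the first and last frames so every shifted index is valid.
    pad_after = (k - 1) * p + d
    padded = np.pad(cep, ((d, pad_after), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = d + i * p  # position of frame (0 + i*p) inside `padded`
        ahead = padded[start + d:start + d + n_frames]
        behind = padded[start - d:start - d + n_frames]
        blocks.append(ahead - behind)
    return np.hstack(blocks)


def combine_mfcc_sdc(mfcc, **sdc_params):
    """Concatenate MFCC frames with their SDC vectors, frame by frame."""
    mfcc = np.asarray(mfcc, dtype=float)
    return np.hstack([mfcc, shifted_delta_cepstra(mfcc, **sdc_params)])
```

In a GMM-based system of the kind the abstract describes, the combined vectors would then be used to train one mixture model per language, and a test utterance would be assigned to the language whose model gives the highest likelihood.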

Index Terms

Computer Science
Information Sciences

Keywords

Language Identification, GMM, MFCC, SDC