Research Article

Article:Text Independent Speaker Identification with Finite Multivariate Generalized Gaussian Mixture Model and Hierarchical Clustering Algorithm

by V Sailaja, K. Srinivasa Rao, K.V.V.S. Reddy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 11 - Number 11
Year of Publication: 2010
Authors: V Sailaja, K. Srinivasa Rao, K.V.V.S. Reddy
10.5120/1626-2187

V Sailaja, K. Srinivasa Rao, K.V.V.S. Reddy. Article:Text Independent Speaker Identification with Finite Multivariate Generalized Gaussian Mixture Model and Hierarchical Clustering Algorithm. International Journal of Computer Applications. 11, 11 (December 2010), 25-31. DOI=10.5120/1626-2187

@article{ 10.5120/1626-2187,
author = { V Sailaja, K. Srinivasa Rao, K.V.V.S. Reddy },
title = { Article:Text Independent Speaker Identification with Finite Multivariate Generalized Gaussian Mixture Model and Hierarchical Clustering Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { December 2010 },
volume = { 11 },
number = { 11 },
month = { December },
year = { 2010 },
issn = { 0975-8887 },
pages = { 25-31 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume11/number11/1626-2187/ },
doi = { 10.5120/1626-2187 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:00:18.161665+05:30
%A V Sailaja
%A K. Srinivasa Rao
%A K.V.V.S. Reddy
%T Article:Text Independent Speaker Identification with Finite Multivariate Generalized Gaussian Mixture Model and Hierarchical Clustering Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 11
%N 11
%P 25-31
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper we propose a text-independent speaker identification system based on a finite multivariate Generalized Gaussian Mixture Model with hierarchical clustering. The speech spectra of each speaker are characterized by a mixture of Generalized Gaussian distributions, a family that includes the Gaussian and Laplacian distributions as particular cases and can represent platykurtic, leptokurtic, and mesokurtic shapes of the speech spectra. The speech analysis uses Mel Frequency Cepstral Coefficients (MFCCs) extracted in the front-end process, and the model parameters are estimated with the EM algorithm. The number of acoustic classes associated with each speaker's speech spectra is determined through hierarchical clustering. The performance of the proposed algorithm is studied through experimental evaluation on a 100-speaker database, and the algorithm is found to outperform the existing GMM-based speaker identification algorithm. It is also observed that the algorithm performs efficiently even for heterogeneous populations with short utterances (less than 2 seconds).
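As a rough illustration of the scoring step described in the abstract, the following Python sketch evaluates a mixture of generalized Gaussian densities on a sequence of feature frames and picks the best-scoring speaker. It assumes the common parameterization f(x) = β / (2αΓ(1/β)) · exp(-(|x - μ|/α)^β) with α = σ·sqrt(Γ(1/β)/Γ(3/β)), so that σ is the standard deviation, β = 2 gives the Gaussian and β = 1 the Laplacian. It also assumes independent feature dimensions (a diagonal multivariate form), which may differ from the paper's exact multivariate model. The parameters are taken as already estimated (in the paper, by the EM algorithm, with the number of components set by hierarchical clustering); MFCC extraction, EM training, and clustering are not shown. The function names and the Component tuple layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout of one mixture component:
# (mixing weight, per-dimension means, per-dimension standard deviations,
#  per-dimension shape parameters beta).
Component = Tuple[float, Sequence[float], Sequence[float], Sequence[float]]


def ggd_logpdf(x: float, mu: float, sigma: float, beta: float) -> float:
    """Log-density of a univariate generalized Gaussian with mean mu,
    standard deviation sigma and shape beta (beta=2: Gaussian, beta=1: Laplacian)."""
    # alpha = sigma * sqrt(Gamma(1/beta) / Gamma(3/beta)) makes sigma the standard deviation.
    log_alpha = math.log(sigma) + 0.5 * (math.lgamma(1.0 / beta) - math.lgamma(3.0 / beta))
    # Normalising constant beta / (2 * alpha * Gamma(1/beta)), in log form.
    log_norm = math.log(beta) - math.log(2.0) - log_alpha - math.lgamma(1.0 / beta)
    return log_norm - (abs(x - mu) / math.exp(log_alpha)) ** beta


def component_logpdf(frame: Sequence[float], mus: Sequence[float],
                     sigmas: Sequence[float], betas: Sequence[float]) -> float:
    """Assumed diagonal form: feature dimensions are independent, so log-densities add."""
    return sum(ggd_logpdf(x, m, s, b) for x, m, s, b in zip(frame, mus, sigmas, betas))


def mixture_loglik(frame: Sequence[float], components: List[Component]) -> float:
    """log sum_k w_k f_k(frame), evaluated with log-sum-exp to avoid underflow."""
    terms = [math.log(w) + component_logpdf(frame, m, s, b) for w, m, s, b in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def identify(frames: Sequence[Sequence[float]], models: Dict[str, List[Component]]) -> str:
    """Pick the speaker whose mixture gives the highest total log-likelihood,
    treating the feature frames of the test utterance as independent."""
    scores = {name: sum(mixture_loglik(f, comps) for f in frames)
              for name, comps in models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-dimensional "feature" frames and hand-set parameters, for illustration only.
    models = {
        "speaker_a": [(1.0, [0.0, 0.0], [1.0, 1.0], [2.0, 2.0])],
        "speaker_b": [(0.5, [3.0, 3.0], [1.0, 1.0], [1.0, 1.0]),
                      (0.5, [5.0, 5.0], [1.0, 1.0], [2.0, 2.0])],
    }
    frames = [[0.1, -0.2], [0.3, 0.0]]
    print(identify(frames, models))  # speaker_a
```

Summing frame log-likelihoods treats the frames as independent, which is the usual simplification in GMM-based identification, and the log-sum-exp step helps avoid numerical underflow when individual component densities are very small.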

Index Terms

Computer Science
Information Sciences

Keywords

Generalized Gaussian Mixture Model, Mel frequency cepstral coefficients, EM algorithm, Hierarchical clustering