Research Article

Text-dependent Speaker Recognition by Combination of LBG VQ and DTW for Persian Language

by Mahdi Keshavarz Bahaghighat, Farshid Sahba, Ehsan Tehrani
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 51 - Number 16
Year of Publication: 2012
Authors: Mahdi Keshavarz Bahaghighat, Farshid Sahba, Ehsan Tehrani
10.5120/8126-1711

Mahdi Keshavarz Bahaghighat, Farshid Sahba, Ehsan Tehrani. Text-dependent Speaker Recognition by Combination of LBG VQ and DTW for Persian Language. International Journal of Computer Applications. 51, 16 (August 2012), 23-27. DOI=10.5120/8126-1711

@article{ 10.5120/8126-1711,
author = { Mahdi Keshavarz Bahaghighat, Farshid Sahba, Ehsan Tehrani },
title = { Text-dependent Speaker Recognition by Combination of LBG VQ and DTW for Persian Language },
journal = { International Journal of Computer Applications },
issue_date = { August 2012 },
volume = { 51 },
number = { 16 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 23-27 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume51/number16/8126-1711/ },
doi = { 10.5120/8126-1711 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:50:34.153440+05:30
%A Mahdi Keshavarz Bahaghighat
%A Farshid Sahba
%A Ehsan Tehrani
%T Text-dependent Speaker Recognition by Combination of LBG VQ and DTW for Persian Language
%J International Journal of Computer Applications
%@ 0975-8887
%V 51
%N 16
%P 23-27
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a novel approach to automatic speaker recognition, with an emphasis on text-dependent speaker recognition. Speaker recognition has been studied actively for several decades. A speaker recognition system may be viewed as working in four stages: analysis, feature extraction, modeling, and testing. After some preprocessing modules, we apply MFCC (Mel-frequency cepstral coefficients), one of the most important feature extraction methods in this field, to each speech signal independently in order to extract feature vectors. The resulting vectors are then used by the training system to find codewords for the ten users in our Persian database by LBG VQ (Linde-Buzo-Gray vector quantization). Finally, we use the DTW (dynamic time warping) technique to recognize a speaker among all users. Our experiments strongly indicate that the proposed algorithm can achieve an identification rate of over 96%.
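
The two building blocks named in the abstract, LBG VQ and DTW, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not say how the codewords and the DTW stage are wired together, so the sketch shows them as independent pieces. The function names (`lbg_codebook`, `dtw_distance`, `identify`), the additive split offset `eps`, the fixed number of refinement passes, and the Euclidean frame distance are all assumptions made for illustration, and the input frames are assumed to be feature vectors (such as MFCCs) that have already been extracted.

```python
import math

def euclid(u, v):
    # Euclidean distance between two feature vectors (assumed local cost).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting (LBG-style).

    size is assumed to be a power of two; eps is a hypothetical
    additive split offset, iters a fixed number of refinement passes.
    """
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each vector to its nearest codeword, then re-centre.
            cells = [[] for _ in codebook]
            for v in vectors:
                idx = min(range(len(codebook)), key=lambda i: euclid(v, codebook[i]))
                cells[idx].append(v)
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook

def dtw_distance(a, b):
    """Accumulated cost of the best monotone alignment of sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclid(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def identify(test_seq, templates):
    # Pick the speaker whose stored template is closest under DTW
    # (templates is a hypothetical dict: speaker name -> frame sequence).
    return min(templates, key=lambda name: dtw_distance(test_seq, templates[name]))
```

In a text-dependent setting, each enrolled speaker would contribute one or more frame sequences for the fixed phrase, and `identify` would return the name with the smallest warped distance to the test utterance. A real system would also need the preprocessing and MFCC stages, which are omitted here.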

Index Terms

Computer Science
Information Sciences

Keywords

Speaker recognition systems, MFCC, LBG VQ, DTW