Research Article

Text Dependent Speaker Recognition using MFCC features and BPANN

by Praveen N, Tessamma Thomas
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 74 - Number 5
Year of Publication: 2013
Authors: Praveen N, Tessamma Thomas
10.5120/12883-9788

Praveen N, Tessamma Thomas. Text Dependent Speaker Recognition using MFCC features and BPANN. International Journal of Computer Applications. 74, 5 (July 2013), 31-39. DOI=10.5120/12883-9788

@article{ 10.5120/12883-9788,
author = { Praveen N and Tessamma Thomas },
title = { Text Dependent Speaker Recognition using MFCC features and BPANN },
journal = { International Journal of Computer Applications },
issue_date = { July 2013 },
volume = { 74 },
number = { 5 },
month = { July },
year = { 2013 },
issn = { 0975-8887 },
pages = { 31-39 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume74/number5/12883-9788/ },
doi = { 10.5120/12883-9788 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Praveen N
%A Tessamma Thomas
%T Text Dependent Speaker Recognition using MFCC features and BPANN
%J International Journal of Computer Applications
%@ 0975-8887
%V 74
%N 5
%P 31-39
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Mel-Frequency Cepstral Coefficients (MFCCs) are spectral features that are widely used for speaker recognition, and text-dependent speaker recognition systems are the most accurate among voice-based authentication systems. In this paper, a text-dependent speaker recognition method is developed. MFCCs are computed for a selected sentence. The first 13 MFCCs are taken from each 26 ms frame, the values of each coefficient are clustered into 5 cluster centres, and these centres are combined into a 65-element (13 × 5) speech code vector for the entire utterance. The speech code vectors are used to train a multilayer perceptron with backpropagation (gradient descent), and the network is tested on various test patterns. Performance is measured using the false acceptance rate (FAR), false rejection rate (FRR) and equal error rate (EER). The recognition rate achieved is 96.18% for a cluster size of 5 per coefficient.
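The speech-code construction and the FAR/FRR/EER evaluation described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the MFCC matrix (one row of at least 13 coefficients per 26 ms frame) is taken as given, the per-coefficient clustering uses 1-D k-means with a deterministic quantile start (the abstract only says each coefficient is clustered to 5 centres), and verification scores are assumed to be network outputs where a higher value means "same speaker". The names `kmeans_1d`, `speech_code`, `far_frr` and `equal_error_rate` are hypothetical.

```python
# Sketch under assumptions: builds a 13 x 5 = 65-element speech code from an
# MFCC matrix and estimates FAR/FRR/EER from scores. MFCC extraction and the
# backpropagation network are not shown.


def kmeans_1d(values, k=5, iters=100):
    """Cluster scalar values into k centres with Lloyd's k-means (assumed method)."""
    if len(values) < k:
        raise ValueError("need at least k values to form k clusters")
    s = sorted(values)
    # Deterministic quantile initialisation (an assumption, not from the paper).
    centres = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centre.
        new = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return sorted(centres)  # sorted so every code vector uses the same order


def speech_code(mfcc_frames, n_coeffs=13, k=5):
    """Concatenate k centres per coefficient (13 x 5 = 65 elements by default)."""
    code = []
    for c in range(n_coeffs):
        trajectory = [frame[c] for frame in mfcc_frames]  # coefficient c over time
        code.extend(kmeans_1d(trajectory, k))
    return code


def far_frr(genuine, impostor, threshold):
    """Accept a claim when score >= threshold; return (FAR, FRR)."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Try every observed score as a threshold; return (EER estimate, threshold)
    at the point where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Synthetic stand-in for a 40-frame, 13-coefficient MFCC matrix.
frames = [[((f * (c + 1)) % 7) + 0.1 * c for c in range(13)] for f in range(40)]
print(len(speech_code(frames)))  # 65
print(equal_error_rate([0.9, 0.8, 0.85, 0.4], [0.1, 0.3, 0.5, 0.2]))  # (0.25, 0.5)
```

Sorting the centres ties each position of the 65-element vector to the same coefficient and cluster rank, which is presumably what a fixed 65-input network needs in order to compare utterances. The multilayer perceptron trained on these vectors is left out of the sketch.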


Keywords

Mel-Frequency Cepstral Coefficients, False Acceptance Rate, False Rejection Rate, Equal Error Rate