Research Article

A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition

by G. Chenchamma, A. Govardhan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 49 - Number 3
Year of Publication: 2012
Authors: G. Chenchamma, A. Govardhan
DOI: 10.5120/7605-0611

G. Chenchamma, A. Govardhan. A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition. International Journal of Computer Applications. 49, 3 (July 2012), 6-10. DOI=10.5120/7605-0611

@article{10.5120/7605-0611,
  author     = {G. Chenchamma and A. Govardhan},
  title      = {A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition},
  journal    = {International Journal of Computer Applications},
  issue_date = {July 2012},
  volume     = {49},
  number     = {3},
  month      = {July},
  year       = {2012},
  issn       = {0975-8887},
  pages      = {6-10},
  numpages   = {5},
  url        = {https://ijcaonline.org/archives/volume49/number3/7605-0611/},
  doi        = {10.5120/7605-0611},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A G. Chenchamma
%A A. Govardhan
%T A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition
%J International Journal of Computer Applications
%@ 0975-8887
%V 49
%N 3
%P 6-10
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The objective is to model the dominant speaker-specific excitation source in the time domain at three levels, namely subsegmental, segmental, and suprasegmental. The speaker-specific source information is contained in the LP residual, and the LP residual carries different speaker-specific information at each of these levels. At each level, features are extracted and modelled using the proposed Hidden Markov model (HMM) approach, which is compared with the existing Gaussian mixture model (GMM) approach. The experimental results demonstrate that the subsegmental level performs better than the other two levels. However, the evidence from the three levels of processing appears to be complementary: combining it gives better performance than a state-of-the-art speaker recognition system, which indicates that each level of processing captures different speaker information. Finally, combining the evidence from all three levels with vocal-tract information further improves speaker recognition performance. Experiments were conducted on the TIMIT database using Gaussian mixture models (GMMs) and Hidden Markov models (HMMs); comparing the results, the proposed HMM-based model outperforms the existing GMM-based model.
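For readers who want to experiment with the front end described above, the minimal sketch below shows one common way to obtain the LP residual by inverse filtering and to block it at three time scales in the time domain. The LP order (10), the analysis frame size (20 ms), and the subsegmental/segmental/suprasegmental block sizes and shifts (5/20/250 ms) are illustrative assumptions, not values reported in the paper; the HMM/GMM modelling stage and the combination with vocal-tract features are omitted.

```python
# Sketch only: LP residual extraction and multi-level time-domain blocking.
# Assumed parameters (LP order, frame and block sizes) are NOT from the paper.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_coefficients(frame, order=10):
    # Autocorrelation-method LP coefficients a_1..a_p for one frame.
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:]   # lags 0..N-1
    if r[0] <= 0:                                      # silent frame
        return np.zeros(order)
    # Solve the Toeplitz normal equations R a = r (Levinson-Durbin style).
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

def lp_residual(signal, sr, order=10, frame_ms=20):
    # Inverse-filter frame by frame: e[n] = s[n] - sum_k a_k s[n-k].
    # Filter state is reset at each frame boundary; fine for a sketch.
    n = int(sr * frame_ms / 1000)
    residual = np.zeros(len(signal))
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n].astype(float)
        a = lp_coefficients(frame, order)
        residual[start:start + n] = lfilter(np.r_[1.0, -a], [1.0], frame)
    return residual

def blocks(residual, sr, block_ms, shift_ms):
    # Split the residual into overlapping time-domain blocks at one level.
    n, s = int(sr * block_ms / 1000), int(sr * shift_ms / 1000)
    return np.array([residual[i:i + n]
                     for i in range(0, len(residual) - n + 1, s)])

# Usage (hypothetical 16 kHz TIMIT-style utterance; block sizes are assumptions):
# res      = lp_residual(speech, 16000)
# subseg   = blocks(res, 16000, block_ms=5,   shift_ms=2.5)   # subsegmental
# seg      = blocks(res, 16000, block_ms=20,  shift_ms=10)    # segmental
# supraseg = blocks(res, 16000, block_ms=250, shift_ms=50)    # suprasegmental
```

The blocks produced at each level would then be fed to the level-specific models (HMM or GMM) and the resulting scores combined, as described in the abstract.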

References
  1. Ananthapadmanabha, T. V., & Yegnanarayana, B. (1979). Epoch extraction from linear prediction residual for identification of closed glottis interval. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-27, 309–319.
  2. Atal, B. S. (1972). Automatic speaker recognition based on pitch contours. The Journal of the Acoustical Society of America, 52(6), 1687–1697.
  3. Atal, B. S. (1974). Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. The Journal of the Acoustical Society of America, 55(6), 1304–1312.
  4. Cohen, L. (1995). Time-frequency analysis: theory and application. Signal processing series. Englewood Cliffs: Prentice Hall.
  5. Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–366.
  6. Ezzaidi, H., & Rouat, J. (2004). Pitch and MFCC dependent GMM models for speaker identification systems. In IEEE int. conf. on electrical and computer eng.: Vol. 1.
  7. Furui, S. (1981). Cepstral analysis technique for automatic speaker verification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(2), 254–272.
  8. Hayakawa, S., Takeda, K., & Itakura, F. (1997). Speaker identification using harmonic structure of LP-residual spectrum. In Lecture notes in computer science: Vol. 1206. Audio- and video-based biometric person authentication (pp. 253–260). Berlin: Springer.
  9. Huang, W., Chao, J., & Zhang, Y. (2008). Combination of pitch and MFCC GMM super vectors for speaker verification. In IEEE int. conf. on audio, language and image process. (ICALIP) (pp. 1335–1339).
  10. Jankowski, C., Kalyanswamy, A., Basson, S., & Spitz, J. (1990). NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In Int. conf. on acoust., speech and signal process. (ICASSP), Albuquerque, NM (pp. 109–112).
  11. Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561–580.
  12. Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M. (1997). The DET curve in assessment of detection task performance. In Proc. Eur. conf. on speech communication technology, Rhodes, Greece, Vol. 4 (pp. 1895–1898).
  13. Mary, L., & Yegnanarayana, B. (2008). Extraction and representation of prosodic features for language and speaker recognition. Speech Communication, 50, 782–796.
  14. Mashao, D. J., & Skosan, M. (2006). Combining classifier decisions for robust speaker identification. Pattern Recognition, 39, 147–155.
  15. Murthy, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signal. IEEE Transactions on Speech and Audio Processing, 16(8), 1602–1613.
  16. Murthy, K. S. R., & Yegnanarayana, B. (2009). Characterization of glottal activity from speech signal. IEEE Signal Processing Letters, 16(6), 469–472.
  17. Murthy, K. S. R., & Yegnanarayana, B. (2006). Combining evidence from residual phase and MFCC features for speaker recognition. IEEE Signal Processing Letters, 13(1), 52–55.
  18. Murthy, K. S. R., Prasanna, S. R. M., & Yegnanarayana, B. (2004). Speaker specific information from residual phase. In Int. conf. on signal process. and comm. (SPCOM).
  19. NIST speaker recognition evaluation plan (2003). In Proc. NIST speaker recognition workshop, College Park, MD.
  20. Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.
  21. Przybocki, M., & Martin, A. (2000). The NIST-1999 speaker recognition evaluation - an overview. Digital Signal Processing, 10, 1–18.
  22. Reynolds, D. A. (1994). Experimental evaluation of features for robust speaker identification. IEEE Transactions on Speech and Audio Processing, 2(4), 639–643.
  23. Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17, 91–108.
  24. Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 4–17.
  25. Sonmez, K., Shriberg, E., Heck, L., & Weintraub, M. (1998). Modeling dynamic prosodic variation for speaker verification. In Proc. ICSLP'98: Vol. 7 (pp. 3189–3192).
  26. Thevenaz, P., & Hugli, H. (1995). Usefulness of the LPC-residue in text-independent speaker verification. Speech Communication, 17, 145–157.
  27. Wolf, J. J. (1972). Efficient acoustic parameters for speaker recognition. The Journal of the Acoustical Society of America, 51(2), 2044–2055.
  28. Yegnanarayana, B., Reddy, K. S., & Kishore, S. P. (2001). Source and system features for speaker recognition using AANN models. In Proc. IEEE int. conf. acoust., speech and signal processing, Salt Lake City, UT, USA, May 2001 (pp. 409–412).
  29. Yegnanarayana, B., Prasanna, S. R. M., Zachariah, J. M., & Gupta, C. S. (2005). Combining evidence from source, suprasegmental and spectral features for fixed-text speaker verification study. IEEE Transactions on Speech and Audio Processing, 13(4), 575–582.
  30. Zheng, N., Lee, T., & Ching, P. C. (2007). Integration of complementary acoustic features for speaker recognition. IEEE Signal Processing Letters, 14(3), 181–184.
  31. Zue, V., Seneff, S., & Glass, J. (1990). Speech database development at MIT: TIMIT and beyond. Speech Communication, 9(4), 351–356.
Index Terms

Computer Science
Information Sciences

Keywords

Subsegmental, segmental, suprasegmental, LP residual, Hidden Markov models (HMM)