Research Article

A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition

by G. Chenchamma, A. Govardhan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 49 - Number 3
Year of Publication: 2012
DOI: 10.5120/7605-0611

G. Chenchamma, A. Govardhan. A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition. International Journal of Computer Applications. 49, 3 (July 2012), 6-10. DOI=10.5120/7605-0611

@article{ 10.5120/7605-0611,
author = { G. Chenchamma, A. Govardhan },
title = { A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition },
journal = { International Journal of Computer Applications },
issue_date = { July 2012 },
volume = { 49 },
number = { 3 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 6-10 },
numpages = { 5 },
url = { https://ijcaonline.org/archives/volume49/number3/7605-0611/ },
doi = { 10.5120/7605-0611 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A G. Chenchamma
%A A. Govardhan
%T A Novel Approach for Extraction of Features from LP Residual in Time-Domain for Speaker Recognition
%J International Journal of Computer Applications
%@ 0975-8887
%V 49
%N 3
%P 6-10
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The objective is to model the dominant speaker-specific source information in the time domain at three levels: subsegmental, segmental, and suprasegmental. The speaker-specific source information is contained in the LP residual; hence, the LP residual carries different speaker-specific information at each level. At each level, features are extracted using the proposed method based on Hidden Markov models (HMMs) and compared with the existing Gaussian mixture model (GMM) approach. The experimental results demonstrate that the subsegmental level performs better than the other two levels. However, the evidence from the three levels of processing appears to be distinct and combines well, yielding better performance than the state-of-the-art speaker recognition system and demonstrating that different speaker information is captured at each level of processing. Finally, combining the evidence from all three levels with vocal tract information further improves speaker recognition performance. Experiments were conducted on the TIMIT database using Gaussian mixture models (GMMs) and Hidden Markov models (HMMs); a comparison of the results shows that the proposed HMM model outperforms the existing GMM model.
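The LP residual referred to in the abstract is the error signal left after inverse-filtering speech with its own short-term linear predictor. As a minimal sketch of that extraction step (not the paper's exact implementation; the predictor order and the autocorrelation-method LPC via Levinson-Durbin are illustrative assumptions), one could write:

```python
import numpy as np
from scipy.signal import lfilter

def lp_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns [1, a1, ..., ap] minimising the short-term prediction error."""
    n = len(frame)
    # Autocorrelation lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update previous coefficients
        a[i] = k
        err *= 1.0 - k * k                    # shrink residual energy
    return a

def lp_residual(signal, order=10):
    """Inverse-filter the signal with its own LP coefficients; what remains
    is the LP residual, which carries the excitation-source information
    that the paper models at subsegmental/segmental/suprasegmental levels."""
    a = lp_coefficients(signal, order)
    return lfilter(a, [1.0], signal)
```

In practice this would be applied frame-by-frame over windowed speech; here it is shown on a whole signal only to keep the sketch short.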

References
  1. Ananthapadmanabha, T. V., & Yegnanarayana, B. (1979). Epoch extraction from linear prediction residual for identification of closed glottis interval. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-27, 309–319.
  2. Atal, B. S. (1972). Automatic speaker recognition based on pitch contours. The Journal of the Acoustical Society of America, 52(6), 1687–1697.
  3. Atal, B. S. (1974). Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. The Journal of the Acoustical Society of America, 55(6), 1304–1312.
  4. Cohen, L. (1995). Time-frequency analysis: theory and application. Signal processing series. Englewood Cliffs: Prentice Hall.
  5. Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–366.
  6. Ezzaidi, H., & Rouat, J. (2004). Pitch and MFCC dependent GMM models for speaker identification systems. In IEEE Int. Conf. on Electrical and Computer Eng.: Vol. 1.
  7. Furui, S. (1981). Cepstral analysis technique for automatic speaker verification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(2), 254–272.
  8. Hayakawa, S., Takeda, K., & Itakura, F. (1997). Speaker identification using harmonic structure of LP-residual spectrum. In Lecture notes in computer science: Vol. 1206. Audio- and video-based biometric person authentication (pp. 253–260). Berlin: Springer.
  9. Huang, W., Chao, J., & Zhang, Y. (2008). Combination of pitch and MFCC GMM supervectors for speaker verification. In IEEE Int. Conf. on Audio, Language and Image Processing (ICALIP) (pp. 1335–1339).
  10. Jankowski, C., Kalyanswamy, A., Basson, S., & Spitz, J. (1990). NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Albuquerque, NM (pp. 109–112).
  11. Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561–580.
  12. Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M. (1997). The DET curve in assessment of detection task performance. In Proc. Eur. Conf. on Speech Communication Technology, Rhodes, Greece, Vol. 4 (pp. 1895–1898).
  13. Mary, L., & Yegnanarayana, B. (2008). Extraction and representation of prosodic features for language and speaker recognition. Speech Communication, 50, 782–796.
  14. Mashao, D. J., & Skosan, M. (2006). Combining classifier decisions for robust speaker identification. Pattern Recognition, 39, 147–155.
  15. Murthy, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signal. IEEE Transactions on Speech and Audio Processing, 16(8), 1602–1613.
  16. Murthy, K. S. R., & Yegnanarayana, B. (2009). Characterization of glottal activity from speech signal. IEEE Signal Processing Letters, 16(6), 469–472.
  17. Murthy, K. S. R., & Yegnanarayana, B. (2006). Combining evidence from residual phase and MFCC features for speaker recognition. IEEE Signal Processing Letters, 13(1), 52–55.
  18. Murthy, K. S. R., Prasanna, S. R. M., & Yegnanarayana, B. (2004). Speaker specific information from residual phase. In Int. Conf. on Signal Processing and Communications (SPCOM).
  19. NIST speaker recognition evaluation plan (2003). In Proc. NIST Speaker Recognition Workshop, College Park, MD.
  20. Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.
  21. Przybocki, M., & Martin, A. (2000). The NIST-1999 speaker recognition evaluation: an overview. Digital Signal Processing, 10, 1–18.
  22. Reynolds, D. A. (1994). Experimental evaluation of features for robust speaker identification. IEEE Transactions on Speech and Audio Processing, 2(4), 639–643.
  23. Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17, 91–108.
  24. Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 4–17.
  25. Sonmez, K., Shriberg, E., Heck, L., & Weintraub, M. (1998). Modeling dynamic prosodic variation for speaker verification. In Proc. ICSLP'98: Vol. 7 (pp. 3189–3192).
  26. Thevenaz, P., & Hugli, H. (1995). Usefulness of the LPC-residue in text-independent speaker verification. Speech Communication, 17, 145–157.
  27. Wolf, J. J. (1972). Efficient acoustic parameters for speaker recognition. The Journal of the Acoustical Society of America, 51(2), 2044–2055.
  28. Yegnanarayana, B., Reddy, K. S., & Kishore, S. P. (2001). Source and system features for speaker recognition using AANN models. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Salt Lake City, UT, USA, May 2001 (pp. 409–412).
  29. Yegnanarayana, B., Prasanna, S. R. M., Zachariah, J. M., & Gupta, C. S. (2005). Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification study. IEEE Transactions on Speech and Audio Processing, 13(4), 575–582.
  30. Zheng, N., Lee, T., & Ching, P. C. (2007). Integration of complementary acoustic features for speaker recognition. IEEE Signal Processing Letters, 14(3), 181–184.
  31. Zue, V., Seneff, S., & Glass, J. (1990). Speech database development at MIT: TIMIT and beyond. Speech Communication, 9(4), 351–356.
Index Terms

Computer Science
Information Sciences

Keywords

Subsegmental, segmental, suprasegmental, LP residual, Hidden Markov models (HMM)