Research Article

AM-FM Based Robust Speaker Identification in Babble Noise

Published in 2011 by Mangesh S. Deshpande, Raghunath S. Holambe
International Conference and Workshop on Emerging Trends in Technology
Foundation of Computer Science USA
ICWET - Number 14
2011
Authors: Mangesh S. Deshpande, Raghunath S. Holambe

Mangesh S. Deshpande, Raghunath S. Holambe. AM-FM Based Robust Speaker Identification in Babble Noise. International Conference and Workshop on Emerging Trends in Technology. ICWET, 14 (2011), 28-35.

@article{
author = { Mangesh S. Deshpande and Raghunath S. Holambe },
title = { AM-FM Based Robust Speaker Identification in Babble Noise },
journal = { International Conference and Workshop on Emerging Trends in Technology },
issue_date = { 2011 },
volume = { ICWET },
number = { 14 },
year = { 2011 },
issn = { 0975-8887 },
pages = { 28-35 },
numpages = 8,
url = { /proceedings/icwet/number14/2167-is265/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference and Workshop on Emerging Trends in Technology
%A Mangesh S. Deshpande
%A Raghunath S. Holambe
%T AM-FM Based Robust Speaker Identification in Babble Noise
%J International Conference and Workshop on Emerging Trends in Technology
%@ 0975-8887
%V ICWET
%N 14
%P 28-35
%D 2011
%I International Journal of Computer Applications
Abstract

Speech babble is one of the most challenging noise interferences for speech and speaker recognition systems because of its speaker- and speech-like characteristics. The performance of such systems degrades strongly in the presence of background noise such as babble. Existing techniques address this problem by additional processing of the speech signal to remove the noise. In contrast, the aim here is to improve noise robustness by focusing on the features alone. To derive robust features, an amplitude modulation-frequency modulation (AM-FM) based speaker model is proposed. The robust features are derived by fusing the characteristics of the speech production and speech perception mechanisms. Performance is evaluated using a clean speech corpus from the TIMIT database combined with babble noise from the NOISEX-92 database. Experimental results show that the proposed features significantly improve performance over conventional Mel-frequency cepstral coefficient (MFCC) features under mismatched training and testing environments.
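
The abstract does not say how the AM and FM components are estimated, so the following is only an illustrative sketch of one widely used option, the Teager-Kaiser energy operator with the DESA-2 energy separation algorithm, and not necessarily the authors' method. It assumes a single narrowband signal (for example, the output of one bandpass filter) given as a plain list of samples. The function names `teager` and `desa2` and the `eps` guard are hypothetical choices made for this example.

```python
import math


def teager(x):
    """Teager-Kaiser energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples.

    The returned list is shorter than x by two; element k corresponds to
    sample n = k + 1 of x.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, eps=1e-12):
    """Estimate instantaneous amplitude and frequency with DESA-2.

    Returns (amp, freq), where freq is in radians per sample. Estimates
    cover samples n = 2 .. len(x) - 3 of the input. Samples whose energy
    falls below eps (an assumed guard value) are reported as 0.0.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1], defined for n = 1 .. N-2.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager(x)  # psi_x[k] belongs to sample n = k + 1
    psi_y = teager(y)  # psi_y[k] belongs to sample n = k + 2

    amp, freq = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align psi_x with sample n = k + 2
        if px <= eps or py <= eps:
            amp.append(0.0)
            freq.append(0.0)
            continue
        # Clamp to the valid domain of acos to absorb rounding error.
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freq.append(0.5 * math.acos(c))
        amp.append(2.0 * px / math.sqrt(py))
    return amp, freq


if __name__ == "__main__":
    # Pure tone with amplitude 0.7 and frequency 0.3 rad/sample.
    tone = [0.7 * math.cos(0.3 * n + 0.2) for n in range(200)]
    a, f = desa2(tone)
    print(a[0], f[0])
```

For a constant-amplitude, constant-frequency sinusoid with frequency below a quarter of the sampling rate, these estimates are exact up to rounding error. For real speech, the signal would first be split into bands, and the per-band estimates are typically smoothed before being turned into features.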

Index Terms

Computer Science
Information Sciences

Keywords

Speaker identification, AM-FM model, Babble noise