Research Article

Gender Clustering and Classification Algorithms in Speech Processing: A Comprehensive Performance Analysis

by M. Gomathy, K. Meena, K. R. Subramaniam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 51 - Number 20
Year of Publication: 2012
10.5120/8156-1533

M. Gomathy, K. Meena, K. R. Subramaniam. Gender Clustering and Classification Algorithms in Speech Processing: A Comprehensive Performance Analysis. International Journal of Computer Applications. 51, 20 (August 2012), 9-17. DOI=10.5120/8156-1533

@article{ 10.5120/8156-1533,
author = { M. Gomathy and K. Meena and K. R. Subramaniam },
title = { Gender Clustering and Classification Algorithms in Speech Processing: A Comprehensive Performance Analysis },
journal = { International Journal of Computer Applications },
issue_date = { August 2012 },
volume = { 51 },
number = { 20 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 9-17 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume51/number20/8156-1533/ },
doi = { 10.5120/8156-1533 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:50:51.994174+05:30
%A M. Gomathy
%A K. Meena
%A K. R. Subramaniam
%T Gender Clustering and Classification Algorithms in Speech Processing: A Comprehensive Performance Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 51
%N 20
%P 9-17
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In speech processing, gender clustering and classification is a prominent and challenging task. In both gender clustering and classification, one of the most vital steps is feature selection. Pitch is the feature most often used for gender clustering and classification. It is essential to note that the pitch of male speech differs markedly from that of female speech. There is likewise a considerable difference between male and female speech in terms of frequency. In some situations, however, the frequency of a male voice is almost the same as that of a female voice, or vice versa, and it is difficult to determine the gender in such conditions. This paper focuses on overcoming these practical obstacles by extracting three significant features, namely energy entropy, zero crossing rate, and short-time energy. Gender clustering is performed based on these features, and the clustering performance is analyzed using the Euclidean, Mahalanobis, Manhattan, and Bhattacharyya distance methods. Gender classification is performed using fuzzy logic, a neural network, a hybrid neuro-fuzzy system, and a support vector machine. A benchmark dataset and a real-time dataset are used for testing to ensure the reliability of the results. The test results show the performance of the various techniques and distance algorithms on the different datasets.
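The three features and four distance measures named in the abstract can be sketched roughly as follows. This is a minimal illustration using common textbook definitions, not the authors' implementation. All function names, the per-frame input format, and the `n_blocks` parameter are hypothetical. The Mahalanobis and Bhattacharyya versions assume Gaussian feature distributions with diagonal covariance, a simplification the abstract does not state.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def energy_entropy(frame, n_blocks=4):
    """Shannon entropy (in bits) of the energy spread over sub-blocks.

    n_blocks is an illustrative choice, not a value from the paper.
    """
    size = len(frame) // n_blocks
    energies = [
        sum(x * x for x in frame[i * size:(i + 1) * size])
        for i in range(n_blocks)
    ]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in probs)


def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute per-feature differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def mahalanobis_diag(u, mean, var):
    """Mahalanobis distance to a cluster, assuming diagonal covariance.

    `var` holds per-feature variances (assumption: features uncorrelated).
    """
    return math.sqrt(
        sum((a - m) ** 2 / s for a, m, s in zip(u, mean, var))
    )


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, s1, m2, s2 in zip(mean1, var1, mean2, var2):
        s = (s1 + s2) / 2.0
        dist += (m1 - m2) ** 2 / (8.0 * s)
        dist += 0.5 * math.log(s / math.sqrt(s1 * s2))
    return dist


# Example: build a three-element feature vector for one frame.
frame = [0.2, -0.1, 0.4, -0.3, 0.1, 0.05, -0.2, 0.3]
features = [
    energy_entropy(frame),
    zero_crossing_rate(frame),
    short_time_energy(frame),
]
```

In a clustering setting, a feature vector like `features` would be compared against per-gender cluster statistics with one of the distance functions, and the frame assigned to the nearer cluster. A full-covariance Mahalanobis or Bhattacharyya distance would need a matrix inverse and determinant, which this stdlib-only sketch leaves out.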

Index Terms

Computer Science
Information Sciences

Keywords

Mahalanobis distance
Manhattan distance
Bhattacharyya distance
Neuro fuzzy
Support vector machine