Effect of Gender on Improving Speech Recognition System

Amer Sallam; Sreedhar Bhukya

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 179 - Number 14

Year of Publication: 2018

Authors: Amer Sallam, Sreedhar Bhukya

DOI: 10.5120/ijca2018916200

Amer Sallam, Sreedhar Bhukya. Effect of Gender on Improving Speech Recognition System. International Journal of Computer Applications. 179, 14 (Jan 2018), 22-30. DOI=10.5120/ijca2018916200

@article{ 10.5120/ijca2018916200,

author = { Amer Sallam and Sreedhar Bhukya },

title = { Effect of Gender on Improving Speech Recognition System },

journal = { International Journal of Computer Applications },

issue_date = { Jan 2018 },

volume = { 179 },

number = { 14 },

month = { Jan },

year = { 2018 },

issn = { 0975-8887 },

pages = { 22-30 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume179/number14/28868-2018916200/ },

doi = { 10.5120/ijca2018916200 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-07T00:55:21.452633+05:30

%A Amer Sallam

%A Sreedhar Bhukya

%T Effect of Gender on Improving Speech Recognition System

%J International Journal of Computer Applications

%@ 0975-8887

%V 179

%N 14

%P 22-30

%D 2018

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Speech is the output of a time-varying system driven by a time-varying excitation. The excitation generates pulses with fundamental frequency F0. This time-varying impulse train serves as one of the features and is characterized by the fundamental frequency F0 and its formant frequencies. These features vary from speaker to speaker and also between genders. This paper considers the effect of gender on improving speech recognition. Variations in F0 and the formant frequencies are the main features that characterize variation among speakers. The variation is very small within a speaker, moderate within the same gender, and very high across genders. This variation can be exploited to recognize gender and to improve the performance of a speech recognition system by building separate models based on gender information. Five sentences are selected for training. Each sentence is spoken and recorded by 20 female speakers and 20 male speakers. The speech corpus will be preprocessed to identify the voiced and unvoiced regions. The voiced region is the only region that carries information about F0. From each voiced segment, F0, the first three formant frequencies, and MFCC features are computed. Each forms the feature space labeled with the speaker identification, i.e., male or female. This information is used to parameterize the models for male and female speakers. The K-means algorithm is used during training as well as testing. Testing is conducted in two ways: speaker-dependent testing and speaker-independent testing. The SPHINX-III software from Carnegie Mellon University has been used to measure speech recognition accuracy on the data, taking into account the gender separation used in this research.
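The F0 extraction and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 16 kHz sampling rate, uses synthetic harmonic frames in place of recorded voiced segments, estimates F0 with a plain autocorrelation peak search (the abstract does not say which F0 estimator was used), and runs a one-dimensional two-cluster K-means on F0 alone, leaving out formants and MFCC features. All function names and F0 values are hypothetical, and labeling the lower-F0 cluster as male is an assumption of the sketch.

```python
import math
from statistics import mean

SAMPLE_RATE = 16000  # Hz; assumed, the abstract does not state a sampling rate


def synth_voiced_frame(f0_hz, duration_s=0.04):
    """Return a synthetic voiced frame: three harmonics of f0_hz (stand-in for real speech)."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        sum(math.sin(2 * math.pi * k * f0_hz * t / SAMPLE_RATE) / k for k in (1, 2, 3))
        for t in range(n)
    ]


def estimate_f0(frame, f0_min=60.0, f0_max=400.0):
    """Estimate F0 as SAMPLE_RATE / lag, where lag maximizes the raw autocorrelation."""
    lag_min = int(SAMPLE_RATE / f0_max)
    lag_max = min(int(SAMPLE_RATE / f0_min), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized sum of products between the frame and its shifted copy.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag


def two_means(values, iterations=20):
    """1-D K-means with k=2, centroids initialised at the minimum and maximum value."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical, nothing to split
            break
        low, high = mean(low_group), mean(high_group)
    return low, high


def classify_gender(f0_hz, low_centroid, high_centroid):
    """Assumption of this sketch: the lower-F0 cluster is labeled male, the higher one female."""
    nearer_low = abs(f0_hz - low_centroid) <= abs(f0_hz - high_centroid)
    return "male" if nearer_low else "female"


if __name__ == "__main__":
    # Made-up F0 targets for six hypothetical speakers, not data from the paper.
    true_f0 = [105.0, 120.0, 135.0, 200.0, 220.0, 240.0]
    estimates = [estimate_f0(synth_voiced_frame(f0)) for f0 in true_f0]
    low, high = two_means(estimates)
    for f0, est in zip(true_f0, estimates):
        print(f"true F0 {f0:.0f} Hz, estimated {est:.1f} Hz, {classify_gender(est, low, high)}")
```

In a gender-dependent recognizer of the kind the abstract describes, the predicted label would then select which gender-specific model decodes the utterance. That selection step is not shown here.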


Index Terms

Computer Science

Information Sciences

Keywords

Speech recognition (SR), Linear Prediction Coding (LPC), Accent Speakers.