Research Article

Speech Emotion Recognition based on Voiced Emotion Unit

by Reda Elbarougy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 47
Year of Publication: 2019
Authors: Reda Elbarougy
DOI: 10.5120/ijca2019919355

Reda Elbarougy. Speech Emotion Recognition based on Voiced Emotion Unit. International Journal of Computer Applications 178, 47 (Sep 2019), 22-28. DOI=10.5120/ijca2019919355

@article{ 10.5120/ijca2019919355,
author = { Reda Elbarougy },
title = { Speech Emotion Recognition based on Voiced Emotion Unit },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2019 },
volume = { 178 },
number = { 47 },
month = { Sep },
year = { 2019 },
issn = { 0975-8887 },
pages = { 22-28 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume178/number47/30867-2019919355/ },
doi = { 10.5120/ijca2019919355 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Reda Elbarougy
%T Speech Emotion Recognition based on Voiced Emotion Unit
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 47
%P 22-28
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Speech emotion recognition (SER) systems are becoming an important tool for human-computer interaction. Previous studies in SER have treated the utterance as a single unit, assuming that the emotional state is fixed over the whole utterance, although the emotional state may change over time even within one utterance. Therefore, using the utterance as one unit is not suitable for this purpose, especially for long utterances. The ultimate goal of this study is to find a novel emotion unit that can be used to improve SER accuracy. To this end, different emotion units defined on the basis of voiced segments are investigated. To find the optimal emotion unit, an SER system based on a support vector machine (SVM) classifier is used to evaluate each unit, with the classification rate serving as the evaluation metric. To validate the proposed method, the Berlin database of emotional speech (EMO-DB) is used. The experimental results reveal that an emotion unit containing four voiced segments gives the highest recognition rate. Moreover, the final emotional state of the whole utterance is determined by majority voting over the emotional states of its units. The proposed method using the voiced-segment emotion unit is found to outperform the conventional method that treats the utterance as one unit.
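The unit-based classification and voting scheme summarized above can be illustrated with a short sketch. The following Python code is not the paper's implementation; it only shows the overall flow under stated assumptions: voiced segments of an utterance are grouped into emotion units of four segments each, a scikit-learn SVM classifies each unit from an acoustic feature vector, and the utterance label is obtained by majority voting over the unit labels. The names group_into_units, extract_features, and predict_utterance are hypothetical, and feature extraction and voiced-segment detection are replaced by random stand-in data.

```python
# Minimal sketch of unit-level SER with majority voting (illustrative only).
from collections import Counter
from typing import List, Sequence

import numpy as np
from sklearn.svm import SVC


def group_into_units(voiced_segments: Sequence[np.ndarray],
                     segments_per_unit: int = 4) -> List[np.ndarray]:
    """Concatenate consecutive voiced segments into emotion units
    (four voiced segments per unit, the best size reported in the paper)."""
    return [np.concatenate(voiced_segments[i:i + segments_per_unit])
            for i in range(0, len(voiced_segments), segments_per_unit)]


def extract_features(unit: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor (hypothetical): a real system would use
    acoustic features such as F0, energy, and spectral statistics."""
    return np.array([unit.mean(), unit.std(), unit.min(), unit.max()])


def predict_utterance(clf: SVC, units: Sequence[np.ndarray]) -> int:
    """Classify every emotion unit and majority-vote the utterance label."""
    unit_labels = clf.predict(np.vstack([extract_features(u) for u in units]))
    return Counter(unit_labels).most_common(1)[0][0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Stand-in training data: 200 emotion units already reduced to
    # 4-dimensional feature vectors, with 7 emotion classes as in EMO-DB.
    X_train = rng.normal(size=(200, 4))
    y_train = rng.integers(0, 7, size=200)

    clf = SVC(kernel="rbf")           # SVM classifier, as in the paper
    clf.fit(X_train, y_train)

    # Stand-in voiced segments for one test utterance (in practice these
    # would come from a voice activity detector), grouped into units.
    segments = [rng.normal(size=rng.integers(800, 1600)) for _ in range(10)]
    units = group_into_units(segments, segments_per_unit=4)
    print("utterance label:", predict_utterance(clf, units))
```

In this sketch the training set is already at the unit level; the voting step only appears at test time, where each utterance contributes several unit-level predictions that are collapsed into a single label.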

References
  1. Jiang W., Wang Z., Jin J.S., Han X., Li C. Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network. Sensors (Basel). 2019 Jun 18;19(12):2730.
  2. Alonso-Martín, F., Malfaz, M., Sequeira, J., Gorostiza, J. F., and Salichs, M. A., "A multimodal emotion detection system during human–robot interaction," Sensors, vol. 13, no. 11, pp. 15549–15581, 2013.
  3. Busso, C., Bulut, M., Lee, C.C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J.N., Lee, S., and Narayanan, S.S., "IEMOCAP: Interactive emotional dyadic motion capture database," Journal of Language Resources and Evaluation, vol. 42, no. 4, pp. 335-359, December 2008.
  4. Vogt, T., André, E., "Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition," In: Proceedings of International Conference on Multimedia & Expo., Amsterdam, The Netherlands (2005).
  5. Vogt, T., André, E., and Wagner, J., "Automatic recognition of emotions from speech: a review of the literature and recommendations for practical realization," In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg (2007).
  6. Kerkeni, L., Serrestou, Y., Raoof, K., Cléder, C., Mahjoub, M., Mbarki, M. (2019). "Automatic Speech Emotion Recognition Using Machine Learning"
  7. Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E.: "How to find trouble in communication," Speech Communication 40, 117–143 (2003)
  8. Batliner, A., Seppi, D., Steidl, S., and Schuller, B., "Segmenting into adequate units for automatic recognition of emotion-related episodes: a speech-based approach," Advances in Human-Computer Interaction, 2010. vol. 2010, Article ID 782802, 15 pages.
  9. Seppi, D., Batliner, A., Steidl, S., Schuller, B., and Nöth, E., "Word Accent and Emotion," In Proc. Speech Prosody 2010, Chicago, IL, 2010.
  10. Burkhardt, F., Paeschke, A., Rolfes, M.; Sendlmeier, W., and Weiss, B., "A Database of German Emotional Speech," INTERSPEECH (2005).
  11. Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., and Wendemuth, A., "Vowels formants analysis allows straightforward detection of high arousal emotions," in 2011 IEEE International Conference on Multimedia and Expo (ICME), 2011.
  12. Ringeval, F., & Chetouani, M. (2008). "A vowel based approach for acted emotion recognition," In INTERSPEECH 2008 9th annual conference of the international speech communication association, Brisbane, Australia, 22–26 September.
  13. Deb S., and Dandapat, S., "Emotion classification using segmentation of vowel-like and non-vowel-like regions," IEEE Transactions on Affective Computing, 2017
  14. Moattar, M., and Homayounpour, M., "A simple but efficient real-time voice activity detection algorithm," in EUSIPCO. EURASIP, 2009, pp. 2549–2553.
  15. Moattar, M., Homayounpour, M., and Kalantari, N., "A New Approach for Robust Real-time Voice Activity Detection Using Spectral Pattern," ICASSP, 2010, pp. 4478-4481.
  16. Kawahara, H., Masuda-Katsuse, I., and de Cheveigné, A., "Restructuring speech representations using a pitch adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, pp. 187–207, 1999.
  17. El Ayadi, M., Kamel, M. S., Karray, F., "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognition 44 (3) (2011) 572–587.
  18. Busso, C., Lee, S., and Narayanan, S., "Analysis of emotionally salient aspects of fundamental frequency for emotion detection," IEEE Trans. Audio, Speech Language Process., vol. 17, no. 4, pp. 582–596, May 2009.
  19. Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., Narayanan, S., "Emotion recognition based on phoneme classes," in Proc. of ICSLP, 2004.
  20. Eyben, F., Wöllmer, M., Graves, A., Schuller, B., Douglas-Cowie, E., and Cowie, R., "On-line emotion recognition in a 3-d activation-valence-time continuum using acoustic and linguistic cues," Journal on Multimodal User Interfaces, 3(1-2):7–19, Mar. 2010.
  21. Mansoorizadeh M., Charkari N.M., (2007) "Speech emotion recognition: comparison of speech segmentation approaches," In: Proceedings of IKT, Mashad, Iran
  22. Elbarougy, R. and Akagi, M. “Improving Speech Emotion Dimensions Estimation Using a Three-Layer Model for Human Perception,” Journal of Acoustical Science and Technology, 35, 2, 86-98, March, 2014.
  23. Elbarougy, R. and Akagi, M., “Optimizing fuzzy inference systems for improving speech emotion recognition,” Advances in Intelligent Systems and Computing, vol. 533, pp. 85-95, 2017.
  24. Elbarougy, R., and Akagi, M., "Feature selection method for real-time speech emotion recognition," in 2017 20th Conference of the Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA). IEEE, 2017, pp. 1–6.
Index Terms

Computer Science
Information Sciences

Keywords

Acoustic features extraction, discriminative features, speech emotion recognition, voiced segments, unvoiced segments.