Step-by-step Approach to Automatic Speech Emotion Recognition

Purnima Chandrasekar; Shailendra Pratap Shastri

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 186 - Number 37

Year of Publication: 2024

Authors: Purnima Chandrasekar, Shailendra Pratap Shastri

10.5120/ijca2024923947

Purnima Chandrasekar, Shailendra Pratap Shastri. Step-by-step Approach to Automatic Speech Emotion Recognition. International Journal of Computer Applications. 186, 37 (Aug 2024), 37-43. DOI=10.5120/ijca2024923947

@article{ 10.5120/ijca2024923947,

author = { Purnima Chandrasekar and Shailendra Pratap Shastri },

title = { Step-by-step Approach to Automatic Speech Emotion Recognition },

journal = { International Journal of Computer Applications },

issue_date = { Aug 2024 },

volume = { 186 },

number = { 37 },

month = { Aug },

year = { 2024 },

issn = { 0975-8887 },

pages = { 37-43 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume186/number37/step-by-step-approach-to-automatic-speech-emotion-recognition/ },

doi = { 10.5120/ijca2024923947 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-08-31T23:18:25+05:30

%A Purnima Chandrasekar

%A Shailendra Pratap Shastri

%T Step-by-step Approach to Automatic Speech Emotion Recognition

%J International Journal of Computer Applications

%@ 0975-8887

%V 186

%N 37

%P 37-43

%D 2024

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Humans express emotions naturally, either through facial expressions or through speech. Emotions play an important role in human decision-making, as the human mind is influenced by personal experiences as well as by physiological, communicative and behavioral reactions to external stimuli. When considering emotions displayed through speech, one needs to understand that a speech signal conveys not only the emotional state of the speaker, which is visible from the intent of the message, but also the gender of the person and the language spoken. While effective communication between humans through speech ensures the exchange of ideas, messages and perceptions, interaction between human and machine with the same intent is challenging, as the machine is expected to mimic the mechanism of human perception. Automatic Speech Emotion Recognition (ASER) systems have proved useful in several applications, viz. healthcare, counseling and call center communication, among others. Such a system has three basic components, viz. creation of an emotional speech corpus, extraction of features relevant to emotion detection, and classification of the emotion in the test speech using appropriate classifiers. This paper extensively surveys the prominent features extracted, several dimension reduction techniques and the classifiers commonly used in recent times. It also sheds light on the concept of auto encoders as used recently in ASER.
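The three components named in the abstract (corpus, feature extraction, classification), together with a dimension reduction step, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the corpus is synthetic (sine tones standing in for "calm" and "excited" speech), the features are simple frame energy and zero-crossing statistics rather than the features the paper surveys, the reduction is a plain PCA, and the classifier is nearest-centroid. All function names are hypothetical.

```python
import numpy as np

def make_corpus(n_per_class=40, sr=8000, dur=0.5, seed=0):
    """Synthetic stand-in for an emotional speech corpus (illustrative data only)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * dur)) / sr
    signals, labels = [], []
    # Class 0 ("calm"): low pitch, low amplitude. Class 1 ("excited"): high pitch, high amplitude.
    for label, (f0, amp) in enumerate([(120.0, 0.3), (300.0, 0.8)]):
        for _ in range(n_per_class):
            f = f0 * rng.uniform(0.9, 1.1)
            x = amp * np.sin(2 * np.pi * f * t) + 0.02 * rng.standard_normal(t.size)
            signals.append(x)
            labels.append(label)
    return np.array(signals), np.array(labels)

def extract_features(x, frame=200):
    """Utterance-level mean and std of frame energy and zero-crossing rate."""
    n = x.size // frame
    frames = x[: n * frame].reshape(n, frame)
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

def fit_pca(X, k):
    """Standardize the features and keep the top-k principal directions."""
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-12
    _, _, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    return mean, std, vt[:k].T

def apply_pca(X, model):
    mean, std, W = model
    return ((X - mean) / std) @ W

def fit_centroids(Z, y):
    """Nearest-centroid classifier: one mean vector per emotion class."""
    classes = np.unique(y)
    return classes, np.array([Z[y == c].mean(axis=0) for c in classes])

def predict(Z, model):
    classes, centroids = model
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

# 1. Corpus, 2. features, 3. dimension reduction, 4. classification.
signals, labels = make_corpus()
X = np.array([extract_features(s) for s in signals])

idx = np.random.default_rng(1).permutation(len(labels))
train, test = idx[:60], idx[60:]

pca = fit_pca(X[train], k=2)                      # fitted on training data only
clf = fit_centroids(apply_pca(X[train], pca), labels[train])
pred = predict(apply_pca(X[test], pca), clf)
accuracy = float(np.mean(pred == labels[test]))
print(f"test accuracy: {accuracy:.2f}")
```

In a real system each stage would be replaced by the corresponding technique from the survey, for example a recorded emotional speech corpus, spectral or prosodic features, and a trained classifier; the sketch only shows how the stages connect.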

Index Terms

Computer Science

Information Sciences

Pattern Recognition

Keywords

ASER, feature extraction, dimensionality reduction, auto encoders