Research Article

Step-by-step Approach to Automatic Speech Emotion Recognition

by Purnima Chandrasekar, Shailendra Pratap Shastri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 37
Year of Publication: 2024
Authors: Purnima Chandrasekar, Shailendra Pratap Shastri
10.5120/ijca2024923947

Purnima Chandrasekar, Shailendra Pratap Shastri. Step-by-step Approach to Automatic Speech Emotion Recognition. International Journal of Computer Applications. 186, 37 (Aug 2024), 37-43. DOI=10.5120/ijca2024923947

@article{ 10.5120/ijca2024923947,
author = { Purnima Chandrasekar and Shailendra Pratap Shastri },
title = { Step-by-step Approach to Automatic Speech Emotion Recognition },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2024 },
volume = { 186 },
number = { 37 },
month = { Aug },
year = { 2024 },
issn = { 0975-8887 },
pages = { 37-43 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume186/number37/step-by-step-approach-to-automatic-speech-emotion-recognition/ },
doi = { 10.5120/ijca2024923947 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-08-31T23:18:25+05:30
%A Purnima Chandrasekar
%A Shailendra Pratap Shastri
%T Step-by-step Approach to Automatic Speech Emotion Recognition
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 37
%P 37-43
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Humans naturally use emotions to express themselves, either through facial expressions or through speech. Emotions play an important role in human decision-making, as the human mind is influenced by personal experiences as well as by physiological, communicative and behavioral reactions to external stimuli. When considering emotions displayed through speech, one needs to understand that a speech signal conveys not only the emotional state of the speaker, which is visible from the intent of the message, but also the gender of the speaker and the language spoken. While effective communication between humans through speech ensures the exchange of the right amount of ideas, messages and perceptions, interaction between human and machine with the same intent becomes challenging, as the machine is expected to mimic the mechanism of human perception. Automatic Speech Emotion Recognition (ASER) systems have found use in several applications, viz. healthcare, counseling and call center communication. Such a system has three basic components, viz. creation of an emotional speech corpus, extraction of features relevant to emotion detection, and classification of the emotion in the test speech using appropriate classifiers. This paper extensively surveys the prominent features extracted, several dimension reduction techniques, and the classifiers commonly used in recent times. It also sheds light on the recent use of auto encoders in ASER.

Index Terms

Computer Science
Information Sciences
Pattern Recognition

Keywords

ASER, feature extraction, dimensionality reduction, auto encoders