International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 5
Year of Publication: 2025
Authors: Saumyadeep Singh, Syed Wajahat Abbas Rizvi
Saumyadeep Singh, Syed Wajahat Abbas Rizvi. Speech Emotion Recognition Combining Acoustic Features and Linguistic Information using Network Architecture. International Journal of Computer Applications. 187, 5 (May 2025), 30-34. DOI=10.5120/ijca2025924860
For more accurate recognition of speaker emotion in emotion-driven human-robot interaction, we propose a novel method that combines acoustic and linguistic information to improve automatic speech recognition (ASR) performance. The study builds a model with two primary components and divides emotional states into seven distinct categories. The first component identifies emotion from the audio signal, using pitch contour and energy-spectrum characteristics as the principal acoustic features. The second component uses linguistic information, identifying emotions in conversational material from emotionally salient phrases. To assess the efficacy of the methodology, we investigate several classification techniques, including neural networks, support vector machines, linear classifiers, and Gaussian mixture models, and evaluate each by how accurately it categorizes emotional states. Finally, a neural network combines the soft decisions from the linguistic and acoustic models, yielding a more comprehensive and reliable emotion recognition system. Two corpora of emotional speech are used for training and validation. Compared with models that use either source of information alone, the results show that combining linguistic and acoustic information substantially improves emotion recognition accuracy. This improvement in speaker emotion recognition contributes to more reliable ASR and better human-robot interaction. We also compare our strategy with other approaches, highlighting the quantifiable benefits of the integration step. The results show that the model identifies emotions well across a variety of speech situations, opening the door for more sophisticated and responsive speech recognition systems. By improving emotion recognition algorithms, this work advances more responsive and intuitive human-robot communication, which is important for applications in assistive technology, customer service, and healthcare.
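To make the fusion step concrete, the sketch below shows one way a small neural network could combine soft (probabilistic) decisions from separate acoustic and linguistic emotion classifiers over seven emotion categories. This is a minimal illustration, not the authors' published configuration: the class labels, layer sizes, and the `FusionNet` module name are assumptions made for demonstration.

```python
# Illustrative sketch (PyTorch): late fusion of soft decisions from an
# acoustic and a linguistic emotion classifier. The emotion list, layer
# sizes, and module names are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

EMOTIONS = ["anger", "boredom", "disgust", "fear", "happiness", "sadness", "neutral"]
NUM_CLASSES = len(EMOTIONS)  # seven emotion categories, as in the paper


class FusionNet(nn.Module):
    """Small MLP mapping concatenated per-class posteriors to a fused decision."""

    def __init__(self, num_classes: int = NUM_CLASSES, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),  # acoustic + linguistic posteriors
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, acoustic_probs: torch.Tensor, linguistic_probs: torch.Tensor) -> torch.Tensor:
        # Soft decisions from each component model are concatenated and re-scored jointly.
        x = torch.cat([acoustic_probs, linguistic_probs], dim=-1)
        return torch.softmax(self.net(x), dim=-1)


if __name__ == "__main__":
    fusion = FusionNet()
    # Dummy soft decisions for a batch of 4 utterances (each row sums to 1).
    acoustic = torch.softmax(torch.randn(4, NUM_CLASSES), dim=-1)
    linguistic = torch.softmax(torch.randn(4, NUM_CLASSES), dim=-1)
    fused = fusion(acoustic, linguistic)
    predicted = [EMOTIONS[int(i)] for i in fused.argmax(dim=-1)]
    print(predicted)
```

In practice, such a fusion network would be trained on held-out posteriors produced by the two component models; simple averaging or weighted voting are alternatives, but a learned combiner can exploit which model tends to be more reliable for each emotion.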