International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 49
Year of Publication: 2024
Authors: Syed Murtoza Mushrul Pasha, Shahidur Rahoman Sohag, Muhammad Mahin Ali
DOI: 10.5120/ijca2024924154
Syed Murtoza Mushrul Pasha, Shahidur Rahoman Sohag, Muhammad Mahin Ali. Enhancing Audio Classification with a CNN-Attention Model: Robust Performance and Resilience Against Backdoor Attacks. International Journal of Computer Applications. 186, 49 (Nov 2024), 26-33. DOI=10.5120/ijca2024924154
Audio classification plays a vital role in fields such as communication, medical diagnostics, and forensic analysis, where accurate and reliable processing of audio signals is critical. This study presents a Convolutional Neural Network (CNN)-Attention framework designed to improve both performance and robustness in audio classification, with particular attention to adversarial threats such as backdoor attacks, which compromise model reliability. Evaluated on the benchmark datasets UrbanSound8K, FSDKaggle2018, and ESC-50, the framework achieves up to 43.16% higher accuracy than traditional CNN models and reaches a peak accuracy of 98.41% on UrbanSound8K. The system also shows strong resilience against adversarial attacks, maintaining the integrity and reliability of its predictions under challenging conditions. By integrating attention mechanisms with data augmentation techniques such as time-stretching and pitch-shifting, the framework improves testing accuracy by 9.74%, 33.53%, and 43.16% across the three datasets, respectively. These results highlight its potential to process and analyze audio data effectively across varied environments. The framework is suited to applications that demand high reliability and precision, and it offers a reference point for audio classification tasks in domains including environmental monitoring, assistive technologies, and intelligent surveillance systems.