International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 51
Year of Publication: 2025
Authors: Raihan Tanvir
Raihan Tanvir. A Hybrid Transformer-CNN Framework with Early and Late Fusion for Robust Skin Lesion Classification. International Journal of Computer Applications. 187, 51 (Oct 2025), 17-23. DOI=10.5120/ijca2025925852
Skin lesion classification is a critical task in dermatological diagnosis, where early detection can significantly improve patient outcomes. The DermaMNIST dataset, a curated benchmark within the MedMNIST collection, provides a challenging testbed due to limited resolution, intraclass similarity, and class imbalance. In this work, we investigate the performance of advanced deep learning architectures, including Swin Transformer, ConvNeXt, and Vision Transformer, alongside fusion strategies that combine complementary representations. Specifically, we implement early fusion through feature concatenation and late fusion through ensemble averaging of logits. Our experiments on DermaMNIST with images at 224 × 224 resolution demonstrate that Swin Transformer achieves an accuracy of 0.893, outperforming ConvNeXt (0.871) and Vision Transformer (0.873). Fusion strategies further improve robustness, with late fusion achieving the best accuracy of 0.895. Compared to the reported Google AutoML Vision baseline (0.768 accuracy), our models establish a new state of the art on DermaMNIST. These results highlight the efficacy of hybrid deep learning strategies that integrate convolutional and transformer-based architectures for medical image classification.
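The two fusion strategies named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`early_fusion`, `linear_head`, `late_fusion`, `argmax`), the toy feature vectors, the toy logits, and the three-class setup are all assumptions chosen for illustration. In practice the features and logits would come from trained Swin Transformer, ConvNeXt, and Vision Transformer backbones.

```python
from typing import List, Sequence

Vector = List[float]


def early_fusion(features: Sequence[Vector]) -> Vector:
    """Early fusion: concatenate per-backbone feature vectors into one joint vector."""
    fused: Vector = []
    for feature in features:
        fused.extend(feature)
    return fused


def linear_head(x: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """Hypothetical linear classifier on the fused features (one weight row per class)."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def late_fusion(logits_per_model: Sequence[Vector]) -> Vector:
    """Late fusion: average the logits element-wise across models."""
    n_models = len(logits_per_model)
    return [sum(column) / n_models for column in zip(*logits_per_model)]


def argmax(values: Vector) -> int:
    """Index of the largest entry, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    # Toy feature vectors standing in for three backbones (assumed values).
    fused = early_fusion([[1.0, 2.0], [3.0], [4.0, 5.0]])
    # Toy two-class head over the 5-dimensional fused vector (assumed weights).
    early_logits = linear_head(
        fused,
        weights=[[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]],
        bias=[0.0, 0.0],
    )
    print("early fusion prediction:", argmax(early_logits))

    # Toy per-model logits over three classes (assumed values).
    averaged = late_fusion([[2.0, 0.5, -1.0], [0.0, 1.5, -0.5], [4.0, 1.0, 0.5]])
    print("late fusion prediction:", argmax(averaged))
```

The difference in design is where the models are combined: early fusion merges representations before a single shared classifier, so that classifier must be trained on the joint feature vector, whereas late fusion leaves each model's classifier untouched and combines only their outputs.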