Research Article

A Hybrid Transformer-CNN Framework with Early and Late Fusion for Robust Skin Lesion Classification

by Raihan Tanvir
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 51
Year of Publication: 2025
Authors: Raihan Tanvir
10.5120/ijca2025925852

Raihan Tanvir. A Hybrid Transformer-CNN Framework with Early and Late Fusion for Robust Skin Lesion Classification. International Journal of Computer Applications. 187, 51 (Oct 2025), 17-23. DOI=10.5120/ijca2025925852

@article{ 10.5120/ijca2025925852,
author = { Raihan Tanvir },
title = { A Hybrid Transformer-CNN Framework with Early and Late Fusion for Robust Skin Lesion Classification },
journal = { International Journal of Computer Applications },
issue_date = { Oct 2025 },
volume = { 187 },
number = { 51 },
month = { Oct },
year = { 2025 },
issn = { 0975-8887 },
pages = { 17-23 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number51/a-hybrid-transformer-cnn-framework-with-early-and-late-fusion-for-robust-skin-lesion-classification/ },
doi = { 10.5120/ijca2025925852 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-10-23T00:18:42+05:30
%A Raihan Tanvir
%T A Hybrid Transformer-CNN Framework with Early and Late Fusion for Robust Skin Lesion Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 51
%P 17-23
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Skin lesion classification is a critical task in dermatological diagnosis, where early detection can significantly improve patient outcomes. The DermaMNIST dataset, a curated benchmark within the MedMNIST collection, provides a challenging testbed due to limited resolution, intraclass similarity, and class imbalance. In this work, we investigate the performance of advanced deep learning architectures, including Swin Transformer, ConvNeXt, and Vision Transformer, alongside fusion strategies that combine complementary representations. Specifically, we implement early fusion through feature concatenation and late fusion through ensemble averaging of logits. Our experiments on DermaMNIST, using images at 224 × 224 resolution, demonstrate that Swin Transformer achieves an accuracy of 0.893, outperforming ConvNeXt (0.871) and Vision Transformer (0.873). Fusion strategies further improve robustness, with late fusion achieving the best accuracy of 0.895. Compared to the reported Google AutoML Vision baseline (0.768 accuracy), our models establish a new state of the art on DermaMNIST. These results highlight the efficacy of hybrid deep learning strategies that integrate convolutional and transformer-based architectures for medical image classification.
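The two fusion strategies named in the abstract can be illustrated with a minimal, framework-free sketch. It is not the paper's implementation: the function names, the hypothetical linear head, the three-class setup, and all numeric values are assumptions chosen only to show the mechanics. Early fusion concatenates per-backbone feature vectors before a single classifier, while late fusion averages the logits that each model produces independently.

```python
from typing import List, Sequence

Vector = List[float]


def early_fusion(features: Sequence[Vector]) -> Vector:
    """Concatenate per-backbone feature vectors into one joint vector."""
    fused: Vector = []
    for feature in features:
        fused.extend(feature)
    return fused


def linear_head(x: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """Hypothetical linear classifier: one logit per class (row of weights)."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def late_fusion(logits_per_model: Sequence[Vector]) -> Vector:
    """Average the logits of several models, class by class."""
    n_models = len(logits_per_model)
    return [sum(column) / n_models for column in zip(*logits_per_model)]


def argmax(values: Vector) -> int:
    """Index of the largest entry, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy late fusion: made-up logits from three models over three classes.
swin_logits = [2.0, 0.5, -1.0]
convnext_logits = [0.2, 1.9, -0.5]
vit_logits = [1.5, 1.0, 0.0]
averaged = late_fusion([swin_logits, convnext_logits, vit_logits])
late_prediction = argmax(averaged)

# Toy early fusion: made-up features from two backbones, then a
# hypothetical two-class linear head applied to the concatenation.
joint = early_fusion([[0.1, 0.2], [0.3]])
early_logits = linear_head(joint, weights=[[1, 0, 0], [0, 1, 1]], bias=[0.0, 0.0])
early_prediction = argmax(early_logits)
```

In this sketch the late-fusion models never see each other's features, so they can be trained separately and combined afterwards. Early fusion instead requires a classifier whose input width equals the sum of the individual feature widths.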

Index Terms

Computer Science
Information Sciences

Keywords

Skin Lesion Classification, DermaMNIST, Swin Transformer, Vision Transformer, ConvNeXt, Early Fusion, Late Fusion, Hybrid Models, Deep Learning