Research Article

Deep Learning for Edge AI: SqueezeNet CNN Training on Distributed ARM-based Clusters

by Dimitrios Papakyriakou, Ioannis S. Barbounakis
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 47
Year of Publication: 2025
DOI: 10.5120/ijca2025925785

Dimitrios Papakyriakou, Ioannis S. Barbounakis. Deep Learning for Edge AI: SqueezeNet CNN Training on Distributed ARM-based Clusters. International Journal of Computer Applications. 187, 47 (Oct 2025), 6-17. DOI=10.5120/ijca2025925785

@article{10.5120/ijca2025925785,
  author = {Dimitrios Papakyriakou and Ioannis S. Barbounakis},
  title = {Deep Learning for Edge AI: SqueezeNet CNN Training on Distributed ARM-based Clusters},
  journal = {International Journal of Computer Applications},
  issue_date = {Oct 2025},
  volume = {187},
  number = {47},
  month = {Oct},
  year = {2025},
  issn = {0975-8887},
  pages = {6-17},
  numpages = {12},
  url = {https://ijcaonline.org/archives/volume187/number47/deep-learning-for-edge-ai-squeezenet-cnn-training-on-distributed-arm-based-clusters/},
  doi = {10.5120/ijca2025925785},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}
%0 Journal Article
%A Dimitrios Papakyriakou
%A Ioannis S. Barbounakis
%T Deep Learning for Edge AI: SqueezeNet CNN Training on Distributed ARM-based Clusters
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 47
%P 6-17
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The increasing demand for lightweight and energy-efficient deep learning models at the edge has fueled interest in training convolutional neural networks (CNNs) directly on ARM-based CPU clusters. This study examines the feasibility and performance constraints of distributed training for the compact SqueezeNet v1.1 architecture, implemented using an MPI-based parallel framework on a Beowulf cluster composed of Raspberry Pi devices. Experimental evaluation across up to 24 Raspberry Pi nodes (48 MPI processes) reveals a sharp trade-off between training acceleration and model generalization. While wall-clock training time improves by more than 11× under increased parallelism, test accuracy deteriorates significantly, collapsing to chance-level performance (≈10%) as the data partition per process becomes excessively small. This behavior highlights a statistical scaling limit beyond which computational gains are offset by learning inefficiency. The findings are consistent with the statistical bottlenecks identified by Shallue et al. (2019) [11], extending their observations from large-scale GPU/CPU systems to energy-constrained ARM-based edge clusters. These findings underscore the importance of balanced task decomposition in CPU-bound environments and contribute new insights into the complex interplay between model compactness, data sparsity, and parallel training efficiency in edge-AI systems. The framework also provides a viable low-power platform for real-time CNN research on edge devices.
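To make the training scheme concrete, the sketch below shows synchronous data-parallel SGD of the kind the abstract describes: every MPI process holds its own SqueezeNet replica, trains on a disjoint shard of the dataset, and averages gradients with an allreduce after each backward pass. It is a minimal illustration under stated assumptions, not the authors' implementation: the CIFAR-10 dataset, batch size, learning rate, and epoch count are placeholders, chosen only to make the shard arithmetic tangible (50,000 CIFAR-10 training images split across 48 ranks leaves roughly 1,042 examples per shard, the kind of shrinking partition the abstract links to collapsing accuracy).

# Minimal sketch (not the paper's code) of synchronous data-parallel
# SqueezeNet training over MPI, assuming PyTorch, torchvision, and mpi4py.
# The dataset (CIFAR-10) and all hyperparameters are illustrative assumptions.
from mpi4py import MPI
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms
from torchvision.models import squeezenet1_1

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Rank 0 downloads the data once; the other ranks wait, then read from disk.
if rank == 0:
    datasets.CIFAR10("data", train=True, download=True)
comm.Barrier()
full = datasets.CIFAR10("data", train=True, download=False,
                        transform=transforms.ToTensor())

# Disjoint shard per process: as the process count grows, each shard
# shrinks, which is the statistical bottleneck the abstract describes.
shard = Subset(full, range(rank, len(full), size))
loader = DataLoader(shard, batch_size=32, shuffle=True)

torch.manual_seed(0)  # identical init on every rank, so replicas start in
                      # sync (a real system would broadcast rank 0's weights)
model = squeezenet1_1(num_classes=10)   # compact CNN, trained CPU-only here
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def average_gradients(model):
    # Sum each gradient over all ranks in place, then divide by the rank
    # count to get the mean: classic synchronous SGD via MPI allreduce.
    for p in model.parameters():
        if p.grad is not None:
            buf = p.grad.numpy()               # shares memory with the tensor
            comm.Allreduce(MPI.IN_PLACE, buf)  # default reduction op is SUM
            buf /= size

for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        average_gradients(model)   # keep all replicas in lockstep
        opt.step()
    if rank == 0:
        print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")

Launched with, e.g., mpiexec -n 48 python train.py over the cluster's hosts, this reproduces the trade-off at issue: epoch wall-clock time falls as ranks are added because each rank touches only a 1/P fraction of the data, while the statistical quality of each shard degrades.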

References
  1. Shi, W., Cao, J., Zhang, Q., Li, Y., & Xu, L. (2016). Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5), 637–646. https://doi.org/10.1109/JIOT.2016.2579198
  2. Li, S., Xu, L. D., & Zhao, S. (2018). 5G Internet of Things: A survey. Journal of Industrial Information Integration, 10, 1–9. https://doi.org/10.1016/j.jii.2018.01.005
  3. Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. S. (2017). Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12), 2295–2329. https://doi.org/10.1109/JPROC.2017.2761740
  4. Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., … & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. https://arxiv.org/abs/1704.04861
  5. Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv Preprint. https://doi.org/10.48550/arXiv.1602.07360
  6. Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H. P. (2017). Pruning filters for efficient convnets. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1608.08710
  7. Ramesh, S., & Chakrabarty, K. (2021). Challenges and opportunities in training deep neural networks on edge devices. ACM Transactions on Embedded Computing Systems (TECS), 20(5s), 1–26. https://doi.org/10.1145/3477084
  8. Raspberry Pi 4 Model B. [Online]. Available: https://www.raspberrypi.com/products/raspberry-pi-4-model-b/.
  9. Raspberry Pi 4 Model B specifications. [Online]. Available: https://magpi.raspberrypi.com/articles/raspberry-pi-4-specs-benchmarks.
  10. Masters, D., & Luschi, C. (2018). Revisiting small batch training for deep neural networks. arXiv preprint arXiv:1804.07612. https://arxiv.org/abs/1804.07612
  11. Shallue, C. J., Lee, J., Antognini, J., Sohl-Dickstein, J., Frostig, R., & Dahl, G. E. (2019). Measuring the effects of data parallelism on neural network training. Journal of Machine Learning Research, 20(112), 1–49. http://jmlr.org/papers/v20/18-789.html
  12. Ben-Nun, T., & Hoefler, T. (2019). Demystifying parallel and distributed deep learning: An in-depth concurrency analysis. ACM Computing Surveys, 52(4), 1–43. https://doi.org/10.1145/3320060
Index Terms

Computer Science
Information Sciences

Keywords

SqueezeNet, Distributed Deep Learning, Edge Computing, Raspberry Pi Cluster, Beowulf Cluster, ARM Architecture, MPI (Message Passing Interface), Low-Power AI, Strong Scaling, Model Generalization, Statistical Scaling Limit