Research Article

Performance Analysis of Raspberry Pi 4B (8GB) Beowulf Cluster: STREAM Benchmarking

by Dimitrios Papakyriakou, Ioannis S. Barbounakis
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 78
Year of Publication: 2025
Authors: Dimitrios Papakyriakou, Ioannis S. Barbounakis
10.5120/ijca2025924687

Dimitrios Papakyriakou, Ioannis S. Barbounakis. Performance Analysis of Raspberry Pi 4B (8GB) Beowulf Cluster: STREAM Benchmarking. International Journal of Computer Applications. 186, 78 (Apr 2025), 41-55. DOI=10.5120/ijca2025924687

@article{ 10.5120/ijca2025924687,
author = { Dimitrios Papakyriakou and Ioannis S. Barbounakis },
title = { Performance Analysis of Raspberry Pi 4B (8GB) Beowulf Cluster: STREAM Benchmarking },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2025 },
volume = { 186 },
number = { 78 },
month = { Apr },
year = { 2025 },
issn = { 0975-8887 },
pages = { 41-55 },
numpages = {15},
url = { https://ijcaonline.org/archives/volume186/number78/performance-analysis-of-raspberry-pi-4b-8gb-beowulf-cluster-stream-benchmarking/ },
doi = { 10.5120/ijca2025924687 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-04-26T02:19:17+05:30
%A Dimitrios Papakyriakou
%A Ioannis S. Barbounakis
%T Performance Analysis of Raspberry Pi 4B (8GB) Beowulf Cluster: STREAM Benchmarking
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 78
%P 41-55
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This study presents a detailed performance analysis of a 24-node Beowulf cluster built from Raspberry Pi 4B devices, each with 8GB of RAM and running a 64-bit operating system. The analysis uses the STREAM benchmark, a widely recognized tool for evaluating memory bandwidth in high-performance computing (HPC) environments. Unlike typical processor benchmarks, which focus on computing power, STREAM measures how quickly data can be transferred between memory and the processor, a critical performance factor in HPC systems such as Beowulf clusters. Its four fundamental memory operations (Copy, Scale, Add, and Triad) are used to assess how efficiently the cluster handles memory-intensive workloads as the MPI process count increases. Additionally, MPI-based communication benchmarks assess inter-node message-passing performance, providing deeper insight into memory bandwidth utilization under distributed computing conditions. The findings offer insight into the prospects of using Raspberry Pi clusters for HPC applications in education, research, and prototyping. Furthermore, performance optimizations and system enhancements are recommended to improve scalability and efficiency and to reduce communication overhead in such low-cost HPC clusters.
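
The four operations named in the abstract can be illustrated with a minimal Python sketch. It is not the reference STREAM code (which is written in C and Fortran) and not the authors' setup: the function name `stream_kernels`, the array length, and the scalar value are assumptions chosen for illustration. Because interpreter overhead dominates in pure Python, the reported MB/s figures show only how the bandwidth is derived from bytes moved and elapsed time, not what the hardware can sustain.

```python
import time
from array import array


def stream_kernels(n=100_000, scalar=3.0):
    """Run Copy, Scale, Add, Triad once; return (MB/s per kernel, final arrays).

    Illustrative only: the name, array length, and scalar are assumptions.
    """
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [0.0]) * n
    word = a.itemsize  # bytes per double-precision element

    def copy():   # c = a
        for i in range(n):
            c[i] = a[i]

    def scale():  # b = scalar * c
        for i in range(n):
            b[i] = scalar * c[i]

    def add():    # c = a + b
        for i in range(n):
            c[i] = a[i] + b[i]

    def triad():  # a = b + scalar * c
        for i in range(n):
            a[i] = b[i] + scalar * c[i]

    # Copy and Scale touch two arrays per element; Add and Triad touch three.
    kernels = [("Copy", copy, 2), ("Scale", scale, 2),
               ("Add", add, 3), ("Triad", triad, 3)]

    mbps = {}
    for name, fn, arrays_touched in kernels:
        start = time.perf_counter()
        fn()
        elapsed = max(time.perf_counter() - start, 1e-12)  # avoid division by zero
        mbps[name] = arrays_touched * word * n / elapsed / 1e6
    return mbps, (a, b, c)


if __name__ == "__main__":
    rates, _ = stream_kernels()
    for name, rate in rates.items():
        print(f"{name:<6} {rate:10.1f} MB/s")
```

The kernels run in the conventional STREAM order, so each one consumes the output of the previous one. In a cluster run, each MPI process would execute these kernels on its own arrays and the per-process rates would be combined.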

Index Terms

Computer Science
Information Sciences

Keywords

Raspberry Pi 4, Beowulf cluster, Cluster, Message Passing Interface (MPI), MPICH, Memory Performance, Low-cost Clusters, Parallel Computing, ARM Architecture, STREAM Benchmark