| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 121 |
| Year of Publication: 2026 |
| Authors: Md. Attaur Rahman Sofi, Mohd. Yousuf |
10.5120/ijca57e5a1e8f472
Md. Attaur Rahman Sofi, Mohd. Yousuf. AutoScale-ML with HASA: A Docker-based Framework for Distributed AutoML Model Selection. International Journal of Computer Applications. 187, 121 (Jun 2026), 8-14. DOI=10.5120/ijca57e5a1e8f472
This paper presents AutoScale-ML with HASA, a hierarchical adaptive search framework for automated machine learning (AutoML) model selection, implemented within a simulated seven-node distributed computing environment consisting of one master node and six independent Docker containers, each exposing a REST endpoint through Flask. Each worker trains a randomly assigned scikit-learn classifier drawn from RandomForest, GradientBoosting, ExtraTrees, DecisionTree, and LogisticRegression on a 50,000-sample synthetic classification dataset (50 features, 20 informative) generated via scikit-learn's make_classification, and returns accuracy, training runtime, simulated network delay, and a composite score to a central master process. The master applies a three-phase Hierarchical Adaptive Search Algorithm (HASA): Phase 1 collects all six worker evaluations and retains the top four by composite score; Phase 2 re-ranks those four candidates and retains the top two; Phase 3 selects the single best model by maximum composite score. Experimental results, including per-model benchmarks, phase-by-phase HASA traces, penalty coefficient sensitivity analysis, and network delay characterisation, demonstrate that the framework balances prediction accuracy and computational efficiency through runtime-aware hierarchical model selection. Evaluation across the five classifier families reveals that the composite scoring function heavily penalises ensemble training times, often favouring lightweight models over higher-accuracy alternatives. The penalty coefficient α is therefore a first-class configuration parameter that must be calibrated to the deployment context. The findings highlight the framework's usefulness as a reproducible baseline for containerised AutoML experimentation.
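The three-phase selection described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the composite score formula, so the linear form `accuracy - alpha * runtime_s` is assumed here, and the names `WorkerResult`, `composite_score`, `top_k`, `hasa_select`, and the `DEMO` values are all hypothetical. The simulated network delay is carried in the record but left out of the score, since the abstract does not say whether it contributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerResult:
    """One worker's report to the master (field names are hypothetical)."""
    worker_id: int
    model: str
    accuracy: float
    runtime_s: float   # training runtime in seconds
    delay_s: float     # simulated network delay; unused in the assumed score


def composite_score(result: WorkerResult, alpha: float) -> float:
    # ASSUMPTION: a linear runtime penalty. The abstract only states that a
    # penalty coefficient alpha trades accuracy against training time.
    return result.accuracy - alpha * result.runtime_s


def top_k(results, k, alpha):
    """Return the k highest-scoring results, best first."""
    ranked = sorted(results, key=lambda r: composite_score(r, alpha), reverse=True)
    return ranked[:k]


def hasa_select(results, alpha):
    """Three-phase narrowing: six candidates -> four -> two -> one."""
    phase1 = top_k(results, 4, alpha)   # Phase 1: keep the top four
    phase2 = top_k(phase1, 2, alpha)    # Phase 2: re-rank, keep the top two
    best = max(phase2, key=lambda r: composite_score(r, alpha))  # Phase 3
    return phase1, phase2, best


# Illustrative numbers only; these are NOT results from the paper.
DEMO = [
    WorkerResult(1, "RandomForest", 0.94, 12.0, 0.20),
    WorkerResult(2, "GradientBoosting", 0.95, 30.0, 0.10),
    WorkerResult(3, "ExtraTrees", 0.93, 8.0, 0.30),
    WorkerResult(4, "DecisionTree", 0.88, 1.5, 0.20),
    WorkerResult(5, "LogisticRegression", 0.86, 0.5, 0.10),
    WorkerResult(6, "RandomForest", 0.94, 11.0, 0.25),
]

if __name__ == "__main__":
    for alpha in (0.0, 0.01):
        _, _, best = hasa_select(DEMO, alpha)
        print(f"alpha={alpha}: selected {best.model}")
```

With these made-up inputs, `alpha = 0.0` selects the most accurate model, while `alpha = 0.01` shifts the choice to a lightweight one, which mirrors the sensitivity to α that the abstract reports. In this sketch the scores are fixed after Phase 1, so Phase 2 and Phase 3 only narrow an already-sorted list; a fuller version might re-evaluate the surviving candidates between phases, which the abstract neither confirms nor rules out.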