International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 179 - Number 40
Year of Publication: 2018
Authors: Jasma Balasangameshwara, Chandrakala H. L.
DOI: 10.5120/ijca2018916953
Jasma Balasangameshwara, Chandrakala H. L. Performance-Driven Load Balancing for Distributed File Systems in Clouds. International Journal of Computer Applications. 179, 40 (May 2018), 39-50. DOI=10.5120/ijca2018916953
Distributed file systems are fundamental building blocks for cloud applications, in which each data node serves computing and storage functions concurrently. In these file systems, a master node splits a file into a set of file chunks and allots them to separate data nodes so that jobs can be carried out in parallel across the data nodes. However, the unpredictability of the nodes and the changing number of files create a need for uniform redistribution of files to prevent the adverse effects of load imbalance. Hence, the latest enhancement to distributed file systems is a decentralized and asynchronous load rebalancing algorithm that exploits both heterogeneity and movement cost when allocating file chunks among data nodes. However, this load rebalancing protocol is based on a randomized method in which each data node periodically collects and sorts the storage load status of a sample of arbitrarily chosen data nodes, without considering their computational capabilities or physical proximity. This places considerable workload on the data nodes and incurs high message-exchange overhead among them, which reduces scalability. Moreover, the distributed load rebalancing approach does not consider the additional redundant overhead imposed on the data nodes by federated, load-imbalanced master nodes. This study proposes a completely distributed performance-driven load balancing approach (PDLB) that employs a Zero-Hop Hash Table (ZHT) and a Modified Firefly Algorithm (MFA) to cope with load imbalance on both master nodes and data nodes. The aim of PDLB is to arrive at data allocations among nodes that achieve maximum resource utilization at optimized movement cost, with minimized message exchanges and algorithmic overhead.
The experimental results indicate that PDLB outperforms the earlier distributed protocol in terms of message-exchange overhead, scalability, movement cost, load imbalance factor, and algorithmic overhead.
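
To make the criticized baseline concrete, the following is a minimal sketch of the randomized sampling step described in the abstract: a data node picks arbitrary peers, collects and sorts their storage loads, and compares its own load with the sample average. It is an illustration only, not the authors' PDLB algorithm and not the earlier protocol's actual code. The function names, the `tolerance` threshold, and the use of a plain dictionary of loads are all assumptions made for this example.

```python
import random
from statistics import mean


def sample_and_rank(loads, node, sample_size, rng):
    """Sample arbitrary peers of `node`, sort them by load, and estimate the average load.

    `loads` maps a node name to its storage load (hypothetical units, e.g. chunk count).
    Peers are chosen uniformly at random, ignoring capability and proximity,
    which is the limitation the abstract points out.
    """
    peers = [p for p in loads if p != node]
    sample = rng.sample(peers, min(sample_size, len(peers)))
    ranked = sorted((loads[p], p) for p in sample)  # ascending by load
    estimate = mean([loads[node]] + [load for load, _ in ranked])
    return ranked, estimate


def classify(load, estimate, tolerance=0.1):
    """Label a node relative to the estimated average (tolerance is an assumed parameter)."""
    if load > (1 + tolerance) * estimate:
        return "heavy"
    if load < (1 - tolerance) * estimate:
        return "light"
    return "balanced"


if __name__ == "__main__":
    loads = {"a": 100, "b": 10, "c": 20, "d": 30}
    ranked, estimate = sample_and_rank(loads, "a", 3, random.Random(0))
    print(ranked, estimate, classify(loads["a"], estimate))
```

In this sketch every node contacts `sample_size` peers in each round, so the number of status messages per round grows with the number of nodes times the sample size. That growth is the message-exchange overhead the abstract says PDLB aims to reduce.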