International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 146 - Number 1
Year of Publication: 2016
Authors: Bharti Gupta, Rajender Nath, Girdhar Gopal, Kartik
DOI: 10.5120/ijca2016910611
Bharti Gupta, Rajender Nath, Girdhar Gopal, Kartik. An Efficient Approach for Storing and Accessing Small Files with Big Data Technology. International Journal of Computer Applications. 146, 1 (Jul 2016), 36-39. DOI=10.5120/ijca2016910611
Hadoop is an open-source Apache project and a software framework for distributed processing of large datasets across large clusters of commodity hardware. Here, large datasets means terabytes or petabytes of data, whereas large clusters means hundreds or thousands of nodes. Hadoop uses a master-slave architecture with one master node and up to thousands of slave nodes. The NameNode acts as the master node and stores the metadata of all files, while the DataNodes are the slave nodes that store the application data. This design becomes a bottleneck when a large number of small files must be processed, because the NameNode needs more memory to hold the metadata of the files and the DataNodes consume more CPU time to process them. This paper presents a novel technique for handling the small file problem in Hadoop, based on file merging, caching and correlation strategies. The experimental results show that the proposed technique reduces the amount of data stored at the NameNode and the average memory usage of the DataNodes, and improves the access efficiency of small files in the Hadoop Distributed File System by up to 88.57% compared with the general solution, Hadoop Archive.
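The abstract does not describe how the merging step works, so the following is only a minimal sketch of the general idea behind file merging, not the authors' method. It assumes small files are concatenated into one larger blob and located through an in-memory index of (offset, length) pairs; the names `merge_small_files` and `read_small_file` are hypothetical, and no Hadoop API is used.

```python
from typing import Dict, Tuple

# Hypothetical index type: file name -> (byte offset, byte length) in the merged blob.
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    index: Index = {}
    parts = []
    offset = 0
    for name, data in files.items():
        index[name] = (offset, len(data))  # remember the slice for this file
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index


def read_small_file(blob: bytes, index: Index, name: str) -> bytes:
    """Recover one small file from the merged blob using the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


if __name__ == "__main__":
    small_files = {"a.txt": b"hello", "b.txt": b"hi"}
    blob, index = merge_small_files(small_files)
    # One merged object now stands in for two small files.
    print(len(blob), index)
    print(read_small_file(blob, index, "b.txt"))
```

In a sketch like this, the file system would track a single merged file instead of one metadata entry per small file, which is the kind of saving at the NameNode that the abstract refers to. The caching and correlation strategies are not illustrated here.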