International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 111 - Number 3
Year of Publication: 2015
Authors: Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin
DOI: 10.5120/19515-1135
Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin. A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size. International Journal of Computer Applications 111, 3 (February 2015), 1-4. DOI=10.5120/19515-1135
MapReduce is a powerful distributed processing model for large datasets, and Hadoop is an open-source framework that implements it. The Hadoop Distributed File System (HDFS) has become a popular foundation for building large-scale, high-performance distributed data processing systems. Because HDFS is designed primarily to handle large files, processing massive numbers of small files is a challenge for native HDFS. This paper introduces an approach to optimize the processing of massive small files on HDFS. We design a new HDFS structure model whose main idea is to merge small files, writing them directly into a merged file at the source. Experimental results show that the proposed scheme effectively improves the storage and access efficiency of massive small files on HDFS.
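The abstract does not show the authors' merging code; as a rough illustration of the general idea (combining many small files client-side into one large HDFS file so the NameNode tracks a single entry), the following sketch uses Hadoop's standard SequenceFile container, keyed by the original file name. The class name SmallFileMerger and the input/output paths are hypothetical; this is a minimal sketch of the technique, not the paper's implementation.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    /**
     * Merges a directory of small local files into one SequenceFile on HDFS
     * at the source, so HDFS stores a single large file instead of many
     * small ones. Illustrative sketch only; paths are placeholders.
     */
    public class SmallFileMerger {
        public static void main(String[] args) throws IOException {
            String localDir = args[0];    // e.g. /data/small-files
            String mergedPath = args[1];  // e.g. hdfs://namenode:9000/merged/part-0000.seq

            Configuration conf = new Configuration();
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path(mergedPath)),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class))) {

                for (File f : new File(localDir).listFiles()) {
                    if (!f.isFile()) continue;
                    byte[] contents = Files.readAllBytes(f.toPath());
                    // Key: original file name; value: raw bytes of the small file.
                    // Keeping the name as the key lets readers locate each
                    // original file inside the merged container later.
                    writer.append(new Text(f.getName()), new BytesWritable(contents));
                }
            }
        }
    }

A MapReduce job such as Word Count can then read the merged file with a SequenceFile input format, processing one large file rather than opening thousands of small ones.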