High Scalability of HDFS using Distributed Namespace

Harcharan Jit Singh; V. P. Singh


Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 52 - Number 17

Year of Publication: 2012

DOI: 10.5120/8297-1860

Harcharan Jit Singh, V. P. Singh. High Scalability of HDFS using Distributed Namespace. International Journal of Computer Applications. 52, 17 (August 2012), 30-37. DOI=10.5120/8297-1860

@article{10.5120/8297-1860,
  author = {Harcharan Jit Singh and V. P. Singh},
  title = {High Scalability of HDFS using Distributed Namespace},
  journal = {International Journal of Computer Applications},
  issue_date = {August 2012},
  volume = {52},
  number = {17},
  month = {August},
  year = {2012},
  issn = {0975-8887},
  pages = {30-37},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume52/number17/8297-1860/},
  doi = {10.5120/8297-1860},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:52:33.619695+05:30
%A Harcharan Jit Singh
%A V. P. Singh
%T High Scalability of HDFS using Distributed Namespace
%J International Journal of Computer Applications
%@ 0975-8887
%V 52
%N 17
%P 30-37
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Hadoop is widely used by organizations for data-intensive computing. Its client applications require the system to be highly available and scalable. Most of these applications are online, and their data growth rate is unpredictable. The present Hadoop relies on a secondary namenode for failover, which slows down the system. Hadoop's scalability depends on the vertical scalability of the namenode server. As the namespace of the Hadoop Distributed File System (HDFS) grows, it demands additional memory to cache. When a namenode server does not have enough primary memory to cache the namespace, its performance and availability suffer. A new Hadoop architecture is proposed to address namenode scalability, the single point of failure, and availability. The approach distributes the namespace using distributed hash tables: the growing HDFS namespace is spread across multiple namenode servers, which are arranged in a Chord ring. This allows HDFS to scale horizontally. The proposed architecture is simulated using multiple namenode servers. The system provides a decentralized management approach to namespace distribution, which gives consistent performance. Results for an HDFS namespace storing 1 billion or more files are discussed. The proposed architecture shows high availability and adapts to namenode failure.
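
The idea of spreading the namespace over namenodes on a Chord ring can be illustrated with a small sketch. This is not the authors' implementation: the class `NamespaceRing`, the function `ring_id`, the 32-bit identifier space, and the namenode labels `nn1` to `nn4` are all assumptions made for illustration. The sketch keeps a global sorted list of ring positions instead of Chord finger tables, and it models only which namenode owns a path, not how metadata would be replicated so that a successor can take over.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed identifier-space size, chosen only to keep ids small


def ring_id(key: str) -> int:
    """Hash a string onto the identifier ring [0, 2**RING_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class NamespaceRing:
    """Maps HDFS-style paths to namenodes placed on a hash ring (hypothetical)."""

    def __init__(self, namenodes):
        self._ids = []    # sorted ring positions of the namenodes
        self._owner = {}  # ring position -> namenode label
        for node in namenodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = ring_id(node)
        if pos in self._owner:
            raise ValueError(f"ring position collision for {node!r}")
        bisect.insort(self._ids, pos)
        self._owner[pos] = node

    def remove(self, node: str) -> None:
        pos = ring_id(node)
        self._ids.remove(pos)
        del self._owner[pos]

    def lookup(self, path: str) -> str:
        """Return the successor: the first namenode at or after the path's id."""
        if not self._ids:
            raise LookupError("no namenodes on the ring")
        idx = bisect.bisect_left(self._ids, ring_id(path))
        return self._owner[self._ids[idx % len(self._ids)]]  # wrap around


# Usage: place 1000 hypothetical paths, then simulate the failure of nn2.
ring = NamespaceRing(["nn1", "nn2", "nn3", "nn4"])
paths = [f"/user/data/file{i}" for i in range(1000)]
before = {p: ring.lookup(p) for p in paths}
ring.remove("nn2")
after = {p: ring.lookup(p) for p in paths}
moved = [p for p in paths if before[p] != after[p]]
```

In this sketch, removing a namenode reassigns only the paths that namenode owned; every other path keeps its owner. That property is what lets a ring-based namespace absorb a namenode failure, or the addition of a namenode, without reshuffling the whole namespace.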

Index Terms

Computer Science

Information Sciences

Keywords

HDFS, Hadoop, Chord, Namespace, namenode, datanode