Research Article

Performance improvement in Distributed Systems through Replication and Checkpointing

by Sourabh Dave, Abhishek Raghuvanshi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 42 - Number 19
Year of Publication: 2012
Authors: Sourabh Dave, Abhishek Raghuvanshi
10.5120/5801-8039

Sourabh Dave, Abhishek Raghuvanshi. Performance improvement in Distributed Systems through Replication and Checkpointing. International Journal of Computer Applications. 42, 19 (March 2012), 17-21. DOI=10.5120/5801-8039

@article{ 10.5120/5801-8039,
author = { Sourabh Dave and Abhishek Raghuvanshi },
title = { Performance improvement in Distributed Systems through Replication and Checkpointing },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 42 },
number = { 19 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 17-21 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume42/number19/5801-8039/ },
doi = { 10.5120/5801-8039 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:31:44.291034+05:30
%A Sourabh Dave
%A Abhishek Raghuvanshi
%T Performance improvement in Distributed Systems through Replication and Checkpointing
%J International Journal of Computer Applications
%@ 0975-8887
%V 42
%N 19
%P 17-21
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Fault tolerance is an important issue in distributed systems. Many applications that run on several processors face problems of consistency and availability, and the failure of a single component can cause the complete process to fail. Many existing approaches to reliable execution are based on fault tolerance mechanisms. We discuss the basic concept of fault tolerance, which is to make a networked system tolerant enough to keep working correctly, possibly at somewhat reduced efficiency, when a fault occurs. A good fault-tolerant system also avoids further failures. After a transient failure, the main problem is to bring the distributed system back to a consistent state. We address two parts of this problem: creating consistent checkpoints in a distributed system, and replication. We present an algorithm for replication and implement it in Java RMI. Specifically, we do two things: first, the checkpoints are replicated; second, servers are replicated on different systems using that algorithm.
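The idea of replicating checkpoints across servers can be illustrated with a minimal Java sketch. It is not the authors' algorithm, which the abstract does not spell out. The interface `CheckpointStore`, the classes `InMemoryCheckpointStore` and `ReplicatedCheckpointer`, and the sequence-number scheme are all hypothetical names and choices introduced here for illustration. The interface extends `java.rmi.Remote` so that it could be exported over RMI, but the replicas below are plain in-process objects, so no registry or network is needed to run it.

```java
import java.nio.charset.StandardCharsets;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote interface (an assumption, not taken from the paper).
interface CheckpointStore extends Remote {
    void save(String processId, long sequence, byte[] state) throws RemoteException;
    long latestSequence(String processId) throws RemoteException; // -1 if none
    byte[] latestState(String processId) throws RemoteException;  // null if none
}

// In-memory replica. In a real deployment each replica would be exported
// (for example with UnicastRemoteObject) on a different machine.
class InMemoryCheckpointStore implements CheckpointStore {
    private final Map<String, Long> sequences = new HashMap<>();
    private final Map<String, byte[]> states = new HashMap<>();
    private boolean failed = false;

    // Test hook that simulates a crashed or unreachable replica.
    synchronized void setFailed(boolean failed) { this.failed = failed; }

    private void checkAlive() throws RemoteException {
        if (failed) throw new RemoteException("replica unavailable");
    }

    public synchronized void save(String processId, long sequence, byte[] state)
            throws RemoteException {
        checkAlive();
        Long current = sequences.get(processId);
        if (current == null || sequence > current) { // ignore stale checkpoints
            sequences.put(processId, sequence);
            states.put(processId, state.clone());
        }
    }

    public synchronized long latestSequence(String processId) throws RemoteException {
        checkAlive();
        Long current = sequences.get(processId);
        return current == null ? -1L : current;
    }

    public synchronized byte[] latestState(String processId) throws RemoteException {
        checkAlive();
        byte[] state = states.get(processId);
        return state == null ? null : state.clone();
    }
}

// Writes each checkpoint to every replica and recovers from the newest copy.
class ReplicatedCheckpointer {
    private final List<CheckpointStore> replicas;

    ReplicatedCheckpointer(List<CheckpointStore> replicas) {
        this.replicas = new ArrayList<>(replicas);
    }

    // Returns how many replicas acknowledged the checkpoint.
    int save(String processId, long sequence, byte[] state) {
        int acks = 0;
        for (CheckpointStore replica : replicas) {
            try {
                replica.save(processId, sequence, state);
                acks++;
            } catch (RemoteException e) {
                // Replica is down; the remaining replicas still hold the checkpoint.
            }
        }
        return acks;
    }

    // Returns the state with the highest sequence number among reachable
    // replicas, or null if no reachable replica holds a checkpoint.
    byte[] recover(String processId) {
        long bestSequence = -1L;
        byte[] bestState = null;
        for (CheckpointStore replica : replicas) {
            try {
                long sequence = replica.latestSequence(processId);
                if (sequence > bestSequence) {
                    bestSequence = sequence;
                    bestState = replica.latestState(processId);
                }
            } catch (RemoteException e) {
                // Skip unreachable replicas.
            }
        }
        return bestState;
    }
}

class CheckpointReplicationDemo {
    public static void main(String[] args) {
        InMemoryCheckpointStore a = new InMemoryCheckpointStore();
        InMemoryCheckpointStore b = new InMemoryCheckpointStore();
        ReplicatedCheckpointer checkpointer =
                new ReplicatedCheckpointer(Arrays.<CheckpointStore>asList(a, b));
        checkpointer.save("worker-1", 1, "step=10".getBytes(StandardCharsets.UTF_8));
        a.setFailed(true); // one replica crashes; the checkpoint survives on the other
        System.out.println(
                new String(checkpointer.recover("worker-1"), StandardCharsets.UTF_8));
    }
}
```

Writing to every replica and reading back the copy with the highest sequence number is one simple design choice among many. It tolerates the loss of any replica that is not the only holder of the newest checkpoint, but it does not by itself guarantee that checkpoints taken by different processes are mutually consistent.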

Index Terms

Computer Science
Information Sciences

Keywords

Checkpointing, Replication, RMI