Research Article

Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism

by S.Baghavathi Priya, Dr.T.Ravichandran
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 26 - Number 5
Year of Publication: 2011
Authors: S.Baghavathi Priya, Dr.T.Ravichandran
DOI: 10.5120/3098-4252

S.Baghavathi Priya, Dr.T.Ravichandran. Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism. International Journal of Computer Applications. 26, 5 (July 2011), 32-37. DOI=10.5120/3098-4252

@article{ 10.5120/3098-4252,
author = { S.Baghavathi Priya and Dr.T.Ravichandran },
title = { Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism },
journal = { International Journal of Computer Applications },
issue_date = { July 2011 },
volume = { 26 },
number = { 5 },
month = { July },
year = { 2011 },
issn = { 0975-8887 },
pages = { 32-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume26/number5/3098-4252/ },
doi = { 10.5120/3098-4252 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:12:01.404183+05:30
%A S.Baghavathi Priya
%A Dr.T.Ravichandran
%T Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism
%J International Journal of Computer Applications
%@ 0975-8887
%V 26
%N 5
%P 32-37
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Check pointing with rollback recovery is a well-known method for achieving fault tolerance in grid computing systems. If a resource or process becomes faulty at run time, the check pointing mechanism detects it through the Task Dependency Graph (TDG), and the worst-case execution time and deadline of each task are used to decide schedulability. The common approach is to use a rollback-dependency graph or a check point graph. The proposed Concurrent Task Scheduling Algorithm (CTSA) schedules concurrent tasks and recovers from faulty states using replication or rollback techniques. Earlier fault detection methods do not scale with the diversity of user applications, and because the frequency of faults varies dynamically, faults are hard to detect and recover from. Check pointing and replication mechanisms have been used in high-performance grid computing, where synchronization between communicating processes is needed to make check pointing efficient. Performance improvements under faulty conditions can be obtained with or without data and process replication. The experimental results show that CTSA can lead to significant performance gains in a variety of scenarios.
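
The general idea of check pointing with rollback recovery can be illustrated with a minimal Python sketch. This is not the CTSA described in the paper, whose details are not given on this page. It is a generic single-process illustration, and the names `TransientFault`, `run_with_checkpoints`, `make_step`, `interval`, and `faults_at` are hypothetical. Faults are injected artificially, and state is saved in memory rather than on stable storage.

```python
import copy


class TransientFault(Exception):
    """Hypothetical fault type, used only to inject failures in this sketch."""


def run_with_checkpoints(steps, state, interval, faults_at=()):
    """Run steps in order, checkpointing every `interval` completed steps.

    On an injected fault, roll back to the last checkpoint and re-execute.
    Returns the final state and the number of rollbacks performed.
    """
    pending = set(faults_at)            # step indices that fail once each
    state = copy.deepcopy(state)        # leave the caller's object untouched
    saved_index, saved_state = 0, copy.deepcopy(state)  # initial checkpoint
    rollbacks = 0
    i = 0
    while i < len(steps):
        try:
            if i in pending:
                pending.discard(i)
                raise TransientFault(f"injected fault at step {i}")
            steps[i](state)             # a step mutates the state in place
            i += 1
            if i % interval == 0:       # take a checkpoint
                saved_index, saved_state = i, copy.deepcopy(state)
        except TransientFault:
            # Rollback: discard work done since the last checkpoint.
            i, state = saved_index, copy.deepcopy(saved_state)
            rollbacks += 1
    return state, rollbacks


def make_step(k):
    """Build a step that records its own index in the shared state."""
    def step(state):
        state.append(k)
    return step


steps = [make_step(k) for k in range(6)]
final_state, rollbacks = run_with_checkpoints(steps, [], interval=2, faults_at=(3,))
print(final_state, rollbacks)
```

In this run the fault at step 3 causes a rollback to the checkpoint taken after step 1, so step 2 is executed a second time. A shorter `interval` reduces the amount of repeated work at the cost of more frequent state copies.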

Index Terms

Computer Science
Information Sciences

Keywords

Check pointing, Reliability, Rollback, Replication, Fault tolerance