Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism

S.Baghavathi Priya; Dr.T.Ravichandran

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 26 - Number 5

Year of Publication: 2011

DOI: 10.5120/3098-4252

S. Baghavathi Priya, Dr. T. Ravichandran. Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism. International Journal of Computer Applications 26, 5 (July 2011), 32-37. DOI=10.5120/3098-4252

@article{10.5120/3098-4252,
  author = {S. Baghavathi Priya and T. Ravichandran},
  title = {Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism},
  journal = {International Journal of Computer Applications},
  issue_date = {July 2011},
  volume = {26},
  number = {5},
  month = {July},
  year = {2011},
  issn = {0975-8887},
  pages = {32--37},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume26/number5/3098-4252/},
  doi = {10.5120/3098-4252},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:12:01.404183+05:30
%A S.Baghavathi Priya
%A Dr.T.Ravichandran
%T Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism
%J International Journal of Computer Applications
%@ 0975-8887
%V 26
%N 5
%P 32-37
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Checkpointing with rollback recovery is a well-known method for achieving fault tolerance in grid computing systems. If a resource or process tends to become faulty at run time, the checkpointing mechanism detects it through the Task Dependency Graph (TDG), and the tasks' worst-case execution times and deadlines are used to decide schedulability. The common approach is to use a rollback-dependency graph or a checkpoint graph. Concurrent tasks can be scheduled with the proposed Concurrent Task Scheduling Algorithm (CTSA), which recovers from faulty states using replication or rollback techniques. Earlier fault detection methods do not scale with the diversity of user applications, and because the frequency of faults varies dynamically, faults are hard to detect and recover from. Checkpointing and replication mechanisms have been used in high-performance grid computing, where synchronization between communicating processes is needed to improve the efficiency of checkpointing. Performance under faulty conditions can be improved with or without data and process replication. The experimental results show that CTSA can yield significant performance gains across a variety of scenarios.
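The abstract does not spell out CTSA itself, so what follows is only a minimal Python sketch of the two ingredients it names: a deadline-based schedulability test over a task dependency graph using worst-case execution times, and checkpoint-based rollback recovery. The names (`Task`, `finish_times`, `schedulable`, `CheckpointedTask`), the cost model (a fixed cost per checkpoint, each fault losing at most one checkpoint interval of work, restore time ignored) and the assumption that every task gets its own resource as soon as its predecessors finish are all hypothetical, not taken from the paper.

```python
import copy
import math
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    name: str
    wcet: float       # worst-case execution time without faults
    deadline: float   # absolute deadline, measured from the start of the job
    deps: list[str] = field(default_factory=list)  # predecessor task names


def finish_times(tasks, ckpt_cost=0.0, ckpt_interval=None, faults=0):
    """Worst-case finish time of every task in the dependency graph.

    Hypothetical cost model: a task saves a checkpoint after every
    `ckpt_interval` units of work (none after its last segment), each save
    costs `ckpt_cost`, and each of `faults` failures rolls the task back to
    its last checkpoint, losing at most one interval of work. Restore time
    is ignored and every task runs as soon as its predecessors finish.
    """
    by_name = {t.name: t for t in tasks}
    graph = {t.name: t.deps for t in tasks}
    finish = {}
    for name in TopologicalSorter(graph).static_order():  # predecessors first
        t = by_name[name]
        start = max((finish[d] for d in t.deps), default=0.0)
        interval = ckpt_interval or t.wcet  # None means "no checkpoints"
        n_ckpts = math.ceil(t.wcet / interval) - 1
        lost = faults * min(interval, t.wcet)  # work re-executed after rollbacks
        finish[name] = start + t.wcet + n_ckpts * ckpt_cost + lost
    return finish


def schedulable(tasks, **cost_model):
    """True if every task meets its deadline under the given cost model."""
    finish = finish_times(tasks, **cost_model)
    return all(finish[t.name] <= t.deadline for t in tasks)


class CheckpointedTask:
    """Runs `step(state, i)` for i = 0..n-1, checkpointing every `interval` steps."""

    def __init__(self, state, step, interval):
        self.state = state
        self.step = step
        self.interval = interval
        self._ckpt = (0, copy.deepcopy(state))  # (next step index, saved state)

    def run(self, n_steps, fail_at=()):
        """Execute all steps; a fault injected before step i triggers a rollback.

        Returns the final state and the number of steps actually executed,
        including steps re-executed after rolling back.
        """
        pending_faults = set(fail_at)
        i, executed = 0, 0
        while i < n_steps:
            if i in pending_faults:
                pending_faults.discard(i)  # each injected fault fires once
                i, saved = self._ckpt
                self.state = copy.deepcopy(saved)  # roll back to the checkpoint
                continue
            self.state = self.step(self.state, i)
            i += 1
            executed += 1
            if i % self.interval == 0:
                self._ckpt = (i, copy.deepcopy(self.state))  # save a checkpoint
        return self.state, executed
```

For a four-task graph A → {B, C} → D with worst-case execution times 4, 3, 2, 1 and deadlines 10, 12, 9, 14, the fault-free schedule finishes at time 8. Tolerating one fault per task without checkpoints doubles each task's cost and misses the deadlines of B, C and D, while checkpointing after every unit of work at a cost of 0.25 brings D's finish time back to 12.25, within all deadlines. Likewise, `CheckpointedTask(0, lambda s, i: s + i, 4).run(10, fail_at=[6])` still returns the correct sum 45 but executes 12 steps instead of 10, because the rollback to the checkpoint at step 4 re-executes steps 4 and 5. This is the trade-off checkpointing targets: a small, fixed overhead in exchange for bounding the work lost to a rollback.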


Keywords

Check pointing, Reliability, Rollback, Replication, Fault tolerance