International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 26 - Number 5
Year of Publication: 2011
Authors: S. Baghavathi Priya, Dr. T. Ravichandran
10.5120/3098-4252 |
S. Baghavathi Priya, Dr. T. Ravichandran. Fault Tolerance and Recovery for Grid Application Reliability using Check Pointing Mechanism. International Journal of Computer Applications. 26, 5 (July 2011), 32-37. DOI=10.5120/3098-4252
Checkpointing with rollback recovery is a well-known method for achieving fault tolerance in grid computing systems. When a resource or process becomes faulty at run time, the checkpointing mechanism detects the fault through the Task Dependency Graph (TDG), and the worst-case execution time and deadline of each task are used to decide schedulability. The common approach is to use a rollback-dependency graph or a checkpoint graph. Concurrent tasks are scheduled with the proposed Concurrent Task Scheduling Algorithm (CTSA), which recovers from faulty states using replication or rollback techniques. Earlier fault detection methods do not scale with the diversity of user applications, and because the frequency of faults varies dynamically, faults are hard to detect and recover from. Checkpointing and replication mechanisms have been used in high-performance grid computing, where synchronization between communicating processes is needed to make checkpointing efficient. Performance under faulty conditions can be improved with or without data and process replication. The experimental results show that CTSA can lead to significant performance gains in a variety of scenarios.
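The schedulability decision mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's CTSA, whose details the abstract does not give. It assumes that every task may start as soon as all of its predecessors in the TDG have finished (that is, enough resources are always available), and the names `earliest_finish_times`, `is_schedulable`, and the four-task example graph are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def earliest_finish_times(wcet, preds):
    """Return the worst-case finish time of every task in the TDG.

    wcet  maps each task to its worst-case execution time.
    preds maps a task to the tasks it depends on (missing key = no dependencies).
    Assumption: a task starts the moment its last predecessor finishes.
    """
    graph = {task: preds.get(task, ()) for task in wcet}
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start + wcet[task]
    return finish


def is_schedulable(wcet, preds, deadlines):
    """True if every task's worst-case finish time meets its deadline."""
    finish = earliest_finish_times(wcet, preds)
    return all(finish[task] <= deadlines[task] for task in wcet)


# Hypothetical TDG: A precedes B and C, which both precede D.
wcet = {"A": 2, "B": 3, "C": 1, "D": 2}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
deadlines = {"A": 2, "B": 6, "C": 4, "D": 8}

print(is_schedulable(wcet, preds, deadlines))

# Hypothetical fault: B is rolled back and fully re-executed, doubling its cost.
wcet_after_rollback = dict(wcet, B=2 * wcet["B"])
print(is_schedulable(wcet_after_rollback, preds, deadlines))
```

In this toy graph the fault-free schedule meets every deadline, while re-executing B from the start pushes it past its deadline. That is the kind of case where restarting from a recent checkpoint, or switching to a replica, costs less than a full rollback.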