International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 48 - Number 22
Year of Publication: 2012
Authors: P. Latchoumy, P. Sheik Abdul Khader
10.5120/7510-0552
P. Latchoumy, P. Sheik Abdul Khader. Improved Fault Tolerant Job Scheduler for Optimal Resource Utilization in Computational Grid. International Journal of Computer Applications. 48, 22 (June 2012), 6-12. DOI=10.5120/7510-0552
Grid computing provides the ability to access, utilize and control a variety of underutilized heterogeneous resources distributed across multiple administrative domains. However, the grid is an error-prone environment, and resource failures affect job execution at runtime. We propose a new strategy, the Improved Fault Tolerant Job Scheduler (IFTJS) for Optimal Resource Utilization in Computational Grid, which schedules grid jobs effectively, tolerates faults gracefully and executes more jobs successfully within the specified deadline. The system maintains a history of fault occurrences for each resource with respect to processor, memory and bandwidth. Using this history reduces the chance of selecting resources with a higher failure probability and hence improves resource utilization. The system also ensures efficient job execution through a Reduced Recovery Time (RRT) strategy. Whenever the scheduler has jobs to schedule, the Improved Fault Tolerant (IFT) algorithm finds the optimal resources based on their failure rate: the resources with the lowest failure rate receive the highest scheduling priority. The job manager monitors job execution and returns the results to the user after successful completion. If a failure occurs, the job manager re-executes the job from its last saved state, either on the same resource when that resource's failure rate is below an optimal value, or on a backup resource when the failure rate exceeds that value, using the RRT strategy. Otherwise, it reschedules the failed job on the next available optimal resource, again from the last saved state. This reduces the recovery time. The approach is effective in that the resource manager detects resource failures and the job manager guarantees that submitted jobs are executed on optimal resources within the specified deadline.
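The abstract describes the IFT selection rule and the RRT recovery decision only in prose. It does not give the failure-rate formula or the "optimal value". The following Python sketch illustrates that decision logic under explicit assumptions. It computes a resource's failure rate as recorded faults (processor, memory and bandwidth combined) divided by jobs assigned. It uses a caller-supplied `threshold` in place of the optimal value. It also reads the "otherwise" branch as the case where no backup resource is available. All names (`Resource`, `Job`, `rank_resources`, `schedule`, `recover`, `threshold`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Fault categories tracked per resource, as listed in the abstract.
FAULT_TYPES = ("processor", "memory", "bandwidth")


@dataclass
class Resource:
    name: str
    jobs_assigned: int = 0
    available: bool = True
    # Hypothetical fault history: one counter per fault category.
    faults: Dict[str, int] = field(
        default_factory=lambda: {t: 0 for t in FAULT_TYPES})

    def failure_rate(self) -> float:
        # ASSUMPTION: failure rate = total recorded faults / jobs assigned.
        if self.jobs_assigned == 0:
            return 0.0
        return sum(self.faults.values()) / self.jobs_assigned

    def record_fault(self, fault_type: str) -> None:
        self.faults[fault_type] += 1


@dataclass
class Job:
    job_id: str
    checkpoint: int = 0              # last saved state, e.g. completed work units
    resource: Optional[str] = None   # name of the resource currently running it


def rank_resources(resources: List[Resource]) -> List[Resource]:
    """Available resources, lowest failure rate (highest priority) first."""
    return sorted((r for r in resources if r.available),
                  key=lambda r: r.failure_rate())


def assign(job: Job, resource: Resource) -> Resource:
    resource.jobs_assigned += 1
    job.resource = resource.name
    return resource


def schedule(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Place a new job on the available resource with the lowest failure rate."""
    ranked = rank_resources(resources)
    return assign(job, ranked[0]) if ranked else None


def recover(job: Job, failed: Resource, backups: List[Resource],
            resources: List[Resource],
            threshold: float) -> Tuple[str, Optional[Resource]]:
    """Pick where a failed job resumes; it always resumes from job.checkpoint."""
    if failed.available and failed.failure_rate() < threshold:
        return "retry_same", assign(job, failed)
    ranked_backups = rank_resources(backups)
    if ranked_backups:
        return "backup", assign(job, ranked_backups[0])
    # ASSUMPTION: no backup available -> next optimal resource in the pool.
    ranked = rank_resources([r for r in resources if r is not failed])
    if ranked:
        return "reschedule", assign(job, ranked[0])
    return "unschedulable", None


if __name__ == "__main__":
    a = Resource("A", jobs_assigned=10)
    a.record_fault("processor")          # failure rate 1/10
    b = Resource("B", jobs_assigned=10)
    b.faults["memory"] = 4               # failure rate 4/10
    job = Job("job-1", checkpoint=3)
    print(schedule(job, [b, a]).name)    # A: lower failure rate
    a.record_fault("bandwidth")          # A is now at 2/11
    print(recover(job, a, [], [a, b], threshold=0.25))
```

Because the job keeps its `checkpoint` across every branch of `recover`, only the placement changes on failure and the work already saved is not repeated. That is the intuition behind the reduced recovery time. Ranking by a single scalar failure rate is the simplest reading of "lowest failure rate has highest priority". A per-category weighting of processor, memory and bandwidth faults would be an equally plausible variant.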