Research Article

A Holistic Approach to Autonomic Self-Healing Distributed Computing System

by Abhishek Bhavsar, Ameya More, Chinmay Kulkarni, Dheeraj Oswal, Jagannath Aghav
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 76 - Number 3
Year of Publication: 2013
Authors: Abhishek Bhavsar, Ameya More, Chinmay Kulkarni, Dheeraj Oswal, Jagannath Aghav
10.5120/13228-0657

Abhishek Bhavsar, Ameya More, Chinmay Kulkarni, Dheeraj Oswal, Jagannath Aghav. A Holistic Approach to Autonomic Self-Healing Distributed Computing System. International Journal of Computer Applications. 76, 3 (August 2013), 25-30. DOI=10.5120/13228-0657

@article{ 10.5120/13228-0657,
author = { Abhishek Bhavsar and Ameya More and Chinmay Kulkarni and Dheeraj Oswal and Jagannath Aghav },
title = { A Holistic Approach to Autonomic Self-Healing Distributed Computing System },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 76 },
number = { 3 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 25-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume76/number3/13228-0657/ },
doi = { 10.5120/13228-0657 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:44:56.806042+05:30
%A Abhishek Bhavsar
%A Ameya More
%A Chinmay Kulkarni
%A Dheeraj Oswal
%A Jagannath Aghav
%T A Holistic Approach to Autonomic Self-Healing Distributed Computing System
%J International Journal of Computer Applications
%@ 0975-8887
%V 76
%N 3
%P 25-30
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Distributed computing systems are prone to errors and faults, and considerable time is spent maintaining a system and restoring it to a stable state after a fault. This maintenance is currently handled by human operators. Despite the emergence of ultra-reliable components, failure in distributed computing systems remains an unmitigated problem, and as a result substantial money, manpower, and man-months of effort are wasted. This work aims to make a distributed systems environment reliable and robust by proposing an autonomic, self-healing architecture. A holistic approach is adopted, and the proposed architecture is general enough to be adopted by a wide range of existing systems. The major challenges include selecting appropriate healing actions and reducing overhead, so that healing is lightweight and transparent yet effective. The proposed architecture uses data mining techniques to generate rules from system data gathered from logs. These rules drive the choice of corrective action and thereby carry out the self-healing mechanism.
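
As a rough illustration of the rule-generation idea in the abstract, the following Python sketch mines simple "events imply fault" rules from windows of log events and uses them to pick a corrective action. The abstract does not say which mining algorithm or log format the authors use, so everything here is an assumption: the event names, the `WINDOWS` data, the `ACTIONS` table, the thresholds, and the brute-force enumeration (a stand-in for a real association-rule miner, not the authors' method).

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each set holds the event types seen in one time window of a log.
WINDOWS = [
    {"disk_warn", "io_retry", "node_fail"},
    {"disk_warn", "io_retry", "node_fail"},
    {"disk_warn", "io_retry"},
    {"net_timeout", "node_fail"},
    {"disk_warn", "io_retry", "node_fail"},
    {"net_timeout"},
]

# Hypothetical mapping from a predicted fault to a corrective (healing) action.
ACTIONS = {"node_fail": "migrate_tasks_and_restart_node"}


def mine_rules(windows, target, min_support=0.3, min_confidence=0.7, max_size=2):
    """Return rules 'antecedent -> target' as (antecedent, support, confidence)."""
    n = len(windows)
    seen = Counter()   # windows containing the antecedent
    joint = Counter()  # windows containing the antecedent and the target
    for window in windows:
        items = sorted(window - {target})
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                seen[key] += 1
                if target in window:
                    joint[key] += 1
    rules = []
    for antecedent, count in seen.items():
        support = joint[antecedent] / n
        confidence = joint[antecedent] / count
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, support, confidence))
    # Strongest rules first; ties broken by event names for a stable order.
    return sorted(rules, key=lambda r: (-r[2], -r[1], sorted(r[0])))


def choose_action(observed, rules, target):
    """Return the healing action if any rule's antecedent is fully observed."""
    for antecedent, _support, _confidence in rules:
        if antecedent <= observed:
            return ACTIONS.get(target)
    return None


if __name__ == "__main__":
    rules = mine_rules(WINDOWS, "node_fail")
    for antecedent, support, confidence in rules:
        print(sorted(antecedent), "-> node_fail", round(support, 2), round(confidence, 2))
    print(choose_action({"disk_warn", "io_retry"}, rules, "node_fail"))
```

With this toy data, only the disk-related events pass both thresholds, so a window showing `net_timeout` alone triggers no action. A real deployment would replace the exhaustive enumeration with a pruned miner and tune the thresholds to keep the healing overhead low.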

Index Terms

Computer Science
Information Sciences

Keywords

Autonomic Computing, Self-Healing Systems, Healing Engine, Reliable Systems, Dependable Systems