Research Article

Classification with an improved Decision Tree Algorithm

by A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 23
Year of Publication: 2012
DOI: 10.5120/7102-9546

A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia. Classification with an improved Decision Tree Algorithm. International Journal of Computer Applications 46, 23 (May 2012), 1-6. DOI=10.5120/7102-9546

@article{ 10.5120/7102-9546,
author = { A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia },
title = { Classification with an improved Decision Tree Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 23 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = { 6 },
url = { https://ijcaonline.org/archives/volume46/number23/7102-9546/ },
doi = { 10.5120/7102-9546 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A A. S. Galathiya
%A A. P. Ganatra
%A C. K. Bhensdadia
%T Classification with an improved Decision Tree Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 46
%N 23
%P 1-6
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining is the discovery of new patterns in data. Its major functionalities are classification, clustering, prediction and association. In a decision tree, classification proceeds from the root node down to a leaf node, and the tree can handle both continuous and categorical data; the classified output of a decision tree is understandable and accurate. In this research work, ID3, C4.5 and C5.0 are first compared, and the proposed system is then implemented. The new system gives more accurate and efficient output with less complexity. Along with classification, the system performs feature selection, cross-validation, reduced error pruning and model-complexity control. The implemented system achieves high accuracy, good speed and low memory usage; its memory usage is low compared to other classifiers because it generates fewer rules. The major issues concerning data mining in large databases are efficiency and scalability. For high-dimensional data, feature selection is the technique for removing irrelevant attributes: it reduces the attribute space of the feature set. A more reliable estimate of predictive accuracy is obtained by f-fold cross-validation, in which the error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. Increasing the model complexity increases classification accuracy, but overfitting remains a major problem of decision trees; the system therefore also supports post-pruning through the reduced error pruning technique. Using the proposed system, accuracy is gained and the classification error rate is reduced compared to the existing system.
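The f-fold cross-validation error estimate described in the abstract (total errors on the hold-out cases divided by the total number of cases) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the one-feature threshold rule is a hypothetical stand-in for a real decision tree learner:

```python
import random

def f_fold_error_rate(data, train_fn, predict_fn, f=5, seed=0):
    """Estimate a classifier's error rate with f-fold cross-validation.

    Error rate = total errors on hold-out cases / total number of cases.
    """
    rng = random.Random(seed)
    cases = data[:]
    rng.shuffle(cases)
    folds = [cases[i::f] for i in range(f)]   # f roughly equal folds
    errors = 0
    for i in range(f):
        hold_out = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        model = train_fn(train)
        errors += sum(1 for x, y in hold_out if predict_fn(model, x) != y)
    return errors / len(cases)

# Hypothetical toy learner: a single threshold rule chosen to
# minimise training error (a one-node "decision tree").
def train_stump(train):
    best = None
    for t in sorted({x for x, _ in train}):
        err = sum(1 for x, y in train if (x >= t) != y)
        if best is None or err < best[1]:
            best = (t, err)
    return best[0]

def predict_stump(threshold, x):
    return x >= threshold

# Linearly separable toy data: the label is True exactly when x >= 5.
data = [(x, x >= 5) for x in range(10)]
rate = f_fold_error_rate(data, train_stump, predict_stump, f=5)
print(rate)  # 0.0 or 0.1, depending on how the boundary points fall into folds
```

Only the boundary case x=5 can ever be misclassified here (when it is held out), so the estimate stays at or near zero; on real data the same ratio summarises generalisation error across all f hold-out folds.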

References
  1. Sohag Sundar Nanda, Soumya Mishra, Sanghamitra Mohanty, Oriya Language Text Mining Using C5.0 Algorithm, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (1), 2011
  2. Tom M. Mitchell, Decision Tree Learning, lecture slides for the textbook Machine Learning, McGraw-Hill, 1997
  3. Zuleyka Díaz Martínez, José Fernández Menéndez, Mª Jesús Segovia Vargas, See5 Algorithm versus Discriminant Analysis, Spain.
  4. Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, Dan Steinberg, Top 10 Algorithms in Data Mining, Springer-Verlag London Limited, 2007
  5. J. R. Quinlan, Induction of Decision Trees, New South Wales Institute of Technology, Sydney 2007, Australia
  6. RuleQuest Research, "Data Mining Tools See5 and C5.0", http://www.rulequest.com/see5-info.html, 1997-2004
  7. Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner "Decision Trees— What Are They?"
  8. Thair Nu Phyu, "Survey of Classification Techniques in Data Mining", International MultiConference of Engineers and Computer Scientists 2009 Vol I IMECS 2009, March 18 - 20, 2009, Hong Kong
  9. S. B. Kotsiantis, Department of Computer Science and Technology "Supervised Machine Learning: A Review of Classification Techniques" - , University of Peloponnese, Greece
  10. Classification: Basic Concepts, Decision Trees, and Model Evaluation
  11. Terri Oda , Data Mining Project, April 14, 2008
  12. Matthew N. Anyanwu, Sajjan G. Shiva, Comparative Analysis of Serial Decision Tree Classification Algorithms
  13. Maria Simi , Decision tree learning
  14. Osmar R. Zaïane, 1999, Introduction to Data Mining, University of Alberta
  15. J. R. B. COCKETT, J. A. HERRERA, Decision tree reduction, University of Tennessee, Knoxville, Tennessee
  16. Hendrik Blockeel , Jan Struyf, Efficient Algorithms for Decision Tree Cross-validation, Department of Computer Science, Katholieke Universiteit Leuven, Belgium
  17. S. Rasoul Safavian and David Landgrebe, A Survey of Decision Tree Classifier Methodology, School of Electrical Engineering Purdue University, West Lafayette
  18. Floriana Esposito, Donato Malerba, and Giovanni Semeraro, A Comparative Analysis of Methods for Pruning Decision Trees, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 19, NO. 5, MAY 1997
  19. Paul E. Utgoff, Neil C. Berkman, Jeffery A. Clouse, Decision Tree Induction Based on Efficient Tree Restructuring, Department of Computer Science, University of Massachusetts, Amherst, MA 01003
  20. Niks, Nikson, Decision Trees, Introduction to Machine Learning
  21. Ron Kohavi, Ross Quinlan, Decision Tree Discovery, Blue Martini Software, 2600 Campus Dr. Suite 175, San Mateo, CA & Samuels Building, G08, University of New South Wales, Sydney 2052, Australia
  22. Paul E. Utgoff , Incremental Induction of Decision Trees, Department of Computer and Information Science, University of Massachusetts, Amherst, MA 01003
  23. Matti Kääriäinen, Tuomo Malinen, Tapio Elomaa, Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees, Department of Computer Science, University of Helsinki; Institute of Software Systems, Tampere University of Technology, Tampere, Finland
  24. Michael Kearns, Yishay Mansour, A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization, AT&T Labs, Tel Aviv University
  25. Zijian Zheng, Constructing New Attributes for Decision Tree Learning, Basser Department of Computer Science, The University of Sydney, Australia
  26. Emily Thomas, DATA MINING: DEFINITIONS AND DECISION TREE EXAMPLES, Director of Planning and Institutional Research, State University of New York
  27. Kurt Hornik, The RWeka Package August 20, 2006
  28. Zhengping Ma, Eli Lilly and Company, Data mining in SAS® with open source software, SAS Global Forum 2011
  29. Simon Urbanek, Package 'rJava', Jan 2, 2012
  30. M. Govindarajan, Text Mining Technique for Data Mining Application, World Academy of Science, Engineering and Technology 35 2007
  31. A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia, An Improved Decision Tree Induction Algorithm with Feature Selection, Cross Validation, Model Complexity & Reduced Error Pruning, IJCSIT, March 2012
Index Terms

Computer Science
Information Sciences

Keywords

REP Decision Tree Algorithm; C5.0 Classifier; C4.5 Classifier