Research Article

Evaluation of Decision Tree Pruning Algorithms for Complexity and Classification Accuracy

by Dipti D. Patil, V.M. Wadhai, J.A. Gokhale
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 11 - Number 2
Year of Publication: 2010
DOI: 10.5120/1554-2074

Dipti D. Patil, V.M. Wadhai, J.A. Gokhale. Evaluation of Decision Tree Pruning Algorithms for Complexity and Classification Accuracy. International Journal of Computer Applications 11, 2 (December 2010), 23-30. DOI=10.5120/1554-2074

@article{ 10.5120/1554-2074,
author = { Dipti D. Patil, V.M. Wadhai, J.A. Gokhale },
title = { Evaluation of Decision Tree Pruning Algorithms for Complexity and Classification Accuracy },
journal = { International Journal of Computer Applications },
issue_date = { December 2010 },
volume = { 11 },
number = { 2 },
month = { December },
year = { 2010 },
issn = { 0975-8887 },
pages = { 23-30 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume11/number2/1554-2074/ },
doi = { 10.5120/1554-2074 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Dipti D. Patil
%A V.M. Wadhai
%A J.A. Gokhale
%T Evaluation of Decision Tree Pruning Algorithms for Complexity and Classification Accuracy
%J International Journal of Computer Applications
%@ 0975-8887
%V 11
%N 2
%P 23-30
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from it in a subsequent pruning phase to improve accuracy and prevent “overfitting”. In this paper, the available pruning methodologies and their various features are discussed. The effectiveness of pruning is also evaluated in terms of complexity and classification accuracy by applying the C4.5 decision tree classification algorithm, with and without pruning, to a credit card database. Instead of classifying transactions as either fraudulent or non-fraudulent, the transactions are classified into four risk levels, which is an innovative concept.
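The trade-off the abstract describes, where pruning shrinks the tree while keeping accuracy competitive, can be sketched in a few lines. This is a hedged illustration, not the paper's method: scikit-learn does not implement C4.5, so CART with minimal cost-complexity pruning (`ccp_alpha`) stands in for it, and a synthetic four-class dataset stands in for the credit card database with four risk levels.

```python
# Sketch of pruned vs. unpruned decision trees (assumptions: CART via
# scikit-learn instead of C4.5; synthetic data instead of the paper's
# credit card database; ccp_alpha=0.002 is an arbitrary pruning strength).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Four classes mimic the paper's four risk levels.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.002).fit(X_tr, y_tr)

# Compare complexity (node count) and held-out classification accuracy.
for name, tree in [("unpruned", unpruned), ("pruned", pruned)]:
    print(name, tree.tree_.node_count, round(tree.score(X_te, y_te), 3))
```

The pruned tree has far fewer nodes; whether its test accuracy improves depends on the data, which is exactly the question the paper evaluates empirically.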

References
  1. Jiawei Han, Micheline Kamber, “Data Mining Concepts and Techniques”, pp. 279-328, 2001.
  2. Doina Caragea, “Learning classifiers from distributed, semantically heterogeneous, autonomous data sources”, Thesis, Iowa State University, 2004.
  3. Tom. M. Mitchell, “Machine Learning”, McGraw-Hill Publications, 1997
  4. Zhang Yong, “Decision Tree’s Pruning Algorithm Based on Deficient Data Sets”, In Proceedings of the Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2005.
  5. Arun K. Pujari, “Data Mining Techniques”, pp. 150-200, 1999.
  6. Manish Mehta, Rakesh Agrawal et al., “SLIQ: A Fast Scalable Classifier for Data Mining”, In 5th Intl. Conf. on Extending Database Technology, March 1996.
  7. J. Quinlan. C4.5 Programs for Machine Learning, San Mateo, CA:Morgan Kaufmann, 1993.
  8. Salvatore, Philip et al., “Meta learning agents for fraud and intrusion detection in Financial Information Systems”, invited paper, Proceedings of the International Conference on Knowledge Discovery and Data Mining, 1996.
  9. S. Stolfo et al., “JAM: Java Agents for Metalearning over Distributed Databases,” Proc. Third Int’l Conf. Knowledge Discovery and Data Mining, AAAI Press, Menlo Park,Calif., 1997, pp. 74–81.
  10. Salvatore J. Stolfo, David W. Fan, Wenke Lee and Andreas L. Prodromidis, “Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results”, DARPA, 1999.
  11. J. Quinlan. Simplifying decision trees, Int. J. Human-Computer Studies, 51, pp. 497-510, 1999.
  12. F. Esposito, D. Malerba, and G. Semeraro. A comparative Analysis of Methods for Pruning Decision Trees, IEEE transactions on pattern analysis and machine intelligence, 19(5): pp. 476-491, 1997.
  13. L. Breiman, J. Friedman, R. Olshen and C. Stone. CART: Classification and Regression Trees, Belmont, CA: Wadsworth Statistical Press, 1984.
  14. B. Cestnik and I. Bratko. On Estimating Probabilities in Tree Pruning, EWSL, pp. 138-150, 1991.
  15. Esposito F., Malerba D., Semeraro G. A Comparative Analysis of Methods for Pruning Decision Trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, VOL. 19, NO. 5, 1997, P. 476-491
  16. J. Mingers. An Empirical Comparison of Selection Measures for Decision Tree Induction, Machine Learning, 3 (3): pp. 319-342, 1989
  17. J. Mingers. An Empirical Comparison of Pruning Methods for Decision Tree Induction, Machine Learning, 4: pp. 227-243, 1989
  18. L. A. Breslow and D. W. Aha. Simplifying Decision Trees: A Survey, Technical Report No. AIC-96-014, Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory Washington, DC.,1996.
  19. I. Bratko and M. Bohanec. Trading accuracy for simplicity in decision trees, Mach. Learn., vol. 15, pp. 223-250, 1994.
  20. H. Almuallim. An efficient algorithm for optimal pruning of decision trees, Artif. Intell., vol. 83, no. 2, pp. 347-362, 1996.
  21. M. I. Jordan. A Statistical Approach to Decision Tree Modeling, Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, New York: ACM Press, 1994.
  22. A. Bradley and B. C. Lovell, Cost-Sensitive Decision Tree Pruning: Use of the ROC Curve, Proceedings Eighth Australian Joint Conference on Artificial Intelligence, pp. 1-8, November 1995.
Index Terms

Computer Science
Information Sciences

Keywords

Decision tree classification; Pruning; Data Mining