Research Article

Classification Through Machine Learning Technique: C4.5 Algorithm based on Various Entropies

by Seema Sharma, Jitendra Agrawal, Sanjeev Sharma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 82 - Number 16
Year of Publication: 2013
Authors: Seema Sharma, Jitendra Agrawal, Sanjeev Sharma
DOI: 10.5120/14249-2444

Seema Sharma, Jitendra Agrawal, Sanjeev Sharma. Classification Through Machine Learning Technique: C4.5 Algorithm based on Various Entropies. International Journal of Computer Applications. 82, 16 (November 2013), 28-32. DOI=10.5120/14249-2444

@article{ 10.5120/14249-2444,
author = { Seema Sharma and Jitendra Agrawal and Sanjeev Sharma },
title = { Classification Through Machine Learning Technique: C4.5 Algorithm based on Various Entropies },
journal = { International Journal of Computer Applications },
issue_date = { November 2013 },
volume = { 82 },
number = { 16 },
month = { November },
year = { 2013 },
issn = { 0975-8887 },
pages = { 28-32 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume82/number16/14249-2444/ },
doi = { 10.5120/14249-2444 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:57:55.475425+05:30
%A Seema Sharma
%A Jitendra Agrawal
%A Sanjeev Sharma
%T Classification Through Machine Learning Technique: C4.5 Algorithm based on Various Entropies
%J International Journal of Computer Applications
%@ 0975-8887
%V 82
%N 16
%P 28-32
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining is an interdisciplinary field of computer science concerned with extracting knowledge from large amounts of data. Classification is a data mining technique that maps data into predefined classes or groups and is used to predict group membership for data instances. Data mining techniques have been adopted in many areas, such as medicine, marketing, telecommunications, stock markets, and health care. C4.5 can be regarded as a statistical classifier. The algorithm uses the gain ratio for feature selection and to construct the decision tree, and it handles both continuous and discrete features. C4.5 is widely used because of its quick classification and high precision. This paper proposes a C4.5 classifier based on various entropies (Shannon entropy, Havrda and Charvát entropy, and quadratic entropy) instead of Shannon entropy alone. Experimental results show that the approach based on various entropies is effective in achieving a high classification rate.
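The entropies named in the abstract, and the gain ratio that C4.5 uses to rank candidate features, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, the Havrda and Charvát entropy is written in its common degree-alpha form, the quadratic entropy is assumed to be the sum of p(1 - p) over classes, and keeping Shannon entropy for the split information is likewise an assumption.

```python
import math
from collections import Counter


def class_probs(labels):
    """Relative frequency of each class label in a list of labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def shannon(probs):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def havrda_charvat(probs, alpha=2.0):
    """Havrda and Charvat entropy of degree alpha (alpha > 0, alpha != 1).

    Assumed form: (sum(p^alpha) - 1) / (2^(1 - alpha) - 1).
    """
    return (sum(p ** alpha for p in probs) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)


def quadratic(probs):
    """Quadratic entropy, assumed here to be sum(p * (1 - p))."""
    return sum(p * (1.0 - p) for p in probs)


def gain_ratio(values, labels, entropy=shannon):
    """Gain ratio of a discrete feature, with a pluggable impurity measure.

    The gain uses `entropy`; the split information keeps Shannon entropy
    (an assumption of this sketch, not a claim about the paper).
    """
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted impurity of the partitions induced by the feature.
    remainder = sum(len(g) / n * entropy(class_probs(g)) for g in groups.values())
    gain = entropy(class_probs(labels)) - remainder
    split_info = shannon([len(g) / n for g in groups.values()])
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    feature = ["a", "a", "b", "b", "b", "a"]
    target = ["yes", "yes", "no", "no", "yes", "no"]
    for name, measure in [("Shannon", shannon),
                          ("Havrda-Charvat", havrda_charvat),
                          ("Quadratic", quadratic)]:
        print(name, gain_ratio(feature, target, entropy=measure))
```

Passing a different `entropy` function to `gain_ratio` changes only the impurity measure, which is the substitution the abstract describes. Under the forms assumed here, the Havrda and Charvát entropy with alpha = 2 is exactly twice the quadratic entropy.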

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Classification technique, Machine learning, Decision tree technique, C4.5 algorithm