International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 23
Year of Publication: 2012
Authors: A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia
DOI: 10.5120/7102-9546
A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia. Classification with an improved Decision Tree Algorithm. International Journal of Computer Applications. 46, 23 (May 2012), 1-6. DOI=10.5120/7102-9546
Data mining is the discovery of new patterns in data. Its major functionalities are classification, clustering, prediction and association. A decision tree classifies a case by following a path from the root node to a leaf node, and it can handle both continuous and categorical data. The classified output of a decision tree is more understandable and accurate. In this research work, ID3, C4.5 and C5.0 are compared, and a system is then implemented. The new system gives more accurate and efficient output with less complexity. Along with classification, the system performs feature selection, cross-validation, reduced error pruning and control of model complexity. The implemented system supports high accuracy, good speed and low memory usage. Its memory usage is low compared to other classifiers because it generates fewer rules. The major issues concerning data mining in large databases are efficiency and scalability. For high-dimensional data, feature selection is the technique for removing irrelevant data; it reduces the attribute space of a feature set. A more reliable estimate of predictive accuracy is obtained by f-fold cross-validation: the error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. Increasing the model complexity increases the accuracy of the classification. Overfitting is another major problem of decision trees, so the system also provides post-pruning through the reduced error pruning technique. Using the proposed system, accuracy is gained and the classification error rate is reduced compared to the existing system.
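
The f-fold cross-validation estimate described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`make_folds`, `train_majority`, `cross_validation_error`) are hypothetical, the folds are assigned round-robin rather than at random, and a trivial majority-class learner stands in for the decision tree so that the example stays self-contained. Only the error-rate formula (total hold-out errors divided by total cases) is taken from the text.

```python
from collections import Counter


def make_folds(n_cases, f):
    """Assign case indices 0..n_cases-1 to f folds in round-robin order."""
    return [list(range(i, n_cases, f)) for i in range(f)]


def train_majority(train_cases, train_labels):
    """Stand-in learner (assumption): always predicts the most common label.

    A decision tree learner would be plugged in here instead.
    """
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda case: majority


def cross_validation_error(cases, labels, f, train=train_majority):
    """Estimate the error rate as total hold-out errors / total cases."""
    total_errors = 0
    for hold_out in make_folds(len(cases), f):
        held = set(hold_out)
        # Train on every case that is not in the current hold-out fold.
        train_idx = [i for i in range(len(cases)) if i not in held]
        classify = train([cases[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Count misclassified hold-out cases for this fold.
        total_errors += sum(1 for i in hold_out
                            if classify(cases[i]) != labels[i])
    return total_errors / len(cases)


if __name__ == "__main__":
    cases = list(range(10))
    labels = ["a"] * 7 + ["b"] * 3
    print(cross_validation_error(cases, labels, f=5))
```

Each case is held out exactly once, so the denominator is the total number of cases rather than the size of any single fold.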