Research Article

Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data

by Saurabh Mangal, Aditya Shankar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 3
Year of Publication: 2014
Authors: Saurabh Mangal, Aditya Shankar
DOI: 10.5120/18895-0183

Saurabh Mangal, Aditya Shankar. Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data. International Journal of Computer Applications. 108, 3 (December 2014), 40-43. DOI=10.5120/18895-0183

@article{ 10.5120/18895-0183,
author = { Saurabh Mangal and Aditya Shankar },
title = { Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 3 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 40-43 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume108/number3/18895-0183/ },
doi = { 10.5120/18895-0183 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:42:03.882878+05:30
%A Saurabh Mangal
%A Aditya Shankar
%T Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 3
%P 40-43
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Random Forests are an effective ensemble method that is becoming increasingly popular, particularly for binary classification problems. One of the most popular implementations is Breiman and Cutler's algorithm, which forms the basis of the "randomForest" package in R. However, a Random Forest model built with this package has a limitation, especially in an environment with limited computational power: it cannot handle highly categorical data. In this paper, we present one of the many techniques we tried to improve the performance of a Random Forest model on highly categorical data. The improvement was achieved solely through advanced pre-processing techniques such as Optimal Scaling, hence the title of the paper.
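The abstract does not spell out the scaling procedure, so the following is only a minimal sketch of the general idea: replace each level of a high-cardinality categorical predictor with a numeric quantification before the data reaches a tree-based model. It assumes a binary target and uses the within-level target mean, standardized over observations, as the quantification. This is a simplified stand-in, not the authors' method and not a full alternating-least-squares optimal scaling routine. The function names `quantify_categories` and `apply_quantification` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def quantify_categories(categories, target):
    """Map each category level to a standardized numeric score.

    Assumption: the score is the mean of a 0/1 target within the level,
    rescaled so the encoded column has mean 0 and unit variance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(categories, target):
        sums[level] += y
        counts[level] += 1

    # Raw quantification: mean of the target within each level.
    raw = {level: sums[level] / counts[level] for level in counts}

    # Standardize over observations (not over distinct levels), so that
    # frequent levels carry more weight in the mean and variance.
    column = [raw[level] for level in categories]
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # All levels have the same target mean: nothing to separate.
        return {level: 0.0 for level in raw}
    return {level: (value - mu) / sigma for level, value in raw.items()}


def apply_quantification(categories, scores, default=0.0):
    """Encode a column; unseen levels fall back to `default` (the mean)."""
    return [scores.get(level, default) for level in categories]


# Toy data (hypothetical): one categorical predictor and a binary target.
cities = ["A", "B", "A", "C", "B", "A", "C", "C"]
clicked = [1, 0, 1, 0, 0, 0, 1, 0]

scores = quantify_categories(cities, clicked)
encoded = apply_quantification(cities, scores)
```

The encoded column is a single numeric feature, so the number of distinct levels no longer matters to the model. In practice the scores would be fitted on training data only and then applied to held-out data, since they are derived from the target.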

Index Terms

Computer Science
Information Sciences

Keywords

Ensemble Methods, Random Forest, Prediction with Categorical Variables, Optimal Scaling, Classification, Machine Learning, Non-Linear Categorical Prediction