Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data

Saurabh Mangal; Aditya Shankar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 108 - Number 3

Year of Publication: 2014

Authors: Saurabh Mangal, Aditya Shankar

DOI: 10.5120/18895-0183

Saurabh Mangal, Aditya Shankar. Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data. International Journal of Computer Applications. 108, 3 (December 2014), 40-43. DOI=10.5120/18895-0183

@article{10.5120/18895-0183,
author = {Saurabh Mangal and Aditya Shankar},
title = {Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data},
journal = {International Journal of Computer Applications},
issue_date = {December 2014},
volume = {108},
number = {3},
month = {December},
year = {2014},
issn = {0975-8887},
pages = {40-43},
numpages = {9},
url = {https://ijcaonline.org/archives/volume108/number3/18895-0183/},
doi = {10.5120/18895-0183},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:42:03.882878+05:30
%A Saurabh Mangal
%A Aditya Shankar
%T Prediction Improvement using Optimal Scaling on Random Forest Models for Highly Categorical Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 3
%P 40-43
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Random Forests are an effective ensemble method that is becoming increasingly popular, particularly for binary classification problems. One of the most widely used algorithms for building a Random Forest model is Breiman and Cutler's, which forms the basis of the "randomForest" package in R. However, a Random Forest model built with this package has a limitation, felt most where computational power is limited: it cannot handle highly categorical data. In this paper, we present one of the many techniques we tried in order to improve the performance of a Random Forest model on highly categorical data. The improvement was achieved solely through advanced pre-processing techniques such as Optimal Scaling, hence the title of the paper.
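The abstract does not describe the scaling procedure itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simplified form of nominal quantification in which each category level is replaced by the mean of the target observed for that level, so that a tree-based learner receives one numeric feature instead of a factor with many levels. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def fit_quantifications(categories, targets):
    """Map each category level to the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(categories, targets):
        sums[level] += y
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}


def transform(categories, quantifications, default):
    """Replace each level by its numeric value; unseen levels get `default`."""
    return [quantifications.get(level, default) for level in categories]


# Hypothetical toy data: one categorical predictor and a binary outcome.
city = ["A", "B", "A", "C", "B", "A", "C", "C"]
label = [1, 0, 1, 0, 1, 0, 0, 0]

quant = fit_quantifications(city, label)
overall = sum(label) / len(label)  # fallback value for unseen levels
city_numeric = transform(city, quant, default=overall)
```

In practice the quantifications would be fitted on the training split only and then applied to held-out data, since computing them on the full data set leaks the target into the feature.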

Index Terms

Computer Science

Information Sciences

Keywords

Ensemble Methods, Random Forest, Prediction with Categorical Variables, Optimal Scaling, Classification, Machine Learning, Non-Linear Categorical Prediction