Research Article

Pre-processing and Modelling using Caret Package in R

by Ajeet Kumar Rai
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 181 - Number 6
Year of Publication: 2018
Authors: Ajeet Kumar Rai
DOI: 10.5120/ijca2018917530

Ajeet Kumar Rai. Pre-processing and Modelling using Caret Package in R. International Journal of Computer Applications. 181, 6 (Jul 2018), 39-42. DOI=10.5120/ijca2018917530

@article{ 10.5120/ijca2018917530,
author = { Ajeet Kumar Rai },
title = { Pre-processing and Modelling using Caret Package in R },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2018 },
volume = { 181 },
number = { 6 },
month = { Jul },
year = { 2018 },
issn = { 0975-8887 },
pages = { 39-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume181/number6/29772-2018917530/ },
doi = { 10.5120/ijca2018917530 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:05:15.253884+05:30
%A Ajeet Kumar Rai
%T Pre-processing and Modelling using Caret Package in R
%J International Journal of Computer Applications
%@ 0975-8887
%V 181
%N 6
%P 39-42
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Machine learning algorithms can be implemented with a number of tools, such as Python, R, Apache Mahout, and cloud-based services. Some of these tools provide packages that help with data preprocessing and with implementing machine learning algorithms. This paper discusses how the caret package in R can be used to implement machine learning techniques.

Index Terms

Computer Science
Information Sciences

Keywords

Titanic Dataset, Machine Learning, Decision Tree, Random Forest, Confusion Matrix, ROC Curve