Research Article

Data Normalization and Standardization: Impacting Classification Model Accuracy

by Mani Butwall
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 35
Year of Publication: 2021
Authors: Mani Butwall
10.5120/ijca2021921669

Mani Butwall. Data Normalization and Standardization: Impacting Classification Model Accuracy. International Journal of Computer Applications. 183, 35 (Nov 2021), 6-9. DOI=10.5120/ijca2021921669

@article{ 10.5120/ijca2021921669,
author = { Mani Butwall },
title = { Data Normalization and Standardization: Impacting Classification Model Accuracy },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2021 },
volume = { 183 },
number = { 35 },
month = { Nov },
year = { 2021 },
issn = { 0975-8887 },
pages = { 6-9 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume183/number35/32153-2021921669/ },
doi = { 10.5120/ijca2021921669 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:18:43.081490+05:30
%A Mani Butwall
%T Data Normalization and Standardization: Impacting Classification Model Accuracy
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 35
%P 6-9
%D 2021
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper examines the impact of data normalization on the accuracy of a classification model. The first part presents the structure of the dataset, its features, and a basic statistical analysis of the data. The study uses a medical dataset of patients with diabetes. The second part presents the data normalization process and the impact of scaling the data on the performance of the classification model. A deep learning model is used for classification, and the main task is to classify whether a patient is diabetic or non-diabetic. Because the dataset contains several numerical parameters on different scales, the main aim of the paper is to investigate how data normalization (scaling) affects the performance of the classification model. The purpose of the study is to show the difference in accuracy achieved by the classification model with and without scaling or normalization.
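The two scaling operations named in the title can be illustrated with a minimal sketch. It is not the paper's code: the function names and the sample feature values are hypothetical, chosen only to show how min-max normalization and z-score standardization bring features with different ranges onto a comparable scale.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Shift values to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values for two numerical features on very different scales.
feature_a = [85.0, 120.0, 155.0, 190.0]   # large-range feature (assumed)
feature_b = [0.0, 2.0, 4.0, 6.0]          # small-range feature (assumed)

feature_a_scaled = min_max_normalize(feature_a)
feature_b_scaled = min_max_normalize(feature_b)
feature_a_standardized = z_score_standardize(feature_a)

print(feature_a_scaled)
print(feature_b_scaled)
print(feature_a_standardized)
```

After scaling, both features lie in the same range, so neither dominates purely because of its units. In practice the scaling parameters (minimum, maximum, mean, standard deviation) would normally be computed on the training split only and then reused on the test split.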

Index Terms

Computer Science
Information Sciences

Keywords

Normalization, Classification, Diabetes Mellitus