Comparative Analysis of Different Imputation Methods to Treat Missing Values in Data Mining Environment

Rahul Singhai

Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 82 - Number 6

Year of Publication: 2013

Authors: Rahul Singhai

10.5120/14122-2236

Rahul Singhai. Comparative Analysis of Different Imputation Methods to Treat Missing Values in Data Mining Environment. International Journal of Computer Applications. 82, 6 (November 2013), 34-42. DOI=10.5120/14122-2236

@article{ 10.5120/14122-2236,

author = { Rahul Singhai },

title = { Comparative Analysis of Different Imputation Methods to Treat Missing Values in Data Mining Environment },

journal = { International Journal of Computer Applications },

issue_date = { November 2013 },

volume = { 82 },

number = { 6 },

month = { November },

year = { 2013 },

issn = { 0975-8887 },

pages = { 34-42 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume82/number6/14122-2236/ },

doi = { 10.5120/14122-2236 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T21:57:05.885469+05:30

%A Rahul Singhai

%T Comparative Analysis of Different Imputation Methods to Treat Missing Values in Data Mining Environment

%J International Journal of Computer Applications

%@ 0975-8887

%V 82

%N 6

%P 34-42

%D 2013

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Data cleaning is one of the important steps of the KDD (Knowledge Discovery in Databases) process. One critical problem in data cleaning is the presence of missing values. Various approaches have been proposed to find and replace such missing data, including use of the mean value, use of a global constant, and replacement with the most probable value. Imputation is an important statistical procedure for replacing the missing values in a data set. One advantage of this approach is that the missing data treatment is independent of the learning algorithm used, which allows the user to select the most suitable imputation method for each situation. This paper analyzes six different imputation methods proposed in the field of statistics and implements them in a data mining environment. An artificial data set of 1000 records is used to analyze the performance of these methods, and a Z-test is used to test their significance. Exhaustive experiments show the effectiveness of the proposed methods. All attributes of the input data are assumed to be numeric.
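As a rough illustration of the kind of experiment the abstract describes, the following Python sketch applies two of the simple treatments named above (mean value and global constant) to an artificial numeric attribute of 1000 records and computes a one-sample Z statistic on the imputed values. It is not the paper's implementation: the six methods studied are not named in the abstract, and the function names, the normal data generator, the 10% missing rate, and the parameters `mu`, `sigma`, and `seed` are all assumptions chosen for illustration.

```python
import math
import random
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_constant(values, constant=0.0):
    """Replace missing entries (None) with a fixed global constant."""
    return [constant if v is None else v for v in values]


def z_statistic(sample, mu0, sigma):
    """One-sample Z statistic for H0: mean == mu0, with known std. dev. sigma."""
    n = len(sample)
    return (mean(sample) - mu0) / (sigma / math.sqrt(n))


def make_artificial_data(n=1000, mu=50.0, sigma=10.0, missing_rate=0.1, seed=0):
    """Hypothetical generator: normal values with entries deleted at random."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    return [None if rng.random() < missing_rate else x for x in data]


if __name__ == "__main__":
    raw = make_artificial_data()
    for name, imputed in [
        ("mean", impute_mean(raw)),
        ("constant", impute_constant(raw, constant=0.0)),
    ]:
        z = z_statistic(imputed, mu0=50.0, sigma=10.0)
        print(f"{name} imputation: z = {z:.3f}")
```

Under these assumptions, a Z value far from zero suggests that the imputed attribute's mean differs significantly from the known population mean, which is one simple way to compare how much each treatment distorts the data.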


Index Terms

Computer Science

Information Sciences

Keywords

KDD, Data mining, Imputation methods, Data pre-processing, sampling, attribute missing values