An Efficient Approach for Filling Incomplete Data

Research Article

Published in May 2012 by P. M. Kiran, A. Prakash Rao, B. Ratnamala

National Conference on Advances in Computer Science and Applications (NCACSA 2012)

Foundation of Computer Science USA

NCACSA - Number 4

May 2012

Authors: P. M. Kiran, A. Prakash Rao, B. Ratnamala

P. M. Kiran, A. Prakash Rao, B. Ratnamala. An Efficient Approach for Filling Incomplete Data. National Conference on Advances in Computer Science and Applications (NCACSA 2012). NCACSA, 4 (May 2012), 23-27.

@article{
author = { P. M. Kiran and A. Prakash Rao and B. Ratnamala },
title = { An Efficient Approach for Filling Incomplete Data },
journal = { National Conference on Advances in Computer Science and Applications (NCACSA 2012) },
issue_date = { May 2012 },
volume = { NCACSA },
number = { 4 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 23-27 },
numpages = { 5 },
url = { /proceedings/ncacsa/number4/6503-1028/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article

%1 National Conference on Advances in Computer Science and Applications (NCACSA 2012)

%A P. M. Kiran

%A A. Prakash Rao

%A B. Ratnamala

%T An Efficient Approach for Filling Incomplete Data

%J National Conference on Advances in Computer Science and Applications (NCACSA 2012)

%@ 0975-8887

%V NCACSA

%N 4

%P 23-27

%D 2012

%I International Journal of Computer Applications

Abstract

Good data preparation is a key prerequisite to successful data mining. Conventional wisdom suggests that data preparation takes about 60 to 80% of the time involved in a data mining exercise. There have been good reviews of the problems associated with data preparation. Nevertheless, data preprocessing remains a crucial step in a variety of data warehousing and mining tasks. Real-world data is noisy and often suffers from corrupted or incomplete values that may degrade the models built from it, and the accuracy of any mining algorithm depends greatly on its input datasets. In this paper we describe a novel approach to predicting the missing values in a dataset using the well-known principle of maximum-likelihood EM (Expectation Maximization). After the EM filter is implemented and applied, the dataset is completed with the estimated values, based on the well-known principle of expectation maximization over attribute instances. We demonstrate the efficacy of the approach on real data sets as a preprocessing step.
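To make the idea of EM-based filling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified setting that the abstract does not specify: two numeric attributes that are jointly normal, where `xs` is fully observed and `ys` has some missing entries marked as `None`. The function name `em_impute` and its parameters are hypothetical, chosen only for illustration.

```python
def em_impute(xs, ys, iterations=200, tol=1e-9):
    """Fill None entries of ys with EM estimates under a bivariate normal model.

    Assumptions (illustrative only): xs is fully observed and not constant,
    ys has at least one observed value, and (x, y) is jointly normal.
    """
    n = len(xs)
    observed = [y for y in ys if y is not None]

    # x is complete, so its mean and variance stay fixed throughout.
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n

    # Initial guess: observed-case statistics for y, zero covariance.
    mu_y = sum(observed) / len(observed)
    s_yy = sum((y - mu_y) ** 2 for y in observed) / len(observed)
    s_xy = 0.0

    for _ in range(iterations):
        # E-step: expected y and y^2 for every row given x and the
        # current parameter estimates.
        slope = s_xy / s_xx
        resid_var = s_yy - slope * s_xy
        ey, ey2 = [], []
        for x, y in zip(xs, ys):
            if y is None:
                m = mu_y + slope * (x - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(y)
                ey2.append(y * y)

        # M-step: maximum-likelihood updates from the expected
        # sufficient statistics.
        new_mu_y = sum(ey) / n
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        new_s_xy = sum(x * e for x, e in zip(xs, ey)) / n - mu_x * new_mu_y

        change = (abs(new_mu_y - mu_y) + abs(new_s_yy - s_yy)
                  + abs(new_s_xy - s_xy))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break

    # Complete the dataset: keep observed values, fill the rest with
    # the conditional mean of y given x.
    slope = s_xy / s_xx
    return [mu_y + slope * (x - mu_x) if y is None else y
            for x, y in zip(xs, ys)]
```

For example, calling `em_impute([1, 2, 3, 4, 5, 6], [3, 5, None, 9, None, 13])` on data whose observed rows follow y = 2x + 1 should fill the two gaps with values close to 7 and 11. A dataset with more attributes or with categorical values would need a more general model than this two-column sketch.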

Index Terms

Computer Science

Information Sciences

Keywords

Data Mining, Data Preprocessing, Missing Data