National Conference on Advances in Computer Science and Applications (NCACSA 2012) |
Foundation of Computer Science USA |
NCACSA - Number 4 |
May 2012 |
Authors: P. M. Kiran, A. Prakash Rao, B. Ratnamala |
P. M. Kiran, A. Prakash Rao, B. Ratnamala. An Efficient Approach for Filling Incomplete Data. National Conference on Advances in Computer Science and Applications (NCACSA 2012). NCACSA, 4 (May 2012), 23-27.
Good data preparation is a key prerequisite for successful data mining. Conventional wisdom suggests that data preparation takes about 60 to 80% of the time involved in a data mining exercise. There have been good reviews of the problems associated with data preparation. Data preprocessing nevertheless remains a crucial step in a variety of data warehousing and mining tasks. Real-world data is noisy and often suffers from corrupted or incomplete values that may affect the models built from it, and the accuracy of any mining algorithm depends greatly on its input datasets. In this paper we describe a novel approach to predicting the missing values in a dataset using the well-known maximum likelihood principle of EM (Expectation Maximization). After the EM filter is implemented and applied, the dataset is completed with the estimated values of the attribute instances. We demonstrate the efficacy of the approach on real data sets as a preprocessing step.
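To make the idea of EM-based filling concrete, the following is a minimal sketch and not the authors' implementation. It assumes that all attributes are numeric, that missing entries are encoded as NaN, that every attribute has at least one observed value, and that the rows can be modelled as draws from a single multivariate Gaussian. The function name `em_impute` and its parameters `n_iter` and `tol` are hypothetical and chosen only for illustration.

```python
import numpy as np

def em_impute(X, n_iter=50, tol=1e-6):
    """Fill NaN entries of X with EM estimates under a Gaussian model (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    missing = np.isnan(X)

    # Initial guess: column means of the observed values, covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(missing, mu, X)
    sigma = np.cov(X_hat, rowvar=False, bias=True) + 1e-6 * np.eye(d)

    for _ in range(n_iter):
        X_prev = X_hat.copy()
        cov_correction = np.zeros((d, d))

        # E-step: replace each missing block by its conditional mean given the observed block.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the current mean.
                X_hat[i] = mu
                cov_correction += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            gain = S_mo @ np.linalg.pinv(S_oo)
            X_hat[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
            # Conditional covariance of the missing block, needed for an unbiased M-step.
            cov_correction[np.ix_(m, m)] += sigma[np.ix_(m, m)] - gain @ S_mo.T

        # M-step: re-estimate mean and covariance from the completed data.
        mu = X_hat.mean(axis=0)
        centered = X_hat - mu
        sigma = (centered.T @ centered + cov_correction) / n

        # Stop once the imputed values no longer change noticeably.
        if np.max(np.abs(X_hat - X_prev)) < tol:
            break

    return X_hat
```

Called on a NumPy array such as `em_impute(data)`, the sketch returns a copy in which observed entries are unchanged and each NaN has been replaced by its estimated value. On a toy table whose second column is exactly twice the first, a missing second-column entry is driven toward twice the observed first-column value. Categorical attributes and other distributional assumptions would need a different model.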