International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 62 - Number 17
Year of Publication: 2013
Authors: Sonal Porwal, Deepali Vora
DOI: 10.5120/10175-5041
Sonal Porwal, Deepali Vora. A Comparative Analysis of Data Cleaning Approaches to Dirty Data. International Journal of Computer Applications. 62, 17 (January 2013), 30-34. DOI=10.5120/10175-5041
Data cleansing (or data scrubbing) is the process of detecting and correcting errors and inconsistencies in a data warehouse. Applying data cleaning strategies keeps poor-quality data, i.e., dirty data, out of a data mart, which leads to more accurate and hence more reliable decision making. Quality data can be produced only by cleaning and pre-processing the data before loading it into the data warehouse. Because no single algorithm addresses every type of dirty data, an organization has to prioritize its needs and choose an algorithm according to its requirements and the kinds of dirty data that occur. This paper focuses on two data cleaning algorithms, Alliance Rules and HADCLEAN, and their approaches to data quality. It also compares the two on the factors and aspects common to both.