A Comparative Analysis of Data Cleaning Approaches to Dirty Data

Sonal Porwal; Deepali Vora

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 62 - Number 17

Year of Publication: 2013

Authors: Sonal Porwal, Deepali Vora

DOI: 10.5120/10175-5041

Sonal Porwal, Deepali Vora. A Comparative Analysis of Data Cleaning Approaches to Dirty Data. International Journal of Computer Applications. 62, 17 (January 2013), 30-34. DOI=10.5120/10175-5041

@article{ 10.5120/10175-5041,
author = { Sonal Porwal, Deepali Vora },
title = { A Comparative Analysis of Data Cleaning Approaches to Dirty Data },
journal = { International Journal of Computer Applications },
issue_date = { January 2013 },
volume = { 62 },
number = { 17 },
month = { January },
year = { 2013 },
issn = { 0975-8887 },
pages = { 30-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume62/number17/10175-5041/ },
doi = { 10.5120/10175-5041 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:12:06.925604+05:30
%A Sonal Porwal
%A Deepali Vora
%T A Comparative Analysis of Data Cleaning Approaches to Dirty Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 62
%N 17
%P 30-34
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Data cleansing (or data scrubbing) is the process of detecting and correcting errors and inconsistencies in a data warehouse. Poor-quality data, i.e., dirty data, present in a data mart can be avoided using various data cleaning strategies, leading to more accurate and hence more reliable decision making. Quality data can only be produced by cleaning and pre-processing the data prior to loading it into the data warehouse. As not all algorithms address the problems related to every type of dirty data, an organization has to prioritize its needs and choose an algorithm according to its requirements and the types of dirty data that occur. This paper focuses on two data cleaning algorithms, Alliance Rules and HADCLEAN, and their approaches to data quality. It also includes a comparison of the various factors and aspects common to both.

References

Rajiv Arora, Payal Pahwa and Shubha Bansal, "Alliance Rules for Data Warehouse Cleansing", 2009. IEEE Press, Pages 743-747.
Arindam Paul, Varuni Ganesan, "HADCLEAN: A Hybrid Approach to Data Cleaning in Data Warehouses", 2012. IEEE Press, Pages 136-142.
Dr. Mortadha M. Hamad, Alaa Abdulkar Jihad, "An Enhanced Technique to Clean Data in the Data Warehouse", 2011. IEEE.
Kamran Ali, Mubeen Ahmed, "A framework to implement Data Cleaning in Enterprise Data Warehouse for Robust Data Quality", 2010. IEEE Press, Pages 1-6.
W. Kim, B. Choi, E. Hong, S. Kim and D. Lee, "A taxonomy of dirty data", Data Mining and Knowledge Discovery, 7, 81–99, 2003.
J. Jebamalar Tamilselvi, Dr. V. Saravanan, "Handling Noisy Data using Attribute Selection and Smart Tokens", 2008. IEEE Press, Pages 770-774.
Yan Hao, "Research on Information Quality Driven Data Cleaning Framework", 2008. IEEE, Pages 537-539.
Wai Lup Low, Mong Li Lee, "A Knowledge based Approach for Duplicate Elimination in Data Cleaning", School of Computing, National University Singapore.
Lukasz Ciszak, "Application of Clustering and Association Methods in Data Cleaning", 2008. IEEE, Proceedings of the International Multiconference on Computer Science, Pages 97-103.
Mariam Rehman, "Duplicate Record Detection for Database Cleaning", 2009. IEEE conference, Pages 333-338.
Deaton, Thao Doan, T. Schweiger, "Semantic Data Matching Principles and Performance", Data Engineering - International Series in Operations Research & Management Science, Springer US, vol. 132, pp. 77-90, 2010.

Index Terms

Computer Science

Information Sciences

Keywords

HADCLEAN, PNRS, phonetic algorithm, alliance rules, transitive closure, near miss strategy, scores