Research Article

Analysis of Data Cleansing Approaches regarding Dirty Data – A Comparative Study

by Kofi Adu-manu Sarpong, John Kingsley Arthur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 76 - Number 7
Year of Publication: 2013
Authors: Kofi Adu-manu Sarpong, John Kingsley Arthur
10.5120/13258-0736

Kofi Adu-manu Sarpong, John Kingsley Arthur. Analysis of Data Cleansing Approaches regarding Dirty Data – A Comparative Study. International Journal of Computer Applications. 76, 7 (August 2013), 14-18. DOI=10.5120/13258-0736

@article{ 10.5120/13258-0736,
author = { Kofi Adu-manu Sarpong and John Kingsley Arthur },
title = { Analysis of Data Cleansing Approaches regarding Dirty Data – A Comparative Study },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 76 },
number = { 7 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 14-18 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume76/number7/13258-0736/ },
doi = { 10.5120/13258-0736 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:45:16.585740+05:30
%A Kofi Adu-manu Sarpong
%A John Kingsley Arthur
%T Analysis of Data Cleansing Approaches regarding Dirty Data – A Comparative Study
%J International Journal of Computer Applications
%@ 0975-8887
%V 76
%N 7
%P 14-18
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data cleansing is the process of detecting and correcting errors and inconsistencies in a data warehouse. It deals with identifying corrupt and duplicate data in the data sets of a data warehouse in order to improve data quality. This research investigated some existing approaches and frameworks that attempt to solve the data cleansing problem and identified their strengths and weaknesses, which led to the identification of gaps in those frameworks and approaches. A comparative analysis of the four frameworks was conducted and, using standard testing parameters, a proposed feature was discussed to fill the gaps.

Index Terms

Computer Science
Information Sciences

Keywords

Framework, Strengths and weaknesses, Gap analysis