Research Article

A Review of Data Cleansing Concepts – Achievable Goals and Limitations

by Kofi Adu-manu Sarpong, John Kingsley Arthur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 76 - Number 7
Year of Publication: 2013
Authors: Kofi Adu-manu Sarpong, John Kingsley Arthur
10.5120/13259-0737

Kofi Adu-manu Sarpong, John Kingsley Arthur. A Review of Data Cleansing Concepts – Achievable Goals and Limitations. International Journal of Computer Applications. 76, 7 (August 2013), 19-22. DOI=10.5120/13259-0737

@article{ 10.5120/13259-0737,
author = { Kofi Adu-manu Sarpong and John Kingsley Arthur },
title = { A Review of Data Cleansing Concepts – Achievable Goals and Limitations },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 76 },
number = { 7 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 19-22 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume76/number7/13259-0737/ },
doi = { 10.5120/13259-0737 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:45:21.302274+05:30
%A Kofi Adu-manu Sarpong
%A John Kingsley Arthur
%T A Review of Data Cleansing Concepts – Achievable Goals and Limitations
%J International Journal of Computer Applications
%@ 0975-8887
%V 76
%N 7
%P 19-22
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data cleansing is the process of detecting and correcting errors and inconsistencies in a data warehouse. It identifies corrupt and duplicate data in the data sets of a data warehouse in order to improve data quality. This study reviews existing research on data cleansing to determine the goals each approach achieved and the limitations it encountered. The errors identified by most of these researchers have led to the development of several frameworks and systems for use in data warehousing. These findings contribute to the emerging empirical evidence of the strategic role data cleansing plays in the growth of organizations, institutions and government agencies: it improves data quality and reporting, and it offers a competitive advantage by overcoming the presence of dirty data.

Index Terms

Computer Science
Information Sciences

Keywords

Data inconsistency, identification of errors, organization growth