International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 32 - Number 8
Year of Publication: 2011
Authors: Daya Gupta, Payal Pahwa, Rajiv Arora
DOI: 10.5120/3922-5533
Daya Gupta, Payal Pahwa, Rajiv Arora. Article: Novel Framework and Model for Data Warehouse Cleansing. International Journal of Computer Applications 32(8):6-13, October 2011. DOI=10.5120/3922-5533
Data cleansing is the process of identifying corrupt and duplicate data in the data sets of a data warehouse in order to enhance data quality. This paper facilitates the data cleansing process by addressing the problem of duplicate record detection in the ‘name’ attributes of the data sets. It presents a novel framework, realized as a well-defined sequence of algorithms, for identifying duplicates in the ‘name’ attribute of the data sets of an existing data warehouse. The key contributions of the research include the proposed framework itself and a refinement of the application of alliance rules [1] that incorporates previously established similarity computation measures. The results presented demonstrate the feasibility and validity of the suggested method.
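The paper's specific algorithms are not reproduced in this abstract, but the general idea of flagging duplicate ‘name’ values via a similarity computation measure can be sketched as follows. This is a minimal illustration, not the authors' method: the normalization steps, the use of Python's standard-library `SequenceMatcher` as the similarity measure, and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Normalized similarity between two name strings, in [0.0, 1.0].

    Case and whitespace are normalized first; the similarity measure
    (SequenceMatcher) is an illustrative stand-in, not the paper's own.
    """
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    return SequenceMatcher(None, a, b).ratio()

def find_duplicate_names(names, threshold=0.85):
    """Return pairs of names whose similarity meets the threshold.

    The 0.85 threshold is a hypothetical choice for this sketch.
    """
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if name_similarity(names[i], names[j]) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Example: near-duplicate records differing in spacing and a typo.
records = ["Jon Smith", "John Smith", "jon  smith", "Mary Lee"]
print(find_duplicate_names(records))
```

In practice, a pairwise scan like this is quadratic in the number of records; production cleansing frameworks typically add blocking or sorted-neighborhood steps to limit the comparisons.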