International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 105 - Number 8
Year of Publication: 2014
Authors: Arup Kumar Bhattacharjee, Partha Chatterjee, Mukesh Prasad Shaw, Manomoy Chakraborty |
DOI: 10.5120/18399-9661
Arup Kumar Bhattacharjee, Partha Chatterjee, Mukesh Prasad Shaw, Manomoy Chakraborty. ETL based Cleaning on Database. International Journal of Computer Applications. 105, 8 (November 2014), 34-40. DOI=10.5120/18399-9661
The paper analyses the problem of data cleaning and of automatically identifying "incorrect and inconsistent data" in a dataset. Extraction, Transformation and Loading (ETL) are the steps used for cleaning a data warehouse. The authors have implemented several algorithms, including cleanString, cleanNumber, hit ratio, check data dictionary and check metadata, in addition to existing data cleaning algorithms such as PNRS. The paper aims to improve the quality of data in a database system, with an emphasis on making a citizen database system error-free. Some results, along with certain statistics, are also provided.