National Conference on Role of Engineers in National Building |
Foundation of Computer Science USA |
NCRENB - Number 1 |
June 2014 |
Authors: Ashwini M. Save, Seema Kolkur |
Ashwini M. Save, Seema Kolkur. Hybrid Technique for Data Cleaning. National Conference on Role of Engineers in National Building. NCRENB, 1 (June 2014), 4-8.
A data warehouse contains a large volume of data, and data quality is an important issue in data warehousing projects. Many business decision processes are based on the data entered in the data warehouse, so improving data quality is necessary for accurate results. Data may include text errors, quantitative errors, or duplicated records. There are several ways to remove such errors and inconsistencies from the data. Data cleaning is the process of detecting and correcting inaccurate data. Different algorithms, such as the Improved PNRS algorithm, the Quantitative algorithm, and the Transitive algorithm, are used for the data cleaning process. In this paper, an attempt has been made to clean the data in the data warehouse by combining these approaches. Text data is cleaned by the Improved PNRS algorithm, quantitative data is cleaned by special rules (the Enhanced technique), and duplicate records are removed by the Transitive closure algorithm. Applying these algorithms one after another to a data set is expected to increase its accuracy.
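The duplicate-removal step can be illustrated with a minimal sketch of transitive-closure grouping. It is not the authors' implementation: the function name `group_duplicates`, the choice of match fields, and the rule that two records match when they share an exact value in any match field are all assumptions made for illustration. The idea shown is only the transitive part: if record A matches B and B matches C, all three fall into one duplicate group even when A and C share no field directly.

```python
from collections import defaultdict


def group_duplicates(records, keys):
    """Group record indices into duplicate clusters (hypothetical helper).

    Assumption: two records are a direct match when they share an exact,
    non-empty value in any field listed in `keys`. Clusters are the
    transitive closure of that direct-match relation, computed with a
    union-find (disjoint-set) structure.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        # Merge two clusters; the smaller root index becomes the root.
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, value) -> index of the first record holding it
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value is None or value == "":
                continue  # missing values never create a match
            token = (key, value)
            if token in first_seen:
                union(idx, first_seen[token])
            else:
                first_seen[token] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Illustrative data only; not taken from the paper.
records = [
    {"name": "A. Shah", "phone": "111", "email": "a@example.com"},
    {"name": "Amit Shah", "phone": "111", "email": "b@example.com"},
    {"name": "Amit S.", "phone": "222", "email": "b@example.com"},
    {"name": "B. Rao", "phone": "333", "email": "c@example.com"},
]
clusters = group_duplicates(records, ["phone", "email"])
print(clusters)
```

Here records 0 and 1 share a phone number and records 1 and 2 share an email address, so all three end up in one cluster, while record 3 stays on its own. Keeping one representative per cluster would then remove the duplicates.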