International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Number 8
Year of Publication: 2010
Authors: Manu Pratap Singh, Rajdev Tiwari
10.5120/847-1182
Manu Pratap Singh, Rajdev Tiwari. Article: Correlation-based Attribute Selection using Genetic Algorithm. International Journal of Computer Applications. 4, 8 (August 2010), 28-34. DOI=10.5120/847-1182
A Data Warehouse (DW) is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. DWs are constructed through a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing. Integration of data sources refers to the task of developing a common schema, as well as data transformation solutions, for a number of data sources with related content. The large number and size of modern data sources make this process cumbersome. In such cases, attribute subset selection is performed on the basis of relevance analysis, in the form of correlation analysis, to detect attributes that contribute little to the characteristics of the data as a whole. Redundant attributes, that is, attributes strongly correlated with some other attribute, are then excluded from the DW. Automated tools based on existing methods for attribute subset selection may not always yield an optimal set of attributes, which may degrade the performance of the DW. This paper formulates and validates a method for selecting an optimal attribute subset based on correlation using a Genetic Algorithm (GA).
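The idea of correlation-based attribute selection with a GA can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not specify the encoding, fitness function, or GA operators, so everything below is an assumption made for illustration. Each candidate subset is a bit mask over the attributes, and the hypothetical fitness rewards every retained attribute while penalizing each retained pair whose absolute Pearson correlation reaches an assumed `threshold`. The names `pearson`, `fitness`, and `select_attributes`, and all parameter values, are likewise illustrative.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def fitness(mask, corr, threshold=0.9, penalty=2.0):
    """Assumed fitness: attributes kept minus a penalty per redundant pair."""
    kept = [i for i, bit in enumerate(mask) if bit]
    redundant = sum(
        1 for i, j in combinations(kept, 2) if abs(corr[i][j]) >= threshold
    )
    return len(kept) - penalty * redundant


def select_attributes(columns, pop_size=30, generations=50,
                      mutation_rate=0.05, seed=0):
    """Evolve a bit mask over the attributes (needs at least two columns)."""
    rng = random.Random(seed)
    n = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]

    def score(mask):
        return fitness(mask, corr)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=score)
    return best
```

Calling `select_attributes(columns)` with a list of numeric columns returns a 0/1 mask in which 1 marks an attribute to retain. With the assumed penalty of 2.0, keeping both members of a strongly correlated pair scores lower than keeping only one of them, which is what pushes the search toward dropping redundant attributes.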