Research Article

Article:Correlation-based Attribute Selection using Genetic Algorithm

by Manu Pratap Singh, Rajdev Tiwari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Number 8
Year of Publication: 2010
Authors: Manu Pratap Singh, Rajdev Tiwari
10.5120/847-1182

Manu Pratap Singh, Rajdev Tiwari. Article:Correlation-based Attribute Selection using Genetic Algorithm. International Journal of Computer Applications. 4, 8 (August 2010), 28-34. DOI=10.5120/847-1182

@article{ 10.5120/847-1182,
author = { Manu Pratap Singh and Rajdev Tiwari },
title = { Article:Correlation-based Attribute Selection using Genetic Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { August 2010 },
volume = { 4 },
number = { 8 },
month = { August },
year = { 2010 },
issn = { 0975-8887 },
pages = { 28-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume4/number8/847-1182/ },
doi = { 10.5120/847-1182 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:52:34.388650+05:30
%A Manu Pratap Singh
%A Rajdev Tiwari
%T Article:Correlation-based Attribute Selection using Genetic Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 4
%N 8
%P 28-34
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A Data Warehouse (DW) is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. DWs are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing. Integration of data sources refers to the task of developing a common schema, as well as data transformation solutions, for a number of data sources with related content. The large number and size of modern data sources make this process cumbersome. In such cases, attribute subset selection is performed on the basis of relevance analysis, in the form of correlation analysis, to detect attributes that contribute little to the characteristics of the data as a whole. A redundant attribute, or one strongly correlated with some other attribute, is then excluded from the DW. Automated tools based on existing methods for attribute subset selection may not always yield an optimal set of attributes, which may degrade the performance of the DW. This paper formulates and validates a method for selecting an optimal attribute subset based on correlation using a Genetic Algorithm (GA).
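The abstract does not give the paper's chromosome encoding, fitness function, or GA parameters, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a bit-mask chromosome (one bit per attribute), a hypothetical fitness that rewards kept attributes and penalises the summed absolute Pearson correlation between kept pairs, truncation selection, single-point crossover, and bit-flip mutation. The names `pearson`, `fitness`, and `select_attributes`, and all default parameter values, are illustrative assumptions.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return cov / (vx * vy) ** 0.5


def fitness(mask, corr, penalty=2.0):
    """Hypothetical fitness: number of kept attributes minus a redundancy penalty."""
    kept = [i for i, bit in enumerate(mask) if bit]
    redundancy = sum(abs(corr[i][j]) for i, j in combinations(kept, 2))
    return len(kept) - penalty * redundancy


def select_attributes(columns, generations=60, pop_size=30, p_mut=0.05, seed=0):
    """Return indices of the attributes kept by the best chromosome found.

    Expects at least two columns, each a list of numbers of the same length.
    """
    rng = random.Random(seed)
    n = len(columns)
    # Pairwise correlation matrix, computed once up front.
    corr = [[pearson(columns[i], columns[j]) for j in range(n)] for i in range(n)]
    # Random initial population of bit masks.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, corr), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: fitness(m, corr))
    return [i for i, bit in enumerate(best) if bit]
```

For example, given three columns where the second is an exact multiple of the first and the third is uncorrelated with both, this sketch would be expected to keep the third column and only one of the first two. A different fitness (for instance, one that also measures correlation with a target attribute) would change which subsets count as optimal.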

Index Terms

Computer Science
Information Sciences

Keywords

Molecular Electronics Molecular Orbital Atomic Orbital