Research Article

Feature Selection using Clustering Approach for Big Data

Published in December 2014 by Harshali D. Gangurde
Innovations and Trends in Computer and Communication Engineering
Foundation of Computer Science USA
ITCCE - Number 4
December 2014
Authors: Harshali D. Gangurde

Harshali D. Gangurde. Feature Selection using Clustering Approach for Big Data. Innovations and Trends in Computer and Communication Engineering. ITCCE, 4 (December 2014), 1-3.

@article{
author = { Harshali D. Gangurde },
title = { Feature Selection using Clustering Approach for Big Data },
journal = { Innovations and Trends in Computer and Communication Engineering },
issue_date = { December 2014 },
volume = { ITCCE },
number = { 4 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-3 },
numpages = { 3 },
url = { /proceedings/itcce/number4/19058-2024/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Innovations and Trends in Computer and Communication Engineering
%A Harshali D. Gangurde
%T Feature Selection using Clustering Approach for Big Data
%J Innovations and Trends in Computer and Communication Engineering
%@ 0975-8887
%V ITCCE
%N 4
%P 1-3
%D 2014
%I International Journal of Computer Applications
Abstract

Feature selection has been a productive field of research and development in data mining, machine learning and statistical pattern recognition, and is widely applied in fields such as image retrieval, genomic analysis and text categorization. Feature selection means choosing the most useful features from a given data set by removing irrelevant and redundant features. A clustering approach makes feature selection both efficient, in terms of time complexity, and effective, in terms of the quality of the selected subset, when drawing useful features from big data. Feature selection reduces the computational complexity of learning and prediction algorithms and saves the cost of measuring the non-selected features. The selection can be carried out with a graph-theoretic clustering approach: the most relevant feature is chosen from each cluster with respect to the target class, and the features in each cluster are largely independent of those in the other clusters.
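
As an illustration of the clustering-based selection idea described above (a minimal sketch, not necessarily the author's exact algorithm), the following Python snippet groups features by hierarchical clustering on their pairwise correlation and keeps, from each cluster, the single feature most correlated with the target class. The function name, the fixed cluster count, and the use of NumPy/SciPy are assumptions made for this sketch.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def select_features_by_clustering(X, y, n_clusters=5):
    """Keep one representative feature per correlation cluster (illustrative sketch)."""
    # Pairwise absolute correlation between the columns (features) of X.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Turn similarity into a distance and cluster the features hierarchically.
    dist = squareform(1.0 - corr, checks=False)
    labels = fcluster(linkage(dist, method="average"),
                      t=n_clusters, criterion="maxclust")
    # Relevance of each feature to the target class (absolute correlation with y).
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # The most relevant feature represents its cluster; the rest are treated as redundant.
        selected.append(members[np.argmax(relevance[members])])
    return sorted(selected)

# Example usage on synthetic data: features 0 and 5 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.1, size=200)
print(select_features_by_clustering(X, y, n_clusters=5))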

Index Terms

Computer Science
Information Sciences

Keywords

Feature Selection, Clustering