Research Article

An Ameliorated Methodology for Feature Subset Selection on High Dimensional Data using Precise Relevance Measures

by Kaveri B.V., Asha T.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 127 - Number 7
Year of Publication: 2015
Authors: Kaveri B.V., Asha T.
10.5120/ijca2015906344

Kaveri B.V. and Asha T. An Ameliorated Methodology for Feature Subset Selection on High Dimensional Data using Precise Relevance Measures. International Journal of Computer Applications 127, 7 (October 2015), 32-36. DOI=10.5120/ijca2015906344

@article{10.5120/ijca2015906344,
  author     = {Kaveri B.V. and Asha T.},
  title      = {An Ameliorated Methodology for Feature Subset Selection on High Dimensional Data using Precise Relevance Measures},
  journal    = {International Journal of Computer Applications},
  issue_date = {October 2015},
  volume     = {127},
  number     = {7},
  month      = {October},
  year       = {2015},
  issn       = {0975-8887},
  pages      = {32-36},
  numpages   = {5},
  url        = {https://ijcaonline.org/archives/volume127/number7/22744-2015906344/},
  doi        = {10.5120/ijca2015906344},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A Kaveri B.V.
%A Asha T.
%T An Ameliorated Methodology for Feature Subset Selection on High Dimensional Data using Precise Relevance Measures
%J International Journal of Computer Applications
%@ 0975-8887
%V 127
%N 7
%P 32-36
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Attribute subset selection refers to the method of choosing the set of attributes that best describes a dataset. When the attributes obtained from attribute subset selection are applied to machine learning operations such as clustering and classification, they should provide the same result as the original dataset. The method employed for attribute subset selection must therefore be efficient in selecting the relevant attributes and accurate in eliminating the redundant ones. With the aim of satisfying these two goals, we have designed a feature subset selection method using precise relevance measures. We first select the relevant attributes using the relevance measure symmetric uncertainty (SU). The selected relevant attributes are then divided into clusters by a graph-theoretic clustering method using the relevance measure conditional mutual information (CMI). Finally, symmetric uncertainty is used to select, from each cluster, the attribute that is most strongly related to the target class and best represents that cluster, yielding an accurate and independent subset of features. The developed method not only produces a smaller, more accurate subset of features but also improves the performance of machine learning operations such as the naive Bayes classifier.
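
The abstract outlines a three-step pipeline: relevance filtering by SU, graph-theoretic clustering of the surviving features by CMI, and per-cluster representative selection by SU again. As a rough illustration only, below is a minimal Python sketch of that pipeline for discrete, integer-coded data. The function names, the thresholds su_threshold and cmi_cut, and the specific rule for cutting spanning-tree edges into clusters are assumptions of this sketch, not the authors' published procedure (which builds on Prim's MST [19] and the FAST algorithm of Song et al. [26]).

```python
import numpy as np
from itertools import combinations
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components


def entropy(*cols):
    """Shannon entropy (bits) of the joint empirical distribution of one or
    more discrete, integer-coded columns."""
    rows = np.stack(cols, axis=1)
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0
    mi = hx + hy - entropy(x, y)  # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mi / (hx + hy)


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def select_features(X, y, su_threshold=0.05, cmi_cut=0.0):
    """Sketch of the three-step pipeline (assumed parameters, not the
    paper's exact procedure):
    1. keep features whose SU with the class exceeds su_threshold;
    2. cluster them on a maximum spanning tree of pairwise CMI given the
       class, cutting edges whose CMI is at most cmi_cut;
    3. keep the highest-SU feature from each resulting cluster."""
    n_features = X.shape[1]

    # Step 1: relevance filtering by SU with the target class.
    su_class = np.array([symmetric_uncertainty(X[:, j], y)
                         for j in range(n_features)])
    relevant = np.flatnonzero(su_class > su_threshold)
    k = len(relevant)
    if k == 0:
        return relevant

    # Step 2: graph-theoretic clustering. Edge weights are negated CMI, so
    # SciPy's *minimum* spanning tree acts as a *maximum* spanning tree over
    # redundancy strength (zero-CMI pairs are simply absent edges).
    W = np.zeros((k, k))
    for a, b in combinations(range(k), 2):
        cmi = conditional_mutual_information(X[:, relevant[a]],
                                             X[:, relevant[b]], y)
        W[a, b] = W[b, a] = -cmi
    mst = minimum_spanning_tree(W).toarray()
    mst[-mst <= cmi_cut] = 0.0  # cut weak (low-redundancy) tree edges
    n_clusters, labels = connected_components(mst != 0, directed=False)

    # Step 3: one representative per cluster, the feature most strongly
    # related to the class.
    picked = []
    for c in range(n_clusters):
        members = relevant[labels == c]
        picked.append(members[np.argmax(su_class[members])])
    return np.array(sorted(picked))
```

Negating the CMI lets SciPy's minimum spanning tree act as a maximum spanning tree over redundancy strength, so cutting its weakest edges leaves connected components of mutually redundant features, from which one representative each is retained.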

References
  1. Guyon I. and Elisseeff A., An introduction to variable and feature selection, Journal of Machine Learning Research, 3, pp 1157-1182 (2003).
  2. Fleuret F., Fast binary feature selection with conditional mutual Information, Journal of Machine Learning Research, 5, pp 1531-1555 (2004).
  3. Dhillon I.S., Mallela S. and Kumar R., A divisive information theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, 3, pp 1265-1287 (2003).
  4. Dash M. and Liu H., Feature Selection for Classification, Intelligent Data Analysis, 1(3), pp 131-156 (1997).
  5. Das S., Filters, wrappers and a boosting-based hybrid for feature Selection, In Proceedings of the Eighteenth International Conference on Machine Learning, pp 74-81, 2001.
  6. Baker L.D. and McCallum A.K., Distributional clustering of words for text classification, In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 96-103 (1998).
  7. Hall M.A., Correlation-Based Feature Subset Selection for Machine Learning, Ph.D. dissertation Waikato, New Zealand: Univ. Waikato (1999).
  8. John G.H., Kohavi R. and Pfleger K., Irrelevant Features and the Subset Selection Problem, In the Proceedings of the Eleventh International Conference on Machine Learning, pp 121-129 (1994).
  9. Kira K. and Rendell L.A., The feature selection problem: Traditional methods and a new algorithm, In Proceedings of the Tenth National Conference on Artificial Intelligence, pp 129-134 (1992).
  10. Kohavi R. and John G.H., Wrappers for feature subset selection, Artificial Intelligence, 97(1-2), pp 273-324 (1997).
  11. Koller D. and Sahami M., Toward optimal feature selection, In Proceedings of International Conference on Machine Learning, pp 284-292 (1996).
  12. Kononenko I., Estimating Attributes: Analysis and Extensions of RELIEF, In Proceedings of the 1994 European Conference on Machine Learning, pp 171-182 (1994).
  13. Krier C., Francois D., Rossi F. and Verleysen M., Feature clustering and mutual information for the selection of variables in spectral data, In Proc European Symposium on Artificial Neural Networks Advances in Computational Intelligence and Learning, pp 157-162 (2007).
  14. Langley P., Selection of relevant features in machine learning, In Proceedings of the AAAI Fall Symposium on Relevance, pp 1-5 (1994).
  15. Mitchell T.M., Generalization as Search, Artificial Intelligence, 18(2), pp 203-226 (1982).
  16. Ng A.Y., On feature selection: learning with exponentially many irrelevant features as training examples, In Proceedings of the Fifteenth International Conference on Machine Learning, pp 404-412 (1998).
  17. Pereira F., Tishby N. and Lee L., Distributional clustering of English words, In Proceedings of the 31st Annual Meeting on Association For Computational Linguistics, pp 183-190 (1993).
  18. Press W.H., Flannery B.P., Teukolsky S.A. and Vetterling W.T., Numerical Recipes in C, Cambridge University Press, Cambridge (1988).
  19. Prim R.C., Shortest connection networks and some generalizations, Bell System Technical Journal, 36, pp 1389-1401 (1957).
  20. Souza J., Feature selection with a general hybrid algorithm, Ph.D. dissertation, University of Ottawa, Ottawa, Ontario, Canada (2004).
  21. Van Dijck G. and Van Hulle M.M., Speeding Up the Wrapper Feature Subset Selection in Regression by Mutual Information Relevance and Redundancy Analysis, In Proceedings of the International Conference on Artificial Neural Networks (2006).
  22. Xing E., Jordan M. and Karp R., Feature selection for high-dimensional genomic microarray data, In Proceedings of the Eighteenth International Conference on Machine Learning, pp 601-608 (2001).
  23. Yu L. and Liu H., Feature selection for high-dimensional data: a fast correlation-based filter solution, In Proceedings of the 20th International Conference on Machine Learning, 20(2), pp 856-863 (2003).
  24. Yu L. and Liu H., Redundancy based feature selection for microarray data, In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 737-742 (2004).
  25. Yu L. and Liu H., Efficient feature selection via analysis of relevance and redundancy, Journal of Machine Learning Research, 5, pp 1205-1224 (2004).
  26. Qinbao Song, Jingjie Ni and Guangtao Wang, A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data, IEEE Transactions on Knowledge and Data Engineering, 25(1), pp 1-14 (2013).
  27. Garey M.R. and Johnson D.S., Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co. (1979).
  28. Quinlan J.R., C4.5: Programs for Machine Learning, San Mateo, California: Morgan Kaufmann (1993).
  29. Yu J., Abidi S.S.R. and Artes P.H., A hybrid feature selection strategy for image defining features: towards interpretation of optic nerve images, In Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, 8, pp 5127-5132 (2005).
  30. Liu H., Motoda H. and Yu L., Selective sampling approach to active feature selection, Artificial Intelligence, 159(1-2), pp 49-74 (2004).
  31. Wyner A.D., A definition of conditional mutual information for arbitrary ensembles, Information and Control, 38(1), pp 51-59 (1978). doi:10.1016/S0019-9958(78)90026-8. http://en.wikipedia.org/wiki/Conditional_mutual_information
Index Terms

Computer Science
Information Sciences

Keywords

Relevant features, redundant features, relevance measures, symmetric uncertainty (SU), conditional mutual information (CMI)