Research Article

Large Dimensional Data Reduction by Various Feature Selection Techniques: A Short Review

by Bharti Swarnkar, Prateek Pratyasha, Aditya Prasad Padhy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 41
Year of Publication: 2020
10.5120/ijca2020920534

Bharti Swarnkar, Prateek Pratyasha, Aditya Prasad Padhy. Large Dimensional Data Reduction by Various Feature Selection Techniques: A Short Review. International Journal of Computer Applications. 176, 41 (Jul 2020), 16-24. DOI=10.5120/ijca2020920534

@article{ 10.5120/ijca2020920534,
author = { Bharti Swarnkar and Prateek Pratyasha and Aditya Prasad Padhy },
title = { Large Dimensional Data Reduction by Various Feature Selection Techniques: A Short Review },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2020 },
volume = { 176 },
number = { 41 },
month = { Jul },
year = { 2020 },
issn = { 0975-8887 },
pages = { 16-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume176/number41/31474-2020920534/ },
doi = { 10.5120/ijca2020920534 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Bharti Swarnkar
%A Prateek Pratyasha
%A Aditya Prasad Padhy
%T Large Dimensional Data Reduction by Various Feature Selection Techniques: A Short Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 41
%P 16-24
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In a data-dependent era, datasets contain a huge number of variables in both rows and columns, forming increasingly complex data matrices, and these dimensional expansions generate large dimensional data (LDD). The dimensionality of LDD poses a massive challenge for analytics and places a heavy burden on machine learning approaches. Owing to the rapid growth of innovative Internet of Things (IoT) and web-based technologies, static data becomes noisy and non-stochastic, which results in data loss and instability. Therefore, the demand for data dimension reduction (DDR) techniques for complex data is growing immensely, with the aim of improving data prediction, analysis and visualization. Several computational techniques have been implemented for DDR, and they are further segregated into two categories: feature extraction techniques (FET) and feature selection techniques (FST). However, most existing FET methods focus on transforming higher-dimensional data into a lower-dimensional space and are unable to tackle the dimensionality reduction problem. Hence, this paper focuses on various dynamic FSTs that not only reduce the dimensionality load but also catalyze the data analysis process.
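The abstract distinguishes two families of methods. Feature extraction (FET) maps the data into a new lower-dimensional space. Feature selection (FST) keeps a subset of the original variables. The Python below is a minimal illustrative sketch of one simple filter-style FST, assuming numeric features and a numeric target. It scores each column by the absolute Pearson correlation with the target and keeps the top k columns. The function names (`pearson`, `select_top_k`, `keep_columns`) and the toy data are hypothetical. This filter is not presented as any specific method reviewed in the paper.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the target
    return cov / math.sqrt(var_x * var_y)


def select_top_k(rows: List[List[float]], target: Sequence[float], k: int) -> List[int]:
    """Score every column by |corr(column, target)| and return the indices of the
    k best columns, sorted back into their original order."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    scores = [abs(pearson(col, target)) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def keep_columns(rows: List[List[float]], keep: List[int]) -> List[List[float]]:
    """Reduce each sample to the selected original columns; no new features are built."""
    return [[row[j] for j in keep] for row in rows]


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples x 4 features; column 2 is constant.
    X = [
        [1.0, 5.0, 0.0, 2.0],
        [2.0, 3.0, 0.0, 4.1],
        [3.0, 6.0, 0.0, 5.9],
        [4.0, 2.0, 0.0, 8.0],
    ]
    y = [1.0, 2.0, 3.0, 4.0]
    keep = select_top_k(X, y, k=2)
    print(keep)                   # → [0, 3]
    print(keep_columns(X, keep))  # → [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
```

The reduced matrix consists of original, unmodified columns, so the result remains directly interpretable in terms of the input variables. An extraction method such as principal component analysis would instead return new linear combinations of all columns.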

Index Terms

Computer Science
Information Sciences

Keywords

Large Dimensional Data (LDD), Dimension Reduction (DDR) Techniques, Feature Extraction Techniques (FET), Feature Selection Techniques (FST).