Research Article

Classification Imbalanced Data Sets: A Survey

by Shrouk El-Amir, Heba El-Fiqi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 23
Year of Publication: 2019
Authors: Shrouk El-Amir, Heba El-Fiqi
10.5120/ijca2019919682

Shrouk El-Amir, Heba El-Fiqi. Classification Imbalanced Data Sets: A Survey. International Journal of Computer Applications. 177, 23 (Dec 2019), 20-23. DOI=10.5120/ijca2019919682

@article{10.5120/ijca2019919682,
  author     = {Shrouk El-Amir and Heba El-Fiqi},
  title      = {Classification Imbalanced Data Sets: A Survey},
  journal    = {International Journal of Computer Applications},
  issue_date = {Dec 2019},
  volume     = {177},
  number     = {23},
  month      = {Dec},
  year       = {2019},
  issn       = {0975-8887},
  pages      = {20-23},
  numpages   = {4},
  url        = {https://ijcaonline.org/archives/volume177/number23/31037-2019919682/},
  doi        = {10.5120/ijca2019919682},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A Shrouk El-Amir
%A Heba El-Fiqi
%T Classification Imbalanced Data Sets: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 177
%N 23
%P 20-23
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Imbalanced data, a problem often encountered in real-world applications, can severely degrade the classification performance of machine learning algorithms. Various attempts have been made to classify imbalanced data sets. To address the class imbalance problem, the data can be rebalanced artificially before training machine learning classifiers, by oversampling the minority class and/or under-sampling the majority class.
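As a minimal illustration of the rebalancing idea described in the abstract (not the authors' experimental setup), the sketch below randomly oversamples the minority class and under-samples the majority class of a synthetic binary data set before training a classifier. The data set, class ratio, target sizes, and choice of classifier are all illustrative assumptions.

# Sketch: rebalance an imbalanced binary data set by random oversampling
# of the minority class and random under-sampling of the majority class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic imbalanced data: roughly 5% minority class (assumed ratio).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

min_idx = np.flatnonzero(y_train == 1)   # minority-class indices
maj_idx = np.flatnonzero(y_train == 0)   # majority-class indices

# Oversample the minority class (with replacement) and under-sample the
# majority class (without replacement) so both reach the same target size.
target = (len(min_idx) + len(maj_idx)) // 2
over = rng.choice(min_idx, size=target, replace=True)
under = rng.choice(maj_idx, size=target, replace=False)
idx = np.concatenate([over, under])

clf = RandomForestClassifier(random_state=0).fit(X_train[idx], y_train[idx])
print("balanced accuracy:", balanced_accuracy_score(y_test, clf.predict(X_test)))

In practice, resampling is applied only to the training split (as above) so that the test set keeps the original class distribution when evaluating the classifier.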

Index Terms

Computer Science
Information Sciences

Keywords

Imbalanced dataset, sampling, cost-sensitive learning, imbalance ratio