A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach

Mohammad Imran; Vaddi Srinivasa Rao

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 179 - Number 33

Year of Publication: 2018

Authors: Mohammad Imran, Vaddi Srinivasa Rao

DOI: 10.5120/ijca2018916743

Mohammad Imran, Vaddi Srinivasa Rao. A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach. International Journal of Computer Applications. 179, 33 (Apr 2018), 18-21. DOI=10.5120/ijca2018916743

@article{10.5120/ijca2018916743,
  author = {Mohammad Imran and Vaddi Srinivasa Rao},
  title = {A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach},
  journal = {International Journal of Computer Applications},
  issue_date = {Apr 2018},
  volume = {179},
  number = {33},
  month = {Apr},
  year = {2018},
  issn = {0975-8887},
  pages = {18-21},
  numpages = {4},
  url = {https://ijcaonline.org/archives/volume179/number33/29210-2018916743/},
  doi = {10.5120/ijca2018916743},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-07T00:57:18.765853+05:30

%A Mohammad Imran

%A Vaddi Srinivasa Rao

%T A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach

%J International Journal of Computer Applications

%@ 0975-8887

%V 179

%N 33

%P 18-21

%D 2018

%I Foundation of Computer Science (FCS), NY, USA

Abstract

In this paper, we propose a hybrid Random Under Sampled Imbalance Big Data (USIBD) framework for extracting knowledge from class-imbalanced big data. We also propose a novel under-sampling method for the base learner to handle the dynamic class-imbalance problem caused by the gradual evolution of classes in big data. The proposed USIBD knowledge discovery framework is robust and less sensitive to outliers when the data are non-uniformly distributed. Empirical studies demonstrate the effectiveness of USIBD, in comparison to existing methods, across a variety of class-imbalanced big-data scenarios.
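For readers unfamiliar with under-sampling, the general idea can be illustrated with a minimal sketch of plain random under-sampling. This is not the USIBD method, whose details are not given in the abstract; the function name `random_undersample`, the toy data, and the choice to shrink every class to the size of the smallest one are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Generic random under-sampling (illustrative; NOT the paper's USIBD).

    Every class is randomly reduced to the size of the smallest class,
    so the returned data set is balanced.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # The smallest class sets the target size for all classes.
    target = min(len(xs) for xs in by_class.values())

    # Draw `target` samples without replacement from each class.
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, target):
            balanced.append((x, y))

    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Hypothetical toy data: 10 minority samples (label 1), 90 majority (label 0).
samples = list(range(100))
labels = [1 if i < 10 else 0 for i in range(100)]
X_bal, y_bal = random_undersample(samples, labels)
```

In this toy case all 10 minority samples are kept and the majority class is cut from 90 to 10. Discarding majority samples at random can lose informative examples, which is one reason more selective under-sampling schemes are studied.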

Index Terms

Computer Science

Information Sciences

Keywords

Classification, Big data, Imbalanced data, Under Sampling, USIBD