International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 15
Year of Publication: 2020
Authors: Yathish Aradhya B. C., Y. P. Gowramma |
DOI: 10.5120/ijca2020920652
Yathish Aradhya B. C., Y. P. Gowramma. Progressive Sampling Algorithm with Rademacher Averages for Optimized Learning of Big Data: A Novel Approach. International Journal of Computer Applications. 175, 15 (Aug 2020), 37-40. DOI=10.5120/ijca2020920652
Sampling Big Data for analytics is a tedious task. The Progressive Sampling Algorithm (PSA) is a primary tool, adopted in prior work, for producing a minimal training data set for the learning algorithms used in Big Data analytics. PSA is characterized by its underlying operations: initial sample size selection, sampling schedule, and stopping criterion. Approaches to each of these operations are suggested, along with the process flow PSA follows to generate an adequate number of samples for the training data set. The training data set is a determining factor in training cost, computational cost, and learning model accuracy. Rademacher Averages bounds can be used to bound the sampling process. This paper suggests novel approaches to the underlying operations of PSA and identifies scope for a significant reduction in the cardinality of the training data set while retaining the learning model's accuracy within the Probably Approximately Correct (PAC) framework using Rademacher Averages bounds.
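The three PSA operations named in the abstract (initial sample size, sampling schedule, stopping criterion) can be illustrated with a minimal Python sketch. This is not the authors' method: the geometric schedule, the finite hypothesis class, the Monte Carlo estimate of the empirical Rademacher average, and the textbook one-sided bound 2R + 3*sqrt(ln(2/delta)/(2n)) for losses in [0, 1] are all assumptions chosen for illustration, and every function and parameter name is hypothetical.

```python
import math
import random


def geometric_schedule(n0, factor, n_max):
    """Yield sample sizes n0, n0*factor, ... and finally n_max (assumed schedule)."""
    n = n0
    while n < n_max:
        yield n
        n = max(n + 1, int(n * factor))
    yield n_max


def empirical_rademacher(loss_matrix, trials=50, rng=None):
    """Monte Carlo estimate of the empirical Rademacher average of a finite class.

    loss_matrix[h][i] is the loss in [0, 1] of hypothesis h on sample point i.
    """
    rng = rng or random.Random(0)
    n = len(loss_matrix[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(sum(s * l for s, l in zip(sigma, row)) / n
                     for row in loss_matrix)
    return total / trials


def deviation_bound(rad, n, delta):
    """Textbook bound on (true loss - empirical loss), holding w.p. >= 1 - delta."""
    return 2 * rad + 3 * math.sqrt(math.log(2 / delta) / (2 * n))


def progressive_sample(data, hypotheses, loss, n0, factor, eps, delta, seed=0):
    """Grow the sample along the schedule until the bound drops to eps.

    Simplification: delta is reused at every step; a rigorous version would
    split delta across the schedule steps (union bound).
    """
    rng = random.Random(seed)
    for n in geometric_schedule(n0, factor, len(data)):
        sample = rng.sample(data, n)
        loss_matrix = [[loss(h, z) for z in sample] for h in hypotheses]
        rad = empirical_rademacher(loss_matrix, rng=rng)
        bound = deviation_bound(rad, n, delta)
        if bound <= eps:  # stopping criterion
            break
    return sample, bound


def zero_one_loss(threshold, point):
    """0/1 loss of the classifier 'x > threshold' on a labelled point (x, y)."""
    x, y = point
    return 0.0 if (x > threshold) == y else 1.0
```

As a usage example, with synthetic points `(x, x > 0.5)` and a handful of threshold hypotheses, `progressive_sample(data, [0.1, 0.3, 0.5, 0.7, 0.9], zero_one_loss, n0=100, factor=2, eps=0.3, delta=0.1)` returns the first sample on the schedule whose bound is at most `eps`, or the full data set if none qualifies. The returned sample is typically much smaller than the data set, which is the reduction in training set cardinality the abstract refers to.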