International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 43
Year of Publication: 2019
Authors: Swati T. Piske, Tandle S. R.
10.5120/ijca2019918533
Swati T. Piske, Tandle S. R. Nearest Neighbor Classification for High-Speed Big Data Streams using Spark. International Journal of Computer Applications. 182, 43 (Mar 2019), 16-19. DOI=10.5120/ijca2019918533
Mining high-speed data streams is one of the main contemporary challenges in machine learning. It demands methods with high computational efficiency that can continuously update their structure and handle a large, ever-arriving number of instances. In this paper, we present a new incremental and distributed classifier based on the popular nearest neighbor algorithm, adapted to this demanding scenario. The method, implemented in Apache Spark, includes a distributed metric-space ordering to perform faster searches. A huge amount of data containing useful information, referred to as big data, is generated continuously. Processing such a large volume of data requires big data frameworks, for example Hadoop MapReduce and Apache Spark. Among these, Apache Spark performs up to 100 times faster than traditional frameworks such as Hadoop MapReduce. We concentrate on the design of a partitional clustering algorithm and its implementation on Apache Spark.
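To make the idea of an incremental, distributed nearest neighbor classifier more concrete, the following is a minimal single-machine sketch in Python. It is not the authors' Spark implementation. Plain Python lists stand in for Spark partitions, and the single-pivot ordering with triangle-inequality pruning is an assumed, simplified stand-in for the paper's distributed metric-space ordering. The names `PivotPartition`, `StreamingKNN`, `local_top_k`, the round-robin assignment of instances, and the choice of the first point as pivot are all hypothetical. Each partition returns its local k nearest candidates ("map"), and these are merged into a global top k followed by a majority vote ("reduce"). New mini-batches are absorbed by `update` without rebuilding the model.

```python
# Minimal single-machine sketch (assumption: not the authors' Spark code).
# Python lists stand in for Spark partitions; the single-pivot ordering is a
# hypothetical simplification of a distributed metric-space index.
import bisect
import heapq
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotPartition:
    """One partition: points kept sorted by their distance to a pivot."""

    def __init__(self):
        self.pivot = None
        self.keys = []   # sorted distances to the pivot
        self.items = []  # (point, label) pairs aligned with self.keys

    def insert(self, point, label):
        if self.pivot is None:
            self.pivot = point  # hypothetical choice: first point is the pivot
        d = euclidean(point, self.pivot)
        i = bisect.bisect_right(self.keys, d)
        self.keys.insert(i, d)
        self.items.insert(i, (point, label))

    def local_top_k(self, query, k):
        """Return up to k (distance, label) pairs nearest to query."""
        if self.pivot is None:
            return []
        dq = euclidean(query, self.pivot)
        n = len(self.keys)
        hi = bisect.bisect_left(self.keys, dq)
        lo = hi - 1
        best = []  # max-heap of (-distance, index, label)
        while lo >= 0 or hi < n:
            # Visit points in increasing order of the lower bound |key - dq|.
            if hi >= n or (lo >= 0 and dq - self.keys[lo] <= self.keys[hi] - dq):
                idx, lo = lo, lo - 1
            else:
                idx, hi = hi, hi + 1
            bound = abs(self.keys[idx] - dq)
            # Triangle inequality: d(q, x) >= |d(q, p) - d(x, p)|, so once the
            # bound reaches the current k-th best distance, nothing closer remains.
            if len(best) == k and bound >= -best[0][0]:
                break
            point, label = self.items[idx]
            d = euclidean(query, point)
            if len(best) < k:
                heapq.heappush(best, (-d, idx, label))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, idx, label))
        return [(-neg_d, label) for neg_d, _, label in best]


class StreamingKNN:
    """Incremental kNN over several partitions (stand-ins for Spark workers)."""

    def __init__(self, k=3, num_partitions=4):
        self.k = k
        self.partitions = [PivotPartition() for _ in range(num_partitions)]
        self._next = 0

    def update(self, batch):
        """Absorb a mini-batch of (point, label) pairs without retraining."""
        for point, label in batch:
            self.partitions[self._next].insert(point, label)  # round-robin
            self._next = (self._next + 1) % len(self.partitions)

    def predict(self, query):
        # "Map": every partition returns its local k nearest neighbours.
        candidates = []
        for part in self.partitions:
            candidates.extend(part.local_top_k(query, self.k))
        # "Reduce": keep the global k nearest, then take a majority vote.
        nearest = heapq.nsmallest(self.k, candidates, key=lambda c: c[0])
        return Counter(label for _, label in nearest).most_common(1)[0][0]


model = StreamingKNN(k=3, num_partitions=2)
model.update([((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
              ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")])
print(model.predict((0.3, 0.3)))  # -> a
model.update([((0.3, 0.31), "b"), ((0.31, 0.3), "b")])  # new stream batch
print(model.predict((0.3, 0.3)))  # -> b
```

The local-then-global merge is the key design choice here: each partition ships only k candidates, so the merge step handles at most k times the number of partitions pairs, regardless of how much of the stream has already arrived.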