Research Article

Nearest Neighbor Classification for High-Speed Big Data Streams using Spark

by Swati T. Piske, Tandle S. R.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 43
Year of Publication: 2019
10.5120/ijca2019918533

Swati T. Piske, Tandle S. R. Nearest Neighbor Classification for High-Speed Big Data Streams using Spark. International Journal of Computer Applications. 182, 43 (Mar 2019), 16-19. DOI=10.5120/ijca2019918533

@article{ 10.5120/ijca2019918533,
author = { Swati T. Piske, Tandle S. R. },
title = { Nearest Neighbor Classification for High-Speed Big Data Streams using Spark },
journal = { International Journal of Computer Applications },
issue_date = { Mar 2019 },
volume = { 182 },
number = { 43 },
month = { Mar },
year = { 2019 },
issn = { 0975-8887 },
pages = { 16-19 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume182/number43/30436-2019918533/ },
doi = { 10.5120/ijca2019918533 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Swati T. Piske
%A Tandle S. R.
%T Nearest Neighbor Classification for High-Speed Big Data Streams using Spark
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 43
%P 16-19
%D 2019
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Mining high-speed data streams is one of the main contemporary challenges in machine learning. It calls for methods with high computational efficiency that can continuously update their structure and handle a large, ever-arriving number of instances. In this paper, we present a new incremental and distributed classifier based on the popular nearest neighbor algorithm, adapted to this demanding scenario. The method, implemented in Apache Spark, includes a distributed metric-space ordering to perform faster searches. Vast amounts of data containing useful information, referred to as big data, are generated continuously. Handling such large volumes of data requires big data frameworks such as Hadoop MapReduce and Apache Spark. Among these, Apache Spark can perform up to 100 times faster than traditional systems such as Hadoop MapReduce. We concentrate on the design of a partitioning clustering algorithm and its implementation on Apache Spark.
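The abstract describes a nearest neighbor classifier that is both distributed and incrementally updated. The following is a minimal plain-Python sketch of that general pattern, not the authors' implementation and not Spark code. It assumes the training data is split into partitions (plain lists standing in for Spark partitions), Euclidean distance, majority voting, and round-robin placement of newly arrived instances. It also uses brute-force search inside each partition instead of the paper's metric-space ordering. The names `StreamingKNN`, `local_top_k`, and `merge_top_k` are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = Tuple[Point, str]
Candidate = Tuple[float, str]  # (distance to query, label)


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (an assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_top_k(partition: Sequence[Labeled], query: Point, k: int) -> List[Candidate]:
    """'Map' side: the k nearest (distance, label) pairs inside one partition."""
    return heapq.nsmallest(k, ((euclidean(p, query), label) for p, label in partition))


def merge_top_k(partials: Iterable[List[Candidate]], k: int) -> List[Candidate]:
    """'Reduce' side: the global k nearest among all per-partition candidates."""
    return heapq.nsmallest(k, (c for part in partials for c in part))


class StreamingKNN:
    """Hypothetical incremental kNN classifier over a fixed number of partitions."""

    def __init__(self, k: int, num_partitions: int) -> None:
        self.k = k
        self.partitions: List[List[Labeled]] = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin cursor (assumed placement policy)

    def update(self, point: Point, label: str) -> None:
        """Incremental step: store a newly arrived labeled instance, no retraining."""
        self.partitions[self._next].append((point, label))
        self._next = (self._next + 1) % len(self.partitions)

    def predict(self, query: Point) -> str:
        """Majority vote among the k nearest stored instances."""
        partials = [local_top_k(part, query, self.k) for part in self.partitions]
        neighbours = merge_top_k(partials, self.k)
        if not neighbours:
            raise ValueError("no training instances have arrived yet")
        # most_common keeps first-encountered order on ties; neighbours are sorted
        # by distance, so a tie goes to the label whose closest instance is nearest.
        return Counter(label for _, label in neighbours).most_common(1)[0][0]


if __name__ == "__main__":
    model = StreamingKNN(k=3, num_partitions=2)
    for p, lbl in [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
                   ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]:
        model.update(p, lbl)
    print(model.predict((0.2, 0.2)))  # a
    print(model.predict((5.5, 5.5)))  # b
```

Because every partition returns its own k best candidates, the merged list equals the exact global top-k. Only k candidates per partition need to reach the merge step, which is the property that makes the per-partition search the natural part to distribute in a Spark-style setting.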

Index Terms

Computer Science
Information Sciences

Keywords

Nearest Neighbor, High-Speed Big Data, Data Streams