Nearest Neighbor Classification for High-Speed Big Data Streams using Spark

Swati T. Piske; Tandle S. R.


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 182 - Number 43

Year of Publication: 2019

Authors: Swati T. Piske, Tandle S. R.

DOI: 10.5120/ijca2019918533

Swati T. Piske, Tandle S. R. Nearest Neighbor Classification for High-Speed Big Data Streams using Spark. International Journal of Computer Applications. 182, 43 (Mar 2019), 16-19. DOI=10.5120/ijca2019918533

@article{10.5120/ijca2019918533,
  author = {Swati T. Piske and Tandle S. R.},
  title = {Nearest Neighbor Classification for High-Speed Big Data Streams using Spark},
  journal = {International Journal of Computer Applications},
  issue_date = {Mar 2019},
  volume = {182},
  number = {43},
  month = {Mar},
  year = {2019},
  issn = {0975-8887},
  pages = {16-19},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume182/number43/30436-2019918533/},
  doi = {10.5120/ijca2019918533},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:14:08.214927+05:30
%A Swati T. Piske
%A Tandle S. R.
%T Nearest Neighbor Classification for High-Speed Big Data Streams using Spark
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 43
%P 16-19
%D 2019
%I Foundation of Computer Science (FCS), NY, USA

Abstract

High-speed data streaming and data mining are among the most current challenges in machine learning. They demand methods that are computationally efficient and able to continuously update their structure and handle an ever-arriving large number of instances. In this paper, we present a new incremental and distributed classifier based on the popular nearest neighbor algorithm, adapted to such a demanding scenario. This method, implemented in Apache Spark, includes a distributed metric-space ordering to perform faster searches. A vast amount of data containing useful information, referred to as big data, is generated continually. Handling such large volumes of data requires big data frameworks such as Hadoop MapReduce and Apache Spark. Among these, Apache Spark performs up to 100 times faster than traditional systems such as Hadoop MapReduce. We concentrate on the design of a partition-based grouping algorithm and its implementation on Apache Spark.
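The idea of an incremental nearest neighbor classifier that absorbs labeled micro-batches and answers queries from partitioned data can be sketched roughly as follows. This is a minimal, single-process illustration and not the authors' Spark implementation. The class name `IncrementalKNN`, its parameters, and the round-robin placement are hypothetical. Partitions are simulated with plain Python lists, and a brute-force scan stands in for the distributed metric-space ordering mentioned in the abstract.

```python
import heapq
import math
from collections import Counter


class IncrementalKNN:
    """Illustrative incremental k-NN over simulated partitions (assumption: not the paper's code)."""

    def __init__(self, k=3, num_partitions=2):
        self.k = k
        # Each list stands in for the local case base held by one worker.
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin cursor (hypothetical placement policy)

    def update(self, batch):
        """Absorb a labeled micro-batch of (features, label) pairs."""
        for features, label in batch:
            self.partitions[self._next].append((tuple(features), label))
            self._next = (self._next + 1) % len(self.partitions)

    def _local_candidates(self, partition, query):
        # Brute-force scan; a metric tree would replace this step in practice.
        scored = ((math.dist(query, f), label) for f, label in partition)
        return heapq.nsmallest(self.k, scored, key=lambda pair: pair[0])

    def predict(self, query):
        """Merge each partition's k nearest candidates, then take a majority vote."""
        candidates = []
        for partition in self.partitions:
            candidates.extend(self._local_candidates(partition, query))
        if not candidates:
            raise ValueError("model has not seen any labeled instances yet")
        nearest = heapq.nsmallest(self.k, candidates, key=lambda pair: pair[0])
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


model = IncrementalKNN(k=3, num_partitions=2)
model.update([((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b")])
first = model.predict((0.2, 0.1))
model.update([((5.1, 4.9), "b"), ((4.8, 5.2), "b")])  # a later micro-batch
second = model.predict((5.0, 5.1))
```

Each partition returns only its own k best candidates, so the merge step handles at most k times the number of partitions, which is the property that makes this pattern suitable for a distributed setting.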


Index Terms

Computer Science

Information Sciences

Keywords

Nearest Neighbor, High-Speed, Big Data, Data Streams