Research Article

A Survey on Clustering Algorithms for Data Streams

by Neha Sharma, Shraddha Masih, Pawan Makhija
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 22
Year of Publication: 2018
Authors: Neha Sharma, Shraddha Masih, Pawan Makhija
10.5120/ijca2018918014

Neha Sharma, Shraddha Masih, Pawan Makhija. A Survey on Clustering Algorithms for Data Streams. International Journal of Computer Applications. 182, 22 (Oct 2018), 18-24. DOI=10.5120/ijca2018918014

@article{ 10.5120/ijca2018918014,
author = { Neha Sharma and Shraddha Masih and Pawan Makhija },
title = { A Survey on Clustering Algorithms for Data Streams },
journal = { International Journal of Computer Applications },
issue_date = { Oct 2018 },
volume = { 182 },
number = { 22 },
month = { Oct },
year = { 2018 },
issn = { 0975-8887 },
pages = { 18-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume182/number22/30066-2018918014/ },
doi = { 10.5120/ijca2018918014 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:12:08.856167+05:30
%A Neha Sharma
%A Shraddha Masih
%A Pawan Makhija
%T A Survey on Clustering Algorithms for Data Streams
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 22
%P 18-24
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data stream mining is an emerging area concerned with extracting useful information from continuously arriving data. Web clickstreams, weather monitoring, network traffic, shopping histories, and web logs are some of the key sources of data streams. Clustering is one of the most useful techniques for analysing stream data, as it does not require predefined class labels. Data stream mining is challenging because the data is massive and arrives continuously. Traditional clustering algorithms cannot be applied directly to data streams; stream mining needs one-scan algorithms that extract information from the rich data arriving in the form of streams. In this paper we discuss various data stream clustering algorithms along with their limitations and required data structures. The paper also provides a comparative study of these algorithms. Real-world applications of data streams, data sources, and publicly available software are also discussed.
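
To make the one-scan requirement concrete, the following is a minimal sketch of sequential (online) k-means, which reads each point exactly once and keeps only k centroids and their counts in memory. It is an illustration, not one of the algorithms compared in the paper; the function name `sequential_kmeans` and the choice to seed the centroids with the first k points are assumptions made for this example.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """Cluster a stream in one pass, storing only k centroids and k counts."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for point in stream:
        if len(centroids) < k:
            # Assumption for this sketch: the first k points seed the centroids.
            centroids.append(list(point))
            counts.append(1)
            continue
        # Pick the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], point)),
        )
        counts[nearest] += 1
        rate = 1.0 / counts[nearest]
        # Running-mean update: the centroid stays the mean of its assigned points.
        centroids[nearest] = [
            c + rate * (x - c) for c, x in zip(centroids[nearest], point)
        ]
    return centroids


# The input is an iterator, so each point can only be read once.
points = iter([(0, 0), (10, 10), (0, 2), (10, 12)])
print(sequential_kmeans(points, k=2))
```

Because the point itself is discarded after the update, memory use does not grow with the length of the stream. The trade-off is that early assignments are never revisited, which is one reason stream clustering methods maintain richer summary structures.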

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Data Stream, Clustering