Research Article

Mining Data Streams using Clustering Techniques

by Manal Mansour Alharthi, Manal Abdulaziz Abdullah
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 184 - Number 7
Year of Publication: 2022
Authors: Manal Mansour Alharthi, Manal Abdulaziz Abdullah
10.5120/ijca2022922027

Manal Mansour Alharthi, Manal Abdulaziz Abdullah. Mining Data Streams using Clustering Techniques. International Journal of Computer Applications. 184, 7 (Apr 2022), 9-15. DOI=10.5120/ijca2022922027

@article{ 10.5120/ijca2022922027,
author = { Manal Mansour Alharthi and Manal Abdulaziz Abdullah },
title = { Mining Data Streams using Clustering Techniques },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2022 },
volume = { 184 },
number = { 7 },
month = { Apr },
year = { 2022 },
issn = { 0975-8887 },
pages = { 9-15 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume184/number7/32339-2022922027/ },
doi = { 10.5120/ijca2022922027 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:20:50.484161+05:30
%A Manal Mansour Alharthi
%A Manal Abdulaziz Abdullah
%T Mining Data Streams using Clustering Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 184
%N 7
%P 9-15
%D 2022
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This research highlights the significant aspects to consider when building a framework for mining data streams. The main aspects include the methods for summarizing data and creating synopses, as well as the main approaches to clustering those synopses. The research also presents some real applications and tools for mining streaming data. The main goal is to present the ClusKmeans model as an example of data stream clustering and to study its performance under different scenarios of data speed, noise level, and horizon range.
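To make the idea of a data synopsis concrete, the following is a minimal Python sketch of a micro-cluster summary, a common form of synopsis in stream clustering. It is not the ClusKmeans model described in the article; the names `MicroCluster`, `summarize`, and `max_distance` are hypothetical and chosen only for illustration. Each micro-cluster keeps a count, a per-dimension linear sum, and a per-dimension sum of squares, so a point can be absorbed in one pass without storing it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Additive summary of the points absorbed so far (illustrative only)."""
    n: int = 0
    linear_sum: list = field(default_factory=list)  # per-dimension sum of x
    square_sum: list = field(default_factory=list)  # per-dimension sum of x^2

    def insert(self, point):
        # Lazily size the sums on the first point, then update them in place.
        if self.n == 0:
            self.linear_sum = [0.0] * len(point)
            self.square_sum = [0.0] * len(point)
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x
        self.n += 1

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # Square root of the summed per-dimension variance.
        # Tiny negative values caused by rounding are clamped to zero.
        var = sum(sq / self.n - (ls / self.n) ** 2
                  for ls, sq in zip(self.linear_sum, self.square_sum))
        return math.sqrt(max(var, 0.0))


def summarize(stream, max_distance):
    """Assign each arriving point to the nearest micro-cluster, or open a new one.

    max_distance is an assumed threshold, not a parameter from the article.
    """
    clusters = []
    for point in stream:
        best, best_dist = None, float("inf")
        for mc in clusters:
            d = math.dist(point, mc.centroid())
            if d < best_dist:
                best, best_dist = mc, d
        if best is not None and best_dist <= max_distance:
            best.insert(point)
        else:
            mc = MicroCluster()
            mc.insert(point)
            clusters.append(mc)
    return clusters
```

In a two-phase design of this kind, the centroids of the resulting micro-clusters, optionally weighted by their counts, could then be passed to an offline clustering step such as k-means. Whether ClusKmeans follows exactly this split is not stated in the abstract.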

Index Terms

Computer Science
Information Sciences

Keywords

Data Streams, Data Stream Mining, ClusKmeans, Data synopsis, Pyramidal window, Micro-clusters