CFP last date
20 January 2025
Reseach Article

A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms

by S. Vijayalaksmi, M. Punithavalli
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 60 - Number 14
Year of Publication: 2012
Authors: S. Vijayalaksmi, M. Punithavalli
10.5120/9757-8924

S. Vijayalaksmi, M. Punithavalli . A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms. International Journal of Computer Applications. 60, 14 ( December 2012), 1-7. DOI=10.5120/9757-8924

@article{ 10.5120/9757-8924,
author = { S. Vijayalaksmi, M. Punithavalli },
title = { A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 60 },
number = { 14 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume60/number14/9757-8924/ },
doi = { 10.5120/9757-8924 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:06:33.825157+05:30
%A S. Vijayalaksmi
%A M. Punithavalli
%T A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 14
%P 1-7
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Among the various clustering algorithms, DBSCAN is an effective clustering algorithm used in many applications. It has various advantages like no a priori assumption needed about the number of clusters, can find arbitrarily shaped clusters and can perform well even in the presence of outliers. However, the performance is seriously affected when the dataset size becomes large. Moreover, the selection of the two input parameters, Eps and MinPts, has a great impact on the clustering performance. To solve these two problems, this paper modifies the traditional DBSCAN algorithm in two manners. The first method uses K-dimensional tree instead of the traditional R-tree algorithm while the second method includes a locally sensitive hash procedure to speed up the process of clustering and increase the efficiency of clustering. The algorithms use a k-distance graph method to automatically calculate Eps and MinPts. Experimental results show that both the algorithms are efficient in terms of scalability and speeds up the clustering process in an efficient manner.

References
  1. Bayer, U. , Comparetti, P. M. , Hlauschek, C. , Kruegel, C. and Kirda, E. (2009) Scalable, behavior-based malware clustering, Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS 2009), Pp. 1-18.
  2. Bentley, J. L. (1975) Multidimensional binary search trees used for associative searching, Communications of the ACM, Vol. 18, No. 9, Pp. 509-517.
  3. Bhakare, K. R. , Krishna, R. K. and Bhakare, S. (2012) An Energy-Efficient Grid based Clustering Topology for a Wireless Sensor Network, International Journal of Computer Applications, Vol. 39, No. 14, Pp. 24-28.
  4. Borah, B. and Bhattacharyya, D. (2004) An improved sampling-based DBSCAN for large spatial databases, Proceedings of International Conference on Intelligent Sensing and Information Processing, Pp. 92–96.
  5. Cao, Y. , Jiang, T. and Girke, T. (2010) Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing. Bioinformatics, Vol. 26, No. 7, Pp. 953-959.
  6. El-Sonbaty, Y. , Ismail, M. and Farouk, M. (2004) An efficient density based clustering algorithm for large databases, 16th IEEE International Conference on Tools with Artificial Intelligence, ICTAI, Pp. 673–677.
  7. Ester, M. , Kriegel, H. , Sander, J. and Xu, X. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise, Proc. KDD 96, Pp. 226–231.
  8. Fan, Y. and Yuta, R. (2011) A Density-based Path Clustering Algorithm, International Conference on Intelligent Computation and Bio-Medical Instrumentation (ICBMI), Hubei, Pp. 224-227.
  9. Folino, G. , Forestiero, A. , Spezzano, G. , 2009. An adaptive flocking algorithm for performing approximate clustering. Inform. Sci. 179 (18), 3059–3078.
  10. Garai, G. and Chaudhuri, B. B. (2004) A novel genetic algorithm for automatic clustering, Pattern Recognition Letters, Vol. 25, No. 2, Pp. 173–187.
  11. Guo, Y. , Grossman, R. , Xu, X. , Jäger, J. and Kriegel, H. (2002) A fast parallel clustering algorithm for large spatial databases, High Performance Data Mining, Springer, US, Pp. 263–290.
  12. http://archive. ics. uci. edu/ml/datasets. html
  13. Ivancsy, R. and Kovacs, F. (2006) Clustering techniques utilized in web usage mining, Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, World Scientific and Engineering Academy and Society (WSEAS), Knowledge Engineering and Data Bases, Pp. 237-242.
  14. Jain, A. K. , Mutry, M. N. and Flynn, P. J. (1999) Data Clustering: A Review, ACM Computing Surveys, Vol. 31, No. 3, Pp. 264-323.
  15. Jiang, S. and Li, X. (2009) A hybrid clustering algorithm, Fourth International Conference on Fuzzy Systems and Knowledge Discovery, Vol. 1. IEEE Computer Society, Los Alamitos, CA, USA, Pp. 366–370.
  16. Li, J. , Yi, S. , Wang, X. , Hu, X. and Jiang, H. (2011) A new hybrid method based on partitioning-based DBSCAN and ant clustering, Expert Systems with Applications, Elsevier, Vol. 38, Issue 8, Pp. 9373-9381.
  17. Phung, D. , Adams, B. , Tran, K. , Venkatesh, S. and Kumar, M. (2009) High Accuracy Context Recovery using Clustering Mechanisms, In proceedings of the seventh international conference on Pervasive Computing and Communications, PerCom Galveston, USA, Pp. 122-130.
  18. Raiser, S. , Lughofer, E. , Eitzinger, C. and Smith, J. E. (2010) Impact of object extraction methods on classification performance in surface inspection systems, Special Issue Paper, Machine Vision and Applications, Springer-Verlag, PP. 627~641.
  19. Sakellariou, R. , Gurd, J. , Freeman, L. , Keane, J. , Arlia, D. and Coppola, M. (2001) Experiments in parallel clustering with DBSCAN, Euro-Par 2001 Parallel Processing, CASES'04, Springer, Berlin/Heidelberg, Vol. 2150, Pp. 26–331.
  20. Tsai, C. and Sung, C. (2010) EIDBSCAN: An extended improving DBSCAN algorithm with sampling techniques. International Journal of Business Intelligence and Data Mining, Vol. 5, No. 1, Pp. 94–111.
  21. Wei, C. P. , Lee, Y. H. and Hsu, C. M. (2000) Empirical comparison of fast clustering algorithms for large data sets, Proceedings of the 33rd Hawaii International Conference on System Sciences, IEEE Explore Digital Library, Pp. 1-10.
Index Terms

Computer Science
Information Sciences

Keywords

DBSCAN Speed Optimization Nearest Neighbour Search KD-Tree Locally Sensitive Hashing