A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms

S. Vijayalaksmi; M. Punithavalli

Call for Paper

September Edition

IJCA solicits high quality original research papers for the upcoming September edition of the journal. The last date of research paper submission is 20 August 2025

Submit your paper

Know more

The week's pick

Real-time Synchronization Mechanisms Between Batch-oriented Legacy Systems and Modern Interfaces in the Retirement Domain

Balamurugan Krishnaswamy Gnanasekaran

Random Articles

Factors affecting MAC Protocol Performance in Underwater Wireless Sensor Networks

Jul

2017

Simulation of Beam Forming Solutions for High Altitude Air Bone Systems

June

2012

A Survey Paper on Concept Mining in Text Documents

May

2017

Multimodal Biometrics for Person Authentication using Hand Image

May

2013

Reseach Article

A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms

by S. Vijayalaksmi, M. Punithavalli

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 60 - Number 14

Year of Publication: 2012

Authors: S. Vijayalaksmi, M. Punithavalli

10.5120/9757-8924

S. Vijayalaksmi, M. Punithavalli . A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms. International Journal of Computer Applications. 60, 14 ( December 2012), 1-7. DOI=10.5120/9757-8924

@article{ 10.5120/9757-8924,

author = { S. Vijayalaksmi, M. Punithavalli },

title = { A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms },

journal = { International Journal of Computer Applications },

issue_date = { December 2012 },

volume = { 60 },

number = { 14 },

month = { December },

year = { 2012 },

issn = { 0975-8887 },

pages = { 1-7 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume60/number14/9757-8924/ },

doi = { 10.5120/9757-8924 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T21:06:33.825157+05:30

%A S. Vijayalaksmi

%A M. Punithavalli

%T A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms

%J International Journal of Computer Applications

%@ 0975-8887

%V 60

%N 14

%P 1-7

%D 2012

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Among the various clustering algorithms, DBSCAN is an effective clustering algorithm used in many applications. It has various advantages like no a priori assumption needed about the number of clusters, can find arbitrarily shaped clusters and can perform well even in the presence of outliers. However, the performance is seriously affected when the dataset size becomes large. Moreover, the selection of the two input parameters, Eps and MinPts, has a great impact on the clustering performance. To solve these two problems, this paper modifies the traditional DBSCAN algorithm in two manners. The first method uses K-dimensional tree instead of the traditional R-tree algorithm while the second method includes a locally sensitive hash procedure to speed up the process of clustering and increase the efficiency of clustering. The algorithms use a k-distance graph method to automatically calculate Eps and MinPts. Experimental results show that both the algorithms are efficient in terms of scalability and speeds up the clustering process in an efficient manner.

References

Bayer, U. , Comparetti, P. M. , Hlauschek, C. , Kruegel, C. and Kirda, E. (2009) Scalable, behavior-based malware clustering, Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS 2009), Pp. 1-18.
Bentley, J. L. (1975) Multidimensional binary search trees used for associative searching, Communications of the ACM, Vol. 18, No. 9, Pp. 509-517.
Bhakare, K. R. , Krishna, R. K. and Bhakare, S. (2012) An Energy-Efficient Grid based Clustering Topology for a Wireless Sensor Network, International Journal of Computer Applications, Vol. 39, No. 14, Pp. 24-28.
Borah, B. and Bhattacharyya, D. (2004) An improved sampling-based DBSCAN for large spatial databases, Proceedings of International Conference on Intelligent Sensing and Information Processing, Pp. 92–96.
Cao, Y. , Jiang, T. and Girke, T. (2010) Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing. Bioinformatics, Vol. 26, No. 7, Pp. 953-959.
El-Sonbaty, Y. , Ismail, M. and Farouk, M. (2004) An efficient density based clustering algorithm for large databases, 16th IEEE International Conference on Tools with Artificial Intelligence, ICTAI, Pp. 673–677.
Ester, M. , Kriegel, H. , Sander, J. and Xu, X. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise, Proc. KDD 96, Pp. 226–231.
Fan, Y. and Yuta, R. (2011) A Density-based Path Clustering Algorithm, International Conference on Intelligent Computation and Bio-Medical Instrumentation (ICBMI), Hubei, Pp. 224-227.
Folino, G. , Forestiero, A. , Spezzano, G. , 2009. An adaptive flocking algorithm for performing approximate clustering. Inform. Sci. 179 (18), 3059–3078.
Garai, G. and Chaudhuri, B. B. (2004) A novel genetic algorithm for automatic clustering, Pattern Recognition Letters, Vol. 25, No. 2, Pp. 173–187.
Guo, Y. , Grossman, R. , Xu, X. , Jäger, J. and Kriegel, H. (2002) A fast parallel clustering algorithm for large spatial databases, High Performance Data Mining, Springer, US, Pp. 263–290.
http://archive. ics. uci. edu/ml/datasets. html
Ivancsy, R. and Kovacs, F. (2006) Clustering techniques utilized in web usage mining, Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, World Scientific and Engineering Academy and Society (WSEAS), Knowledge Engineering and Data Bases, Pp. 237-242.
Jain, A. K. , Mutry, M. N. and Flynn, P. J. (1999) Data Clustering: A Review, ACM Computing Surveys, Vol. 31, No. 3, Pp. 264-323.
Jiang, S. and Li, X. (2009) A hybrid clustering algorithm, Fourth International Conference on Fuzzy Systems and Knowledge Discovery, Vol. 1. IEEE Computer Society, Los Alamitos, CA, USA, Pp. 366–370.
Li, J. , Yi, S. , Wang, X. , Hu, X. and Jiang, H. (2011) A new hybrid method based on partitioning-based DBSCAN and ant clustering, Expert Systems with Applications, Elsevier, Vol. 38, Issue 8, Pp. 9373-9381.
Phung, D. , Adams, B. , Tran, K. , Venkatesh, S. and Kumar, M. (2009) High Accuracy Context Recovery using Clustering Mechanisms, In proceedings of the seventh international conference on Pervasive Computing and Communications, PerCom Galveston, USA, Pp. 122-130.
Raiser, S. , Lughofer, E. , Eitzinger, C. and Smith, J. E. (2010) Impact of object extraction methods on classification performance in surface inspection systems, Special Issue Paper, Machine Vision and Applications, Springer-Verlag, PP. 627~641.
Sakellariou, R. , Gurd, J. , Freeman, L. , Keane, J. , Arlia, D. and Coppola, M. (2001) Experiments in parallel clustering with DBSCAN, Euro-Par 2001 Parallel Processing, CASES'04, Springer, Berlin/Heidelberg, Vol. 2150, Pp. 26–331.
Tsai, C. and Sung, C. (2010) EIDBSCAN: An extended improving DBSCAN algorithm with sampling techniques. International Journal of Business Intelligence and Data Mining, Vol. 5, No. 1, Pp. 94–111.
Wei, C. P. , Lee, Y. H. and Hsu, C. M. (2000) Empirical comparison of fast clustering algorithms for large data sets, Proceedings of the 33rd Hawaii International Conference on System Sciences, IEEE Explore Digital Library, Pp. 1-10.

Index Terms

Computer Science

Information Sciences

Keywords

DBSCAN Speed Optimization Nearest Neighbour Search KD-Tree Locally Sensitive Hashing