Analysis of Different Similarity Measure Functions and Their Impacts on Shared Nearest Neighbor Clustering Approach

Anil Kumar Patidar; Jitendra Agrawal; Nishchol Mishra

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 40 - Number 16

Year of Publication: 2012

DOI: 10.5120/5061-7221

Anil Kumar Patidar, Jitendra Agrawal, Nishchol Mishra. Analysis of Different Similarity Measure Functions and Their Impacts on Shared Nearest Neighbor Clustering Approach. International Journal of Computer Applications. 40, 16 (February 2012), 1-5. DOI=10.5120/5061-7221

@article{ 10.5120/5061-7221,

author = { Anil Kumar Patidar and Jitendra Agrawal and Nishchol Mishra },

title = { Analysis of Different Similarity Measure Functions and Their Impacts on Shared Nearest Neighbor Clustering Approach },

journal = { International Journal of Computer Applications },

issue_date = { February 2012 },

volume = { 40 },

number = { 16 },

month = { February },

year = { 2012 },

issn = { 0975-8887 },

pages = { 1-5 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume40/number16/5061-7221/ },

doi = { 10.5120/5061-7221 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T20:28:12.235921+05:30

%A Anil Kumar Patidar

%A Jitendra Agrawal

%A Nishchol Mishra

%T Analysis of Different Similarity Measure Functions and Their Impacts on Shared Nearest Neighbor Clustering Approach

%J International Journal of Computer Applications

%@ 0975-8887

%V 40

%N 16

%P 1-5

%D 2012

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Clustering is a technique for grouping data items with similar content. In recent years, density-based clustering algorithms, especially the Shared Nearest Neighbor (SNN) clustering approach, have gained popularity in the field of data mining. SNN finds clusters of different sizes, densities, and shapes in the presence of large amounts of noise and outliers, and it is widely used where large, multidimensional, dynamic databases are maintained. A typical clustering technique uses a similarity function to compare data items, and many such functions, including the Euclidean and Jaccard measures, have been studied for this purpose. In this paper, we evaluate the impact of four similarity measure functions on the SNN clustering approach and compare the results. Based on our analysis, we conclude that the Euclidean function works best with SNN clustering, compared with the cosine, Jaccard, and correlation distance measures.
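To make the comparison concrete, the following is a minimal Python sketch of the four distance measures named in the abstract, together with an SNN similarity computed on top of any of them. It is an illustration under stated assumptions, not the authors' implementation: all function names, the sample points, and the choice k = 2 are hypothetical; the Jaccard measure is written in its generalized min/max form for non-negative vectors, since the abstract does not give the exact formula used; and the rule that two points share similarity only when each appears in the other's k-nearest-neighbor list follows the Jarvis-Patrick style of shared-neighbor counting.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine distance (1 - cosine similarity); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def correlation(a, b):
    """Correlation distance: cosine distance of the mean-centered vectors.

    Assumes neither vector is constant (otherwise its centered norm is zero).
    """
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def jaccard(a, b):
    """Generalized Jaccard distance (assumed form): 1 - sum(min) / sum(max).

    Assumes non-negative entries and that the two vectors are not both all zeros.
    """
    return 1.0 - sum(map(min, a, b)) / sum(map(max, a, b))


def k_nearest(points, dist, k):
    """For each point, return the set of indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(points, dist, k):
    """Shared-nearest-neighbor similarity matrix.

    sim[i][j] is the number of neighbors shared by i and j, counted only when
    i and j appear in each other's k-nearest-neighbor lists; otherwise it is 0.
    """
    nn = k_nearest(points, dist, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


# Hypothetical sample data: two small, well-separated groups of points.
points = [
    [1, 2, 3], [1, 2, 4], [2, 2, 3],
    [10, 1, 1], [11, 1, 2], [10, 2, 1],
]

for name, dist in [("euclidean", euclidean), ("cosine", cosine),
                   ("correlation", correlation), ("jaccard", jaccard)]:
    print(name, snn_similarity(points, dist, k=2))
```

Because the SNN similarity depends only on neighbor rankings, swapping the distance function changes the result only when it changes which points rank among the k nearest. This is one way to see why the choice of underlying measure can affect the final clusters.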

Index Terms

Computer Science

Information Sciences

Keywords

Data mining, Clustering, SNN (Shared Nearest Neighbor), Density, Noise, Outlier, Similarity Measure