Improving Clustering Performance on High Dimensional Data using Kernel Hubness

Published on May 2014 by R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan

International Conference on Simulations in Computing Nexus

Foundation of Computer Science USA

ICSCN - Number 2

May 2014

Authors: R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan

R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan. Improving Clustering Performance on High Dimensional Data using Kernel Hubness. International Conference on Simulations in Computing Nexus. ICSCN, 2 (May 2014), 27-30.

@article{
author = { R. Shenbakapriya and M. Kalimuthu and P. Sengottuvelan },
title = { Improving Clustering Performance on High Dimensional Data using Kernel Hubness },
journal = { International Conference on Simulations in Computing Nexus },
issue_date = { May 2014 },
volume = { ICSCN },
number = { 2 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 27-30 },
numpages = { 4 },
url = { /proceedings/icscn/number2/16156-1023/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article
%1 International Conference on Simulations in Computing Nexus
%A R. Shenbakapriya
%A M. Kalimuthu
%A P. Sengottuvelan
%T Improving Clustering Performance on High Dimensional Data using Kernel Hubness
%J International Conference on Simulations in Computing Nexus
%@ 0975-8887
%V ICSCN
%N 2
%P 27-30
%D 2014
%I International Journal of Computer Applications

Abstract

Clustering high-dimensional data is difficult because such data become increasingly sparse as dimensionality grows. One inherent property of high-dimensional data is the hubness phenomenon, which can be exploited for clustering. Hubness is the tendency of high-dimensional data to contain points (hubs) that occur frequently in the k-nearest-neighbor lists of other data points. These k-nearest-neighbor lists are used to compute the hubness score of each data point. Simple hub-based clustering algorithms detect only hyperspherical clusters in a high-dimensional dataset, but real-world high-dimensional datasets often contain many arbitrarily shaped clusters. To improve clustering performance, a new algorithm is proposed that combines kernel mapping with the hubness phenomenon. The proposed algorithm detects arbitrarily shaped clusters and improves cluster quality by minimizing the intra-cluster distance and maximizing the inter-cluster distance.
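The abstract does not give the algorithm itself, so the following is only a minimal sketch of the two ingredients it names: a kernel mapping and a hubness score derived from k-nearest-neighbor lists. It assumes a Gaussian (RBF) kernel, measures distances in the kernel-induced feature space as K(x,x) - 2K(x,y) + K(y,y), and takes the hubness score of a point to be the number of k-nearest-neighbor lists of other points in which it appears. The function names, the choice of kernel, and the parameters `gamma` and `k` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def rbf_kernel_matrix(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).

    The RBF kernel is an assumption; the abstract does not name a kernel.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding
    return np.exp(-gamma * sq_dists)


def kernel_sq_distances(K):
    """Squared distances in feature space: K_ii - 2 * K_ij + K_jj."""
    diag = np.diag(K)
    D = diag[:, None] - 2.0 * K + diag[None, :]
    return np.maximum(D, 0.0)


def hubness_scores(D, k):
    """Count how often each point occurs in the k-NN lists of other points."""
    n = D.shape[0]
    D = D.copy()
    np.fill_diagonal(D, np.inf)            # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]     # k-NN list of every point
    return np.bincount(knn.ravel(), minlength=n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))         # synthetic high-dimensional data
    K = rbf_kernel_matrix(X, gamma=1.0 / X.shape[1])
    scores = hubness_scores(kernel_sq_distances(K), k=10)
    hubs = np.argsort(scores)[::-1][:5]    # the five highest-scoring points
    print("candidate hubs:", hubs, "scores:", scores[hubs])
```

In a hub-based clustering scheme, points with high scores could serve as candidate cluster prototypes. How the proposed algorithm selects and uses them is not stated in the abstract.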

Index Terms

Computer Science

Information Sciences

Keywords

High Dimensional Data, Hubness Phenomenon, Kernel Mapping, K-nearest Neighbor