International Conference on Simulations in Computing Nexus
Foundation of Computer Science USA
ICSCN - Number 2
May 2014
Authors: R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan
R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan. Improving Clustering Performance on High Dimensional Data using Kernel Hubness. International Conference on Simulations in Computing Nexus. ICSCN, 2 (May 2014), 27-30.
Clustering high-dimensional data is difficult because such data become increasingly sparse as the dimensionality grows. One inherent property of high-dimensional data is the hubness phenomenon, which can be exploited for clustering such data. Hubness is the tendency of high-dimensional data to contain points (hubs) that occur frequently in the k-nearest-neighbor lists of other data points. These k-nearest-neighbor lists are used to compute the hubness score of each data point. Simple hub-based clustering algorithms detect only hyperspherical clusters in high-dimensional datasets, but real-world high-dimensional datasets often contain many arbitrarily shaped clusters. To improve clustering performance, a new algorithm is proposed that combines kernel mapping with the hubness phenomenon. The proposed algorithm detects arbitrarily shaped clusters in the dataset and improves cluster quality by minimizing the intra-cluster distance and maximizing the inter-cluster distance.
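To make the hubness score concrete, it is commonly defined as the k-occurrence count N_k(x): the number of k-nearest-neighbor lists in which a point x appears. The Python sketch below computes N_k(x) with distances measured in a kernel-induced feature space through the kernel trick, ||phi(x) - phi(y)||^2 = K(x, x) + K(y, y) - 2K(x, y). It is an illustration under stated assumptions, not the authors' algorithm: the polynomial kernel, its parameters (degree 2, c = 1), and the helper names `kernel_distance`, `knn_lists` and `hubness_scores` are hypothetical choices made here.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]
Distance = Callable[[Vector, Vector], float]


def dot(x: Vector, y: Vector) -> float:
    """Linear kernel; with it, kernel_distance reduces to Euclidean distance."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Vector, y: Vector, degree: int = 2, c: float = 1.0) -> float:
    """Hypothetical kernel choice for illustration: K(x, y) = (x . y + c)^degree."""
    return (dot(x, y) + c) ** degree


def kernel_distance(x: Vector, y: Vector, kernel: Kernel) -> float:
    """Distance between phi(x) and phi(y) in feature space, via the kernel trick."""
    sq = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negatives caused by rounding


def knn_lists(points: Sequence[Vector], k: int, dist: Distance) -> List[List[int]]:
    """Brute-force k-NN list (indices) of every point, excluding the point itself."""
    lists = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))  # stable sort: ties keep index order
        lists.append(others[:k])
    return lists


def hubness_scores(points: Sequence[Vector], k: int, dist: Distance) -> List[int]:
    """Hubness score N_k(x): how many k-NN lists each point appears in."""
    scores = [0] * len(points)
    for neighbors in knn_lists(points, k, dist):
        for j in neighbors:
            scores[j] += 1
    return scores


if __name__ == "__main__":
    # A star-shaped toy set: the centre is every outer point's nearest neighbor.
    star = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    poly_dist = lambda x, y: kernel_distance(x, y, polynomial_kernel)
    print(hubness_scores(star, k=1, dist=poly_dist))  # prints [4, 1, 0, 0, 0]
```

Two design notes: the scores always sum to n * k, because each of the n lists contributes exactly k entries; and with a Gaussian (RBF) kernel the feature-space distance, 2 - 2exp(-gamma ||x - y||^2), is a monotone function of Euclidean distance, so it leaves k-NN lists and hence hubness scores unchanged. A kernel such as the polynomial one above is needed for the mapping to reorder neighbors. The brute-force search costs O(n^2 log n) and is only meant for small examples.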