Reducing and Clustering High Dimensional Data through Principal Component Analysis

R. Indhumathi; Dr. S. Sathiyabama

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 11 - Number 8

Year of Publication: 2010

Authors: R.Indhumathi, Dr.S.Sathiyabama

DOI: 10.5120/1606-2158

R. Indhumathi, Dr. S. Sathiyabama. Reducing and Clustering High Dimensional Data through Principal Component Analysis. International Journal of Computer Applications. 11, 8 (December 2010), 1-4. DOI=10.5120/1606-2158

@article{ 10.5120/1606-2158,
author = { R.Indhumathi, Dr.S.Sathiyabama },
title = { Article:Reducing and Clustering high Dimensional Data through Principal Component Analysis },
journal = { International Journal of Computer Applications },
issue_date = { December 2010 },
volume = { 11 },
number = { 8 },
month = { December },
year = { 2010 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume11/number8/1606-2158/ },
doi = { 10.5120/1606-2158 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T19:59:59.900963+05:30
%A R.Indhumathi
%A Dr.S.Sathiyabama
%T Article:Reducing and Clustering high Dimensional Data through Principal Component Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 11
%N 8
%P 1-4
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

Abstract

High-dimensional data is a common phenomenon in real-world data mining applications. Developing effective clustering methods for high-dimensional datasets is challenging because of the curse of dimensionality. The k-means clustering algorithm is commonly used, but it is time consuming and computationally expensive, and the quality of the resulting clusters depends on the selection of the initial centroids and on the dimensionality of the data. When the dimensionality of the dataset is high, the accuracy of the result may fall short of expectations, because the chosen dataset cannot be assumed to be free of noise and flaws. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be pre-processed by an efficient dimensionality reduction method. This paper proposes a method in which the high-dimensional data is reduced through Principal Component Analysis, and bisecting k-means clustering is then performed on the reduced data, which requires no initialization of the centroids.
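
The two stages named in the abstract, PCA reduction followed by bisecting k-means, can be sketched as below. This is a minimal illustration in Python with NumPy under stated assumptions, not the authors' implementation. The function names, the synthetic data, the choice of two components, the rule of always bisecting the cluster with the largest sum of squared errors, and the deterministic seeding of each two-way split are all assumptions made for the sketch; the abstract does not specify them.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def two_means(X, n_iter=50):
    """Split X into two groups with Lloyd iterations; returns labels in {0, 1}."""
    # Assumption of this sketch: seed with the point farthest from the mean
    # and the point farthest from that one, so no random start is needed.
    a = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    b = X[np.argmax(np.linalg.norm(X - a, axis=1))]
    centers = np.stack([a, b])
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        if labels.min() == labels.max():  # one side is empty, stop early
            break
        new_centers = np.stack([X[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def bisecting_kmeans(X, k):
    """Bisect the cluster with the largest SSE until k clusters exist."""
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        # Sum of squared errors of each current cluster around its own mean.
        sse = [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(new_label)]
        worst = int(np.argmax(sse))
        idx = np.flatnonzero(labels == worst)
        split = two_means(X[idx])
        labels[idx[split == 1]] = new_label
    return labels


# Synthetic example (hypothetical data): three blobs in 10 dimensions.
rng = np.random.default_rng(0)
blob_centers = np.array([
    [0.0] * 10,
    [8.0] * 5 + [0.0] * 5,
    [0.0] * 5 + [20.0] * 5,
])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 10)) for c in blob_centers])

Z = pca_reduce(X, n_components=2)   # 10 dimensions reduced to 2
labels = bisecting_kmeans(Z, k=3)   # cluster in the reduced space
print(Z.shape, np.bincount(labels))
```

Clustering runs on `Z` rather than `X`, so every distance computation works on 2 coordinates instead of 10, which is where the efficiency gain claimed in the abstract would come from in a setup like this.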

Index Terms

Computer Science

Information Sciences

Keywords

K-means, Dimensionality Reduction, Principal Component Analysis