Emerging Trends in Computing
Foundation of Computer Science USA
ETC2016 - Number 4
March 2017
Authors: Snehal D. Borase, Satish S. Banait
Snehal D. Borase, Satish S. Banait. Dimensionality Reduction using Clustering Technique. Emerging Trends in Computing. ETC2016, 4 (March 2017), 17-22.
Clustering is a method of grouping known objects into homogeneous classes. It plays a major role in many data mining applications, such as computational biology, medical diagnosis, information retrieval, CRM, scientific data analysis, sales, and web analysis, and the design of clustering algorithms is an active area of research. "Big data" involves terabytes and petabytes of data. Big data is challenging because of its five important characteristics: volume, velocity, variety, variability, and complexity. It is therefore difficult to handle using conventional tools and techniques. Clustering techniques face several open issues: how to process big data and represent the resulting clusters in a more compact format, the stability problems from which clustering algorithms suffer, and how to ensemble single-level and multi-level clusterings. Another important issue is that no prior knowledge about the data is available. In addition, the selection of input parameters, such as the number of nearest neighbours and the number of clusters, makes clustering a challenging task. The main objective of this work is to study and analyze existing clustering algorithms, the impact of dimensionality reduction, and the handling of outliers.