Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012)
Foundation of Computer Science USA
DRISTI - Number 1
April 2012
Authors: Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh
Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh. Enhancement of CURE Clustering Technique in Data Mining. Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012). DRISTI, 1 (April 2012), 7-11.
Valuable information is embedded in large databases, and extracting it has become an important area of data mining. Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data [5]. Among the many clustering algorithms, we consider the hierarchical method CURE (Clustering Using REpresentatives). CURE finds clusters in large databases, is more robust to outliers, and identifies clusters with non-spherical shapes and wide variances in size. It reduces the amount of data it must cluster by combining random sampling with partitioning. With large data sets now available in application areas such as bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, it is becoming increasingly important to execute data mining tasks in parallel. At the same time, technological advances have made shared-memory parallel machines commonly available to organizations and individuals. Although CURE provides high-quality clustering, no parallel version of it was available. Our new algorithm parallelizes CURE, enabling it to outperform existing algorithms and to scale well to large databases without degrading clustering quality.
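The CURE steps the abstract mentions (random sampling, partitioning, and merging clusters by their representative points) can be illustrated with a small sketch. It is not the authors' parallel algorithm, which the abstract does not detail. It follows the general CURE outline: each cluster keeps `num_reps` well-scattered points, each shrunk toward the cluster mean by the fraction `alpha`, and the two clusters whose representatives are closest get merged. The concurrent clustering of partitions with `ThreadPoolExecutor` is an assumption, added only to show where CURE's independent partitions allow parallel work. Because of Python's GIL, threads give no real speedup for this pure-Python loop. The parameter names (`sample_size`, `num_partitions`, `partial_per_partition`, `num_reps`, `alpha`) are hypothetical. Outlier elimination and the heap/k-d tree optimizations of a full implementation are omitted.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def _mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


class Cluster:
    def __init__(self, points, num_reps, alpha):
        self.points = list(points)
        self.mean = _mean(self.points)
        self.reps = self._representatives(num_reps, alpha)

    def _representatives(self, num_reps, alpha):
        # Pick well-scattered points: first the point farthest from the mean,
        # then repeatedly the point farthest from all points already chosen.
        scattered = []
        for _ in range(min(num_reps, len(self.points))):
            if not scattered:
                best = max(self.points, key=lambda p: math.dist(p, self.mean))
            else:
                best = max(self.points,
                           key=lambda p: min(math.dist(p, s) for s in scattered))
            scattered.append(best)
        # Shrink each scattered point toward the mean by the fraction alpha.
        return [tuple(s_i + alpha * (m_i - s_i) for s_i, m_i in zip(s, self.mean))
                for s in scattered]


def cluster_distance(a, b):
    """CURE distance: the closest pair of representatives across the two clusters."""
    return min(math.dist(p, q) for p in a.reps for q in b.reps)


def merge_until(clusters, k, num_reps, alpha):
    """Repeatedly merge the closest pair of clusters until k clusters remain."""
    clusters = list(clusters)
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = Cluster(clusters[i].points + clusters[j].points, num_reps, alpha)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


def cure(data, k, sample_size, num_partitions, partial_per_partition,
         num_reps=4, alpha=0.3, seed=0):
    rng = random.Random(seed)
    # Step 1: draw a random sample so the hierarchical phase stays small.
    sample = rng.sample(data, min(sample_size, len(data)))
    # Step 2: split the sample into partitions that can be clustered independently.
    parts = [sample[i::num_partitions] for i in range(num_partitions)]

    def cluster_partition(part):
        singletons = [Cluster([p], num_reps, alpha) for p in part]
        return merge_until(singletons, max(k, partial_per_partition), num_reps, alpha)

    # Assumption: partitions are partially clustered concurrently.
    with ThreadPoolExecutor() as pool:
        partials = [c for result in pool.map(cluster_partition, parts) for c in result]
    # Step 3: cluster the partial clusters down to k final clusters.
    final = merge_until(partials, k, num_reps, alpha)
    # Step 4: label every input point by the cluster owning its nearest representative.
    labels = [min(range(len(final)),
                  key=lambda c: min(math.dist(p, r) for r in final[c].reps))
              for p in data]
    return final, labels
```

For example, with two well-separated 5×5 grids of points, the sketch should put each grid in its own cluster:

```python
blob_a = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
blob_b = [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)]
final, labels = cure(blob_a + blob_b, k=2, sample_size=30,
                     num_partitions=2, partial_per_partition=5)
```

Shrinking representatives toward the mean (`alpha`) is what lets CURE sit between centroid-based and all-points methods. A small `alpha` keeps the representatives spread out to follow non-spherical shapes. A larger `alpha` pulls them inward, which dampens the effect of outliers.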