Research Article

Enhancement of CURE Clustering Technique in Data Mining

Published on April 2012 by Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh
Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012)
Foundation of Computer Science USA
DRISTI - Number 1
April 2012
Authors: Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh

Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh. Enhancement of CURE Clustering Technique in Data Mining. Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012). DRISTI, 1 (April 2012), 7-11.

@article{
author = { Seema Maitrey and C. K. Jha and Rajat Gupta and Jaiveer Singh },
title = { Enhancement of CURE Clustering Technique in Data Mining },
journal = { Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012) },
issue_date = { April 2012 },
volume = { DRISTI },
number = { 1 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 7-11 },
numpages = 5,
url = { /proceedings/dristi/number1/5922-1003/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012)
%A Seema Maitrey
%A C. K. Jha
%A Rajat Gupta
%A Jaiveer Singh
%T Enhancement of CURE Clustering Technique in Data Mining
%J Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012)
%@ 0975-8887
%V DRISTI
%N 1
%P 7-11
%D 2012
%I International Journal of Computer Applications
Abstract

Valuable information is embedded in large databases, and extracting it has become an active area of data mining. Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data [5]. Among the many clustering algorithms, we consider CURE, a hierarchical clustering method. CURE (Clustering Using REpresentatives) finds clusters in large databases, is more robust to outliers, and identifies clusters that have non-spherical shapes and wide variances in size. CURE employs a combination of data collection and data reduction through random sampling and partitioning. With the availability of large data sets in application areas such as bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, it is becoming increasingly important to execute data mining tasks in parallel. At the same time, technological advances have made shared-memory parallel machines commonly available to organizations and individuals. Although CURE provides high-quality clustering, a parallel version was not available. Our new algorithm enables CURE to outperform existing algorithms and to scale well to large databases without degrading clustering quality.
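
The sampling, partitioning, and representative-point ideas mentioned in the abstract can be illustrated with a minimal, sequential Python sketch. It is not the authors' algorithm and does not include their parallel enhancement, outlier elimination, or the final labelling of unsampled data. The function names and the parameters `num_reps`, `shrink`, `num_partitions`, and the pre-clustering target of one third of each partition are assumptions chosen for illustration, loosely following the general description of CURE in [17].

```python
import math
import random


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def representatives(points, num_reps=4, shrink=0.5):
    """Pick well-scattered points, then shrink them toward the centroid."""
    c = centroid(points)
    # Start with the point farthest from the centroid.
    reps = [max(points, key=lambda p: math.dist(p, c))]
    while len(reps) < min(num_reps, len(points)):
        # Farthest-point heuristic: maximise distance to the nearest chosen point.
        reps.append(max(points, key=lambda p: min(math.dist(p, r) for r in reps)))
    return [tuple(r[d] + shrink * (c[d] - r[d]) for d in range(len(c))) for r in reps]


def make_cluster(points, num_reps, shrink):
    return {"points": points, "reps": representatives(points, num_reps, shrink)}


def cluster_distance(a, b):
    """Distance between two clusters = closest pair of their representatives."""
    return min(math.dist(p, q) for p in a["reps"] for q in b["reps"])


def agglomerate(clusters, k, num_reps, shrink):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = list(clusters)
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = make_cluster(clusters[i]["points"] + clusters[j]["points"], num_reps, shrink)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


def cure_sketch(data, k, sample_size, num_partitions=2, num_reps=4, shrink=0.5, seed=0):
    """Sample, partition, pre-cluster each partition, then cluster the partial results."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))  # data reduction by sampling
    partial = []
    for i in range(num_partitions):
        part = sample[i::num_partitions]  # simple interleaved partitioning (assumption)
        if not part:
            continue
        singletons = [make_cluster([p], num_reps, shrink) for p in part]
        target = max(k, len(part) // 3)  # assumed pre-clustering target per partition
        partial.extend(agglomerate(singletons, target, num_reps, shrink))
    return agglomerate(partial, k, num_reps, shrink)
```

Calling `cure_sketch(data, k=2, sample_size=30)` on a list of coordinate tuples returns a list of dictionaries, each holding the sampled `points` of one cluster and its shrunken `reps`. Because each partition is pre-clustered independently of the others, that loop is the natural place where a parallel version could distribute work, although how the paper does so is not stated in the abstract.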

References
  1. Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey, 1988.
  2. Bernd Mohr. Introduction to Parallel Computing. Computational Nanoscience, NIC Series, Vol. 31, ISBN 3-00-017350-1, pp. 491-505, 2006.
  3. Clark F. Olson. Parallel algorithms for hierarchical clustering. Technical report, University of California at Berkeley, December 1993.
  4. Devendra Kumar Tiwary. "Application of Data Mining in Customer Relationship Management (CRM)", Advances in Computational Sciences and Technology, ISSN 0973-6107, Volume 3, Number 4 (2010), pp. 527–540.
  5. Fayyad, Usama; Gregory Piatetsky-Shapiro, and Padhraic Smyth (1996). "From Data Mining to Knowledge Discovery in Databases". http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf. Retrieved 2008-12-17.
  6. http://www.thearling.com/text/dmwhite/dmwhite.htm
  7. J. Han and M. Kamber; 2000, "Data Mining: Concepts and Techniques", Morgan Kaufmann.
  8. M.H. Dunham. http://engr.smu.edu/~mhd/dmbook/part2.ppt
  9. Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and Pano Vassiliadis. Fundamentals of Data Warehouses. Springer, 1999.
  10. Osmar R. Zaïane. "Principles of Knowledge Discovery in Databases - Chapter 8: Data Clustering". http://www.cs.ualberta.ca/~zaiane/courses/cmput690/slides/Chapter8/index.html
  11. Pavel Berkhin. "Survey of Clustering Data Mining Techniques", 2000.
  12. Richard J. Roiger, Michael W. Geatz, 2007, "Data Mining: A Tutorial-Based Primer", Pearson Education, New Delhi.
  13. Shashikumar G. Totad, Geeta R. B, Chennupati R Prasanna, N Krishna Santhosh, PVGD Prasad Reddy. Scaling Data Mining Algorithms to Large and Distributed Datasets. International Journal of Database Management Systems (IJDMS), Vol. 2, No. 4, November 2010.
  14. U.S. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy. "Advances in Knowledge Discovery and Data Mining", AAAI/MIT Press, 1996.
  15. Hinneburg, Keim. Clustering Techniques for Large Data Sets. First publ. in: ACM SIGKDD 1999 Int. Conf. on Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, September, 1999, pp. 141-181
  16. Aggarwal, C., J. Han, J. Wang, P.S. Yu. 2003. A framework for clustering evolving data streams. In Proc. of the 29th International Conference on Very Large Data Bases, Vol. 29, pp. 81-92.
  17. Guha, S.; Rastogi, R.; Shim, K. CURE: an efficient clustering algorithm for large databases. 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, 1-4 June 1998. SIGMOD Record, vol. 27, no. 2, pp. 73-84, 0163-5808, ACM, June 1998.
  18. O'Callaghan, L., N. Mishra, A. Meyerson, S. Guha, R. Motwani. 2002. Streaming-data algorithms for high-quality clustering. In Proc. of the 18th Intl. Conf. on Data Engineering, pp. 685-684.
  19. M. Kaya, R. Alhajj. Genetic algorithm based framework for mining fuzzy association rules. Fuzzy Sets and Systems 152 (2005) 587–601.
Index Terms

Computer Science
Information Sciences

Keywords

Data Mining KDD Clustering Issues Parallelism