Enhancement of CURE Clustering Technique in Data Mining

Research Article

Published on April 2012 by Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh

Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012)

Foundation of Computer Science USA

DRISTI - Number 1

April 2012

Authors: Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh

Seema Maitrey, C. K. Jha, Rajat Gupta, Jaiveer Singh. Enhancement of CURE Clustering Technique in Data Mining. Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012). DRISTI, 1 (April 2012), 7-11.

@article{

author = { Seema Maitrey and C. K. Jha and Rajat Gupta and Jaiveer Singh },

title = { Enhancement of CURE Clustering Technique in Data Mining },

journal = { Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012) },

issue_date = { April 2012 },

volume = { DRISTI },

number = { 1 },

month = { April },

year = { 2012 },

issn = { 0975-8887 },

pages = { 7-11 },

numpages = 5,

url = { /proceedings/dristi/number1/5922-1003/ },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Proceeding Article

%1 Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012)

%A Seema Maitrey

%A C. K. Jha

%A Rajat Gupta

%A Jaiveer Singh

%T Enhancement of CURE Clustering Technique in Data Mining

%J Development of Reliable Information Systems, Techniques and Related Issues (DRISTI 2012)

%@ 0975-8887

%V DRISTI

%N 1

%P 7-11

%D 2012

%I International Journal of Computer Applications

Abstract

Valuable information is embedded in large databases, and extracting it has become an active area of data mining. Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data [5]. Among the many clustering algorithms, we consider the CURE method from hierarchical clustering. CURE (Clustering Using REpresentatives) finds clusters in a large database, is more robust to outliers, and identifies clusters that have non-spherical shapes and wide variances in size. CURE employs a combination of data collection and data reduction by random sampling and partitioning. With the availability of large data sets in application areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, it is becoming increasingly important to execute data mining tasks in parallel. At the same time, technological advances have made shared-memory parallel machines commonly available to organizations and individuals. Although CURE provides high-quality clustering, a parallel version was not available. Our new algorithm enables CURE to outperform existing algorithms and to scale well to large databases without a decline in clustering quality.
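As a rough illustration of the ideas the abstract names (random sampling, partitioning, and representing each cluster by several scattered points), the following minimal Python sketch follows the general scheme of the original CURE paper by Guha, Rastogi, and Shim. It is not the parallel algorithm proposed in this paper. The function names, the shrink factor `alpha`, the number of representatives `c`, and the round-robin partitioning are assumptions made for this example.

```python
import math
import random


def sample_and_partition(data, sample_size, num_partitions, seed=None):
    """Draw a random sample, then split it into roughly equal partitions.

    Round-robin splitting is an assumption of this sketch; any scheme
    that yields roughly equal partitions would serve.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    return [sample[i::num_partitions] for i in range(num_partitions)]


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def select_representatives(points, c):
    """Pick up to c well-scattered points by farthest-first traversal."""
    center = centroid(points)
    # Start with the point farthest from the cluster centroid.
    reps = [max(points, key=lambda p: math.dist(p, center))]
    while len(reps) < c:
        candidates = [p for p in points if p not in reps]
        if not candidates:  # fewer distinct points than c
            break
        # Next: the point whose nearest chosen representative is farthest.
        reps.append(max(candidates,
                        key=lambda p: min(math.dist(p, r) for r in reps)))
    return reps


def shrink(reps, center, alpha):
    """Move each representative a fraction alpha toward the centroid.

    Shrinking dampens the influence of outliers on the cluster boundary.
    """
    return [tuple(r_i + alpha * (c_i - r_i) for r_i, c_i in zip(r, center))
            for r in reps]


def cluster_distance(reps_a, reps_b):
    """Distance between clusters: the closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 2.0)]
    reps = select_representatives(cluster, c=4)
    print(shrink(reps, centroid(cluster), alpha=0.5))
```

In a hierarchical merge loop, the two clusters with the smallest `cluster_distance` would be merged and their representatives recomputed. Because the partitions are clustered independently before a final merge, they are a natural unit of work for a parallel version.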

References

Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey, 1988.
Bernd Mohr. Introduction to Parallel Computing. Computational Nanoscience, NIC Series, Vol. 31, ISBN 3-00-017350-1, pp. 491-505, 2006.
Clark F. Olson. Parallel algorithms for hierarchical clustering. Technical report, University of California at Berkeley, December 1993.
Devendra Kumar Tiwary. "Application of Data Mining in Customer Relationship Management (CRM)". Advances in Computational Sciences and Technology, ISSN 0973-6107, Volume 3, Number 4 (2010), pp. 527-540.
Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth (1996). "From Data Mining to Knowledge Discovery in Databases". http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf. Retrieved 2008-12-17.
http://www.thearling.com/text/dmwhite/dmwhite.htm
J. Han and M. Kamber, 2000. "Data Mining: Concepts and Techniques". Morgan Kaufmann.
M. H. Dunham. http://engr.smu.edu/~mhd/dmbook/part2.ppt
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and Panos Vassiliadis. Fundamentals of Data Warehouses. Springer, 1999.
Osmar R. Zaïane. "Principles of Knowledge Discovery in Databases - Chapter 8: Data Clustering". http://www.cs.ualberta.ca/~zaiane/courses/cmput690/slides/Chapter8/index.html
Pavel Berkhin. "Survey of Clustering Data Mining Techniques", 2000.
Richard J. Roiger, Michael W. Geatz, 2007. "Data Mining: A Tutorial-Based Primer". Pearson Education, New Delhi.
Shashikumar G. Totad, Geeta R. B, Chennupati R Prasanna, N Krishna Santhosh, PVGD Prasad Reddy. Scaling Data Mining Algorithms to Large and Distributed Datasets. International Journal of Database Management Systems (IJDMS), Vol. 2, No. 4, November 2010.
U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy. "Advances in Knowledge Discovery and Data Mining". AAAI/MIT Press, 1996.
Hinneburg, Keim. Clustering Techniques for Large Data Sets. First publ. in: ACM SIGKDD 1999 Int. Conf. on Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, September 1999, pp. 141-181.
Aggarwal, C., J. Han, J. Wang, P. S. Yu. 2003. A framework for clustering evolving data streams. In Proc. of the 29th International Conference on Very Large Data Bases, Vol. 29, pp. 81-92.
Guha, S.; Rastogi, R.; Shim, K. CURE: an efficient clustering algorithm for large databases. 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, 1-4 June 1998. SIGMOD Record, vol. 27, no. 2, pp. 73-84, ISSN 0163-5808, ACM, June 1998.
O'Callaghan, L., N. Mishra, A. Meyerson, S. Guha, R. Motwani. 2002. Streaming-data algorithms for high-quality clustering. In Proc. of the 18th Intl. Conf. on Data Engineering, pp. 685-694.
M. Kaya, R. Alhajj. Genetic algorithm based framework for mining fuzzy association rules. Fuzzy Sets and Systems 152 (2005), pp. 587-601.

Index Terms

Computer Science

Information Sciences

Keywords

Data Mining KDD Clustering Issues Parallelism