K-Means Clustering Algorithm based on Entity Resolution

B. Vinay Kumar; B. Raghu Ram; B. Hanmanthu

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 108 - Number 6

Year of Publication: 2014

Authors: B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu

10.5120/18919-0254

B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu. K-Means Clustering Algorithm based on Entity Resolution. International Journal of Computer Applications. 108, 6 (December 2014), 41-44. DOI=10.5120/18919-0254

@article{ 10.5120/18919-0254,
author = { B. Vinay Kumar and B. Raghu Ram and B. Hanmanthu },
title = { K-Means Clustering Algorithm based on Entity Resolution },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 6 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 41-44 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume108/number6/18919-0254/ },
doi = { 10.5120/18919-0254 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:42:19.074004+05:30
%A B. Vinay Kumar
%A B. Raghu Ram
%A B. Hanmanthu
%T K-Means Clustering Algorithm based on Entity Resolution
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 6
%P 41-44
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Entity resolution (ER) is the problem of recognizing which entries in a database refer to the same real-world entity. ER has to be run in a way that keeps the running time low while still producing good results. This paper investigates how the running time of ER can be reduced, with a minimum amount of work, using the k-means clustering algorithm. In this approach, entries are clustered according to how well they match. We introduce k-means clustering as a technique to maximize the number of matching entries identified with a limited amount of work. We illustrate the potential gains of this k-means based approach to entity resolution.
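The abstract does not spell out how k-means is combined with ER, so the following is only a minimal sketch of one plausible reading: cluster the records first, then compare only records that fall in the same cluster, which cuts the number of pairwise comparisons. The bigram featurization, the 0.8 similarity threshold, the sample records, and all function names (`featurize`, `kmeans`, `candidate_pairs`) are assumptions made for illustration and are not taken from the paper.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def featurize(text, dim=64):
    """Map a record to a fixed-length vector of hashed character-bigram counts."""
    vec = [0.0] * dim
    s = text.lower()
    for a, b in zip(s, s[1:]):
        # Deterministic hash (Python's built-in hash() of str varies per process).
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def candidate_pairs(labels):
    """Return index pairs (i, j) of records that share a cluster."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    pairs = []
    for members in clusters.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical sample records, invented for this sketch.
records = [
    "John Smith, 12 Oak St",
    "Jon Smith, 12 Oak Street",
    "Mary Jones, 4 Elm Rd",
    "Marie Jones, 4 Elm Road",
    "Wei Chen, 88 Pine Ave",
    "Wei Chen, 88 Pine Avenue",
]

vectors = [featurize(r) for r in records]
labels = kmeans(vectors, k=3)
pairs = candidate_pairs(labels)

# Only pairs inside a cluster go through the (more expensive) match test.
matches = [
    (i, j)
    for i, j in pairs
    if SequenceMatcher(None, records[i], records[j]).ratio() >= 0.8
]

total = len(records) * (len(records) - 1) // 2
print(f"compared {len(pairs)} of {total} possible pairs")
print("matches:", [(records[i], records[j]) for i, j in matches])
```

In this sketch the saving comes from `candidate_pairs`: with all records in a single cluster every pair is compared, while a finer clustering compares fewer pairs at the risk of separating true duplicates. How the paper balances that trade-off is not stated in the abstract.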

Index Terms

Computer Science

Information Sciences

Keywords

Data cleaning, Entity resolution, K-means Clustering Algorithm