Research Article

K-Means Clustering Algorithm based on Entity Resolution

by B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 6
Year of Publication: 2014
Authors: B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu
10.5120/18919-0254

B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu. K-Means Clustering Algorithm based on Entity Resolution. International Journal of Computer Applications. 108, 6 (December 2014), 41-44. DOI=10.5120/18919-0254

@article{ 10.5120/18919-0254,
author = { B. Vinay Kumar, B. Raghu Ram, B. Hanmanthu },
title = { K-Means Clustering Algorithm based on Entity Resolution },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 6 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 41-44 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume108/number6/18919-0254/ },
doi = { 10.5120/18919-0254 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:42:19.074004+05:30
%A B. Vinay Kumar
%A B. Raghu Ram
%A B. Hanmanthu
%T K-Means Clustering Algorithm based on Entity Resolution
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 6
%P 41-44
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Entity resolution (ER) is the problem of recognizing which entries in a database refer to the same entity. ER has to be run in a way that keeps the running time low while still producing good results. This paper investigates how the running time of ER can be reduced, with a minimum amount of work, using the k-means clustering algorithm. Entries are clustered according to how well they match one another. We introduce a technique based on k-means clustering that maximizes the number of matching entries identified using a limited amount of work. We illustrate the potential gains of this entity resolution approach using k-means.
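
The abstract does not give the paper's features, distance measure, or choice of k, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes records are short strings, represents each one as a hashed character-bigram count vector, groups the vectors with plain k-means (Lloyd's algorithm), and then compares records only within the same cluster. All function names (`bigrams`, `vectorize`, `kmeans`, `resolve`), the vector size, and the Jaccard threshold are hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def bigrams(text):
    """Set of character bigrams of a lowercased string with whitespace removed."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def vectorize(text, dim=64):
    """Hash bigrams into a fixed-length count vector (an illustrative assumption)."""
    vec = [0.0] * dim
    for bg in bigrams(text):
        vec[(ord(bg[0]) * 31 + ord(bg[1])) % dim] += 1.0
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(records, k=2, threshold=0.5):
    """Return index pairs judged to match, comparing only within a cluster."""
    labels = kmeans([vectorize(r) for r in records], k)
    matches = []
    for i, j in combinations(range(len(records)), 2):
        if labels[i] != labels[j]:
            continue  # skipped comparison: this is where the work is saved
        if jaccard(bigrams(records[i]), bigrams(records[j])) >= threshold:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    records = ["John Smith", "Jon Smith", "Acme Corp", "ACME Corp."]
    print(resolve(records, k=1))  # k=1 compares every pair (exhaustive ER)
    print(resolve(records, k=2))  # k=2 compares only within clusters
```

With `k=1` every pair is compared, which corresponds to exhaustive ER. With larger `k` fewer pairs are compared, at the risk of missing matches whose records land in different clusters; the outcome also depends on the random initialization, so the trade-off would need to be checked on real data.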

References
  1. A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios, "Duplicate Record Detection: A Survey," IEEE Trans. Knowledge Data Eng., vol. 19, no. 1, pp. 1-16, Jan. 2007.
  2. A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
  3. H. B. Newcombe and J. M. Kennedy, "Record Linkage: Making Maximum Use of the Discriminating Power of Identifying Information," Comm. ACM, vol. 5, no. 11, pp. 563-566, 1962.
  4. M. A. Hernández and S. J. Stolfo, "The Merge/Purge Problem for Large Databases," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 127-138, 1995.
  5. A. K. McCallum, K. Nigam, and L. Ungar, "Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching," Proc. ACM Sixth SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 169-178, 2000.
  6. A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. 25th Int'l Conf. Very Large Databases (VLDB), pp. 518-529, 1999.
  7. X. Dong, A. Y. Halevy, and J. Madhavan, "Reference Reconciliation in Complex Information Spaces," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 85-96, 2005.
  8. M. Weis and F. Naumann, "Detecting Duplicates in Complex XML Data," Proc. 22nd Int'l Conf. Data Eng. (ICDE), p. 109, 2006.
Index Terms

Computer Science
Information Sciences

Keywords

Data cleaning, Entity resolution, K-means Clustering Algorithm