Research Article

Conflict Identification and Resolution in Heterogeneous Datasets: A Comprehensive Survey

by I. Carol, S. Britto Ramesh Kumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 113 - Number 12
Year of Publication: 2015
Authors: I. Carol, S. Britto Ramesh Kumar
10.5120/19879-1885

I. Carol, S. Britto Ramesh Kumar. Conflict Identification and Resolution in Heterogeneous Datasets: A Comprehensive Survey. International Journal of Computer Applications. 113, 12 (March 2015), 22-27. DOI=10.5120/19879-1885

@article{ 10.5120/19879-1885,
author = { I. Carol and S. Britto Ramesh Kumar },
title = { Conflict Identification and Resolution in Heterogeneous Datasets: A Comprehensive Survey },
journal = { International Journal of Computer Applications },
issue_date = { March 2015 },
volume = { 113 },
number = { 12 },
month = { March },
year = { 2015 },
issn = { 0975-8887 },
pages = { 22-27 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume113/number12/19879-1885/ },
doi = { 10.5120/19879-1885 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:50:46.163023+05:30
%A I. Carol
%A S. Britto Ramesh Kumar
%T Conflict Identification and Resolution in Heterogeneous Datasets: A Comprehensive Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 113
%N 12
%P 22-27
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data integration has become a vital necessity in today's interconnected world. Information is scattered across many sources, and to retain a strategic advantage, organizations must obtain as much of it as possible. Hence, combining the scattered data sources becomes the only solution. Data integration faces several challenges, including the heterogeneous nature of the data itself. This paper describes the basic elements of a data integration system and emphasizes the data fusion phase, which forms the core functionality of the architecture. It discusses the problems that occur during data fusion (conflicts) and provides a comprehensive survey of the techniques used to resolve them. Functionalities lacking in current systems and future research directions are discussed in detail.

Index Terms

Computer Science
Information Sciences

Keywords

Conflict Identification Resolution Datasets