Research Article

Feature-based Clustering of Web Data Sources

by Alsayed Algergawy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 60 - Number 5
Year of Publication: 2012
Authors: Alsayed Algergawy
10.5120/9685-4127

Alsayed Algergawy. Feature-based Clustering of Web Data Sources. International Journal of Computer Applications. 60, 5 (December 2012), 1-4. DOI=10.5120/9685-4127

@article{ 10.5120/9685-4127,
author = { Alsayed Algergawy },
title = { Feature-based Clustering of Web Data Sources },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 60 },
number = { 5 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume60/number5/9685-4127/ },
doi = { 10.5120/9685-4127 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:05:47.933585+05:30
%A Alsayed Algergawy
%T Feature-based Clustering of Web Data Sources
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 5
%P 1-4
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The proliferation of web data sources increasingly demands their integration. To facilitate the integration process, a pre-analysis step is required to classify and group data sources into their correct domains. In this paper, we propose a feature-based approach that clusters web data sources without any human intervention, based only on features extracted from the source schemas. In particular, we make use of both linguistic and structural schema features. We experimentally demonstrate the effectiveness of the proposed approach in terms of both clustering quality and runtime.

Index Terms

Computer Science
Information Sciences

Keywords

Web data source
Data integration
Clustering
Performance