Feature-based Clustering of Web Data Sources

Alsayed Algergawy

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 60 - Number 5

Year of Publication: 2012

Authors: Alsayed Algergawy

10.5120/9685-4127

Alsayed Algergawy. Feature-based Clustering of Web Data Sources. International Journal of Computer Applications. 60, 5 (December 2012), 1-4. DOI=10.5120/9685-4127

@article{10.5120/9685-4127,
  author = {Alsayed Algergawy},
  title = {Feature-based Clustering of Web Data Sources},
  journal = {International Journal of Computer Applications},
  issue_date = {December 2012},
  volume = {60},
  number = {5},
  month = {December},
  year = {2012},
  issn = {0975-8887},
  pages = {1-4},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume60/number5/9685-4127/},
  doi = {10.5120/9685-4127},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:05:47.933585+05:30
%A Alsayed Algergawy
%T Feature-based Clustering of Web Data Sources
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 5
%P 1-4
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The proliferation of web data sources creates a growing need to integrate them. To facilitate the integration process, a pre-analysis step is required to classify and group data sources into their correct domains. In this paper, we propose a feature-based approach that clusters web data sources without any human intervention, using only features extracted from the source schemas. In particular, we make use of both linguistic and structural schema features. We experimentally demonstrate the effectiveness of the proposed approach in terms of both clustering quality and runtime.
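To make the idea in the abstract concrete, the following is a minimal sketch of clustering schemas by combining a linguistic feature with structural features. It is not the paper's algorithm: the representation of a schema as a list of element paths, the choice of features (name tokens, element count, nesting depth), the weight `w_ling`, the `threshold`, and the single-link grouping are all assumptions made for illustration.

```python
import re
from itertools import combinations


def tokens(names):
    """Linguistic feature: lower-cased word tokens taken from element names."""
    out = set()
    for name in names:
        # Split on camelCase boundaries and on non-letter separators.
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)
        out.update(p.lower() for p in parts)
    return out


def structure(paths):
    """Structural features: number of elements and maximum nesting depth."""
    depths = [p.count("/") + 1 for p in paths]
    return (len(paths), max(depths))


def similarity(a, b, w_ling=0.7):
    """Weighted mix of token overlap (Jaccard) and structural closeness."""
    ta = tokens(p.split("/")[-1] for p in a)
    tb = tokens(p.split("/")[-1] for p in b)
    ling = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0
    sa, sb = structure(a), structure(b)
    struct = sum(min(x, y) / max(x, y) for x, y in zip(sa, sb)) / len(sa)
    return w_ling * ling + (1 - w_ling) * struct


def cluster(schemas, threshold=0.5):
    """Single-link grouping: schemas joined by any pair at or above threshold."""
    parent = {name: name for name in schemas}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(schemas, 2):
        if similarity(schemas[a], schemas[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in schemas:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy schemas, each given as a list of element paths.
SCHEMAS = {
    "books1": ["book", "book/title", "book/author", "book/price"],
    "books2": ["publication", "publication/bookTitle",
               "publication/authorName", "publication/price"],
    "flights": ["flight", "flight/departure", "flight/arrival",
                "flight/airline", "flight/fare", "flight/fare/amount"],
}

print(cluster(SCHEMAS))  # [['books1', 'books2'], ['flights']]
```

In this toy run the two book schemas share most of their name tokens and have identical shape, so they fall into one group, while the flight schema shares no tokens with either and stays separate. A real system would need richer features and a principled way to choose the weight and threshold.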

References

L. Barbosa and J. Freire. Combining classifiers to identify online databases. In WWW, 2007.
L. Barbosa, J. Freire, and A. S. da Silva. Organizing hidden-web databases by clustering visible web documents. In ICDE, pages 326–335, 2007.
L. Chiticariu, M. A. Hernández, P. G. Kolaitis, and L. Popa. Semi-automatic schema integration in Clio. In VLDB'07, pages 1326–1329, 2007.
H. H. Do and E. Rahm. Matching large schemas: Approaches and evaluation. Information Systems, 32(6):857–885, 2007.
T. M. Ghanem and W. G. Aref. Databases deepen the web. Computer, 37(1):116–117, 2004.
J. Madhavan, S. R. Jeffery, S. Cohen, X. L. Dong, D. Ko, C. Yu, and A. Halevy. Web-scale data integration: You can only afford to pay as you go. In CIDR, pages 342–350, 2007.
H. A. Mahmoud and A. Aboulnaga. Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems. In SIGMOD, 2010.
S. Massmann and E. Rahm. Evaluating instance-based matching of web directories. In 11th Workshop on Web and Databases (WebDB), 2008.
W. Meng and C. T. Yu. Advanced Metasearch Engine Technology. Morgan & Claypool Publishers, 2010.
E. Peukert, S. Massmann, and K. König. Comparing similarity combination methods for schema matching. In GI-Workshop, pages 692–701, 2010.
N. Yuruk, M. Mete, X. Xu, and T. A. J. Schweiger. AHSCAN: Agglomerative hierarchical structural clustering algorithm for networks. In ASONAM '09.

Index Terms

Computer Science

Information Sciences

Keywords

Web data source, Data integration, Clustering, Performance