Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review

Yogesh W. Wanjari; Dipali B. Gaikwad; Vivek D. Mohod; Sachin N. Deshmukh

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 88 - Number 18

Year of Publication: 2014

Authors: Yogesh W. Wanjari, Dipali B. Gaikwad, Vivek D. Mohod, Sachin N. Deshmukh

DOI: 10.5120/15454-3994

Yogesh W. Wanjari, Dipali B. Gaikwad, Vivek D. Mohod, Sachin N. Deshmukh. Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review. International Journal of Computer Applications. 88, 18 (February 2014), 23-28. DOI=10.5120/15454-3994

@article{ 10.5120/15454-3994,

author = { Yogesh W. Wanjari, Dipali B. Gaikwad, Vivek D. Mohod, Sachin N. Deshmukh },

title = { Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review },

journal = { International Journal of Computer Applications },

issue_date = { February 2014 },

volume = { 88 },

number = { 18 },

month = { February },

year = { 2014 },

issn = { 0975-8887 },

pages = { 23-28 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume88/number18/15454-3994/ },

doi = { 10.5120/15454-3994 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T22:07:59.047008+05:30

%A Yogesh W. Wanjari

%A Dipali B. Gaikwad

%A Vivek D. Mohod

%A Sachin N. Deshmukh

%T Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review

%J International Journal of Computer Applications

%@ 0975-8887

%V 88

%N 18

%P 23-28

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

The Web contains a huge amount of information, much of which users retrieve by submitting search queries to Web databases that return the relevant records. Web databases return multiple search result records dynamically to the Web browser, and these result pages, encoded as HTML, make up the Deep Web. Collecting and organizing this information by hand is time consuming and requires human effort. Traditional search engines (such as Google and Yahoo) do not index the hidden Web pages generated from Web databases. Many existing techniques address the problem of efficiently extracting structured data from the Deep Web, that is, from the hidden databases used by Web sites. Information extraction and annotation nevertheless remain a key challenge in Web mining. The retrieved information should be extracted automatically and arranged systematically for further processing. Various methodologies, such as wrapper induction, have been introduced for this purpose. The extracted information is then labeled according to the concept it represents. Different types of annotators are used depending on the data to be annotated. This paper surveys automatic annotation approaches based on the different features of text nodes and data units.
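
The idea of applying several annotators to extracted data units can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the annotator names (`prefix_annotator`, `price_annotator`, `year_annotator`), the regular expressions, the sample record, and the rule that the first annotator to propose a label wins. It is not the method of this paper or of any system it reviews.

```python
import re
from typing import Callable, List, Optional, Tuple

# An annotator inspects one data unit (a string) and returns a label or None.
Annotator = Callable[[str], Optional[str]]


def prefix_annotator(unit: str) -> Optional[str]:
    # Hypothetical in-text prefix annotator: "Label: value" carries its own label.
    match = re.match(r"\s*([A-Za-z ]+):\s*\S", unit)
    return match.group(1).strip() if match else None


def price_annotator(unit: str) -> Optional[str]:
    # Hypothetical pattern-based annotator: a dollar sign followed by an amount.
    return "Price" if re.fullmatch(r"\$\d+(\.\d{2})?", unit.strip()) else None


def year_annotator(unit: str) -> Optional[str]:
    # Hypothetical pattern-based annotator: a four-digit year (19xx or 20xx).
    return "Year" if re.fullmatch(r"(19|20)\d{2}", unit.strip()) else None


def annotate(units: List[str],
             annotators: List[Annotator]) -> List[Tuple[str, Optional[str]]]:
    """Label each data unit with the first label any annotator proposes.

    Annotators are tried in list order, so the order encodes an assumed
    priority. Units that no annotator recognizes keep the label None.
    """
    labeled = []
    for unit in units:
        label = None
        for annotator in annotators:
            label = annotator(unit)
            if label is not None:
                break
        labeled.append((unit, label))
    return labeled


if __name__ == "__main__":
    # A made-up search result record, already split into data units.
    record = ["Author: J. Smith", "$24.99", "2003", "Hardcover"]
    for unit, label in annotate(record, [prefix_annotator, price_annotator, year_annotator]):
        print(f"{unit!r} -> {label}")
```

In this sketch, a unit such as "Hardcover" stays unlabeled because no annotator recognizes it, which suggests why a single annotator is rarely sufficient and why different annotator types are chosen according to the data.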

Index Terms

Computer Science

Information Sciences

Keywords

Data Extraction, Data Annotation, Annotators, Text Nodes, Data Units, Wrapper