Structure based Data Extraction from Hidden Web Sources: A Review

Anuradha; A.K.Sharma

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 25 - Number 3

Year of Publication: 2011

Authors: Anuradha, A.K.Sharma

10.5120/3010-4060

Anuradha, A.K.Sharma. Structure based Data Extraction from Hidden Web Sources: A Review. International Journal of Computer Applications. 25, 3 (July 2011), 32-37. DOI=10.5120/3010-4060

@article{10.5120/3010-4060,
  author = {Anuradha and A.K.Sharma},
  title = {Structure based Data Extraction from Hidden Web Sources: A Review},
  journal = {International Journal of Computer Applications},
  issue_date = {July 2011},
  volume = {25},
  number = {3},
  month = {July},
  year = {2011},
  issn = {0975-8887},
  pages = {32-37},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume25/number3/3010-4060/},
  doi = {10.5120/3010-4060},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:10:49.594388+05:30
%A Anuradha
%A A.K.Sharma
%T Structure based Data Extraction from Hidden Web Sources: A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 25
%N 3
%P 32-37
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

To extract data from the web pages of Hidden Web sources, many semi-automatic and automatic techniques have been proposed that rely on the structure and tags of HTML documents. These techniques include machine learning and schema-matching approaches to the data extraction problem. This paper reviews the research on data extraction from Hidden Web sources and discusses the advantages and disadvantages of the existing techniques.

Index Terms

Computer Science

Information Sciences

Keywords

Surface Web, Hidden Web, Information Extraction