An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler


Research Article

Published in June 2018 by Rohini Navnathkhedkar, Madhuri Dalal

International Conference on Emerging Trends in Computing and Communication

Foundation of Computer Science USA

ICETCC2017 - Number 3

June 2018

Authors: Rohini Navnathkhedkar, Madhuri Dalal

Rohini Navnathkhedkar, Madhuri Dalal. An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler. International Conference on Emerging Trends in Computing and Communication. ICETCC2017, 3 (June 2018), 18-22.

@article{
  author = { Rohini Navnathkhedkar, Madhuri Dalal },
  title = { An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler },
  journal = { International Conference on Emerging Trends in Computing and Communication },
  issue_date = { June 2018 },
  volume = { ICETCC2017 },
  number = { 3 },
  month = { June },
  year = { 2018 },
  issn = { 0975-8887 },
  pages = { 18-22 },
  numpages = { 5 },
  url = { /proceedings/icetcc2017/number3/29474-c129/ },
  publisher = { Foundation of Computer Science (FCS), NY, USA },
  address = { New York, USA }
}

%0 Proceeding Article
%1 International Conference on Emerging Trends in Computing and Communication
%A Rohini Navnathkhedkar
%A Madhuri Dalal
%T An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler
%J International Conference on Emerging Trends in Computing and Communication
%@ 0975-8887
%V ICETCC2017
%N 3
%P 18-22
%D 2018
%I International Journal of Computer Applications

Abstract

As the deep web grows at a very fast pace, there has been increased interest in techniques that help locate deep-web interfaces efficiently. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving both wide coverage and high efficiency is challenging. We propose a two-stage framework for harvesting deep web interfaces. In the first stage, the crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, it ranks websites so that those highly relevant to a given topic are prioritized. In the second stage, it achieves fast in-site searching by excavating the most relevant links with adaptive link ranking.
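The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`relevance`, `rank_sites`, `crawl_site`), the keyword-overlap score, the in-memory link map, and the page budget are all assumptions made for illustration. The sketch also uses a fixed keyword score, so the adaptive learning of link ranks mentioned in the abstract is not modeled, and no network access or search engine is involved.

```python
import heapq
from typing import Dict, Iterable, List, Set, Tuple


def relevance(text: str, topic_terms: Iterable[str]) -> float:
    """Toy score (assumption): fraction of topic terms that occur in the text."""
    words = set(text.lower().split())
    terms = [t.lower() for t in topic_terms]
    if not terms:
        return 0.0
    return sum(t in words for t in terms) / len(terms)


def rank_sites(sites: Dict[str, str], topic_terms: List[str]) -> List[str]:
    """Stage 1: order candidate sites by the relevance of their home-page text."""
    return sorted(sites, key=lambda s: relevance(sites[s], topic_terms), reverse=True)


def crawl_site(
    start: str,
    links: Dict[str, List[Tuple[str, str]]],  # url -> [(anchor text, target url)]
    topic_terms: List[str],
    has_form: Set[str],  # hypothetical oracle: pages known to hold a searchable form
    budget: int,
) -> List[str]:
    """Stage 2: best-first in-site search, following the highest-scored links first."""
    frontier: List[Tuple[float, str]] = [(0.0, start)]  # (negated score, url)
    seen = {start}
    found: List[str] = []
    visited = 0
    while frontier and visited < budget:
        _, url = heapq.heappop(frontier)
        visited += 1
        if url in has_form:
            found.append(url)
        for anchor, target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # heapq is a min-heap, so negate the score to pop the best link first.
                heapq.heappush(frontier, (-relevance(anchor, topic_terms), target))
    return found


# Hypothetical example data.
topic = ["book", "books", "search"]
sites = {
    "news.example": "daily news weather",
    "books.example": "online book store search books",
}
links = {
    "books.example/": [
        ("about us", "books.example/about"),
        ("search books", "books.example/search"),
    ],
    "books.example/about": [("careers", "books.example/jobs")],
}
ordered = rank_sites(sites, topic)
forms = crawl_site("books.example/", links, topic, {"books.example/search"}, budget=2)
```

With a budget of two page visits, the crawl reaches the search page before the less relevant "about" page, which is the effect the link ranking is meant to have: fewer pages are fetched before a searchable interface is found.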


Index Terms

Computer Science

Information Sciences

Keywords

Deep Web, Ranking, Adaptive Learning, Two-stage Crawler