International Conference on Emerging Trends in Computing and Communication
Foundation of Computer Science USA
ICETCC2017 - Number 3
June 2018
Authors: Rohini Navnathkhedkar, Madhuri Dalal
Rohini Navnathkhedkar, Madhuri Dalal. An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler. International Conference on Emerging Trends in Computing and Communication. ICETCC2017, 3 (June 2018), 18-22.
As the deep web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate deep-web interfaces. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. We propose a two-stage framework for harvesting deep-web interfaces. In the first stage, the framework performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, it ranks websites so that those most relevant to a given topic are prioritized. In the second stage, it achieves fast in-site searching by excavating the most relevant links with adaptive link ranking.
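The following is a minimal sketch of the two-stage idea. It assumes an in-memory toy corpus in place of real search-engine results and HTTP fetches. The keyword-overlap scorer, the 0.75 learning weight, and every function name (`term_overlap`, `rank_sites`, `link_score`, `in_site_search`) are hypothetical illustrations, not the authors' site ranker or adaptive link ranker. Stage 1 orders candidate sites by topic relevance. Stage 2 visits in-site links best-first and "adapts" by boosting anchor words seen on links that already led to a searchable form.

```python
import heapq
from collections import Counter


def term_overlap(text, topic_terms):
    """Hypothetical relevance score: fraction of topic terms present in text."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def rank_sites(sites, topic_terms):
    """Stage 1 sketch: order candidate sites (url -> home-page text) so the
    most topic-relevant sites are crawled first."""
    heap = [(-term_overlap(text, topic_terms), url) for url, text in sites.items()]
    heapq.heapify(heap)  # min-heap on negated score = max-heap on score
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def link_score(anchor, topic_terms, learned, weight=0.75):
    """Topic overlap plus a bonus for anchor words learned from links that
    previously led to a searchable form (hypothetical weighting)."""
    words = set(anchor.lower().split())
    return term_overlap(anchor, topic_terms) + weight * sum(learned[w] for w in words)


def in_site_search(links, topic_terms, budget):
    """Stage 2 sketch: visit at most `budget` in-site links, always taking the
    best-scoring one; `links` maps url -> (anchor_text, leads_to_form)."""
    frontier = dict(links)
    learned = Counter()  # non-topic anchor words seen on successful links
    visited, forms = [], []
    while frontier and len(visited) < budget:
        best = max(frontier, key=lambda u: link_score(frontier[u][0], topic_terms, learned))
        anchor, leads_to_form = frontier.pop(best)
        visited.append(best)
        if leads_to_form:
            forms.append(best)
            # Adapt: reward the non-topic words of a link that found a form.
            learned.update(set(anchor.lower().split()) - topic_terms)
    return visited, forms


if __name__ == "__main__":
    topic = {"car", "search"}
    sites = {
        "a.example": "used car search listings",
        "b.example": "cooking recipes blog",
        "c.example": "car rental deals",
    }
    print(rank_sites(sites, topic))  # ['a.example', 'c.example', 'b.example']

    links = {
        "/search": ("advanced car search", True),
        "/listings": ("car listings", False),
        "/adv": ("advanced options", True),
        "/about": ("about us", False),
    }
    print(in_site_search(links, topic, budget=3))
    # (['/search', '/adv', '/listings'], ['/search', '/adv'])
```

In the toy run, `/adv` initially scores 0 and would otherwise be visited after `/listings`. Once `/search` yields a form, the learned word "advanced" lifts `/adv` above `/listings`. That reordering is the kind of effect adaptive link ranking aims for, though a real crawler would learn from richer features than anchor words alone.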