Research Article

Deep Web Crawler: Exploring and Re-ranking of Web Forms

by Rashmi K. B., Vijaya Kumar T., H. S. Guruprasad
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 150 - Number 1
Year of Publication: 2016
DOI: 10.5120/ijca2016911448

Rashmi K. B., Vijaya Kumar T., H. S. Guruprasad. Deep Web Crawler: Exploring and Re-ranking of Web Forms. International Journal of Computer Applications. 150, 1 (Sep 2016), 32-35. DOI=10.5120/ijca2016911448

@article{ 10.5120/ijca2016911448,
author = { Rashmi K. B. and Vijaya Kumar T. and H. S. Guruprasad },
title = { Deep Web Crawler: Exploring and Re-ranking of Web Forms },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2016 },
volume = { 150 },
number = { 1 },
month = { Sep },
year = { 2016 },
issn = { 0975-8887 },
pages = { 32-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume150/number1/26059-2016911448/ },
doi = { 10.5120/ijca2016911448 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:54:45.663364+05:30
%A Rashmi K. B.
%A Vijaya Kumar T.
%A H. S. Guruprasad
%T Deep Web Crawler: Exploring and Re-ranking of Web Forms
%J International Journal of Computer Applications
%@ 0975-8887
%V 150
%N 1
%P 32-35
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A large portion of the web, known as the deep web, is accessible via search interfaces to myriad online databases. Deep web crawling addresses the problem of surfacing the hidden content behind these search interfaces. Because the web is dynamic and data sources change constantly, discovering these resources is crucial. This paper proposes a two-level application, a deep web crawler, for gathering relevant searchable forms. In the first level, the crawler explores forms by reverse searching from a given seed site, ranking sites to prioritize highly relevant ones, and extracting links to find the forms. In the second level, it searches the forms based on preference, and the result is enhanced by re-ranking based on user feedback.
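
The two levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term-overlap `relevance` scorer, the `Form` record, the `+1`/`-1` feedback encoding, and the `weight` parameter are all assumptions made for illustration, and reverse searching and link extraction are left out entirely.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Form:
    """A searchable form found on a site (hypothetical record type)."""
    url: str
    score: float = 0.0


def relevance(text: str, topic_terms: set[str]) -> float:
    """Fraction of topic terms that occur in the text (a stand-in scorer)."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def rank_sites(sites: dict[str, str], topic_terms: set[str]) -> list[str]:
    """Level 1: order candidate sites so the most relevant are visited first."""
    # heapq is a min-heap, so relevance is negated to pop the best site first.
    heap = [(-relevance(desc, topic_terms), url) for url, desc in sites.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def rerank(forms: list[Form], feedback: dict[str, int], weight: float = 0.5) -> list[Form]:
    """Level 2: shift each form's score by user feedback (+1 or -1) and re-sort."""
    adjusted = [Form(f.url, f.score + weight * feedback.get(f.url, 0)) for f in forms]
    return sorted(adjusted, key=lambda f: f.score, reverse=True)
```

In this sketch, `rank_sites` takes a mapping from site URL to descriptive text plus a set of topic terms, and `rerank` takes the forms found so far plus a mapping from form URL to a feedback value. A real crawler would replace the overlap scorer with a learned ranker and update it as feedback arrives.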

Index Terms

Computer Science
Information Sciences

Keywords

Deep web, adaptive learning, ranking