Research Article

A Novel Approach to Priority based Focused Crawler

by Rishabh Dixit, Shiva Gupta, Rajkumar Singh Rathore, Shivesh Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 116 - Number 19
Year of Publication: 2015
DOI: 10.5120/20445-2796

Rishabh Dixit, Shiva Gupta, Rajkumar Singh Rathore, Shivesh Gupta. A Novel Approach to Priority based Focused Crawler. International Journal of Computer Applications. 116, 19 (April 2015), 22-25. DOI=10.5120/20445-2796

@article{ 10.5120/20445-2796,
author = { Rishabh Dixit and Shiva Gupta and Rajkumar Singh Rathore and Shivesh Gupta },
title = { A Novel Approach to Priority based Focused Crawler },
journal = { International Journal of Computer Applications },
issue_date = { April 2015 },
volume = { 116 },
number = { 19 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 22-25 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume116/number19/20445-2796/ },
doi = { 10.5120/20445-2796 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:57:35.708863+05:30
%A Rishabh Dixit
%A Shiva Gupta
%A Rajkumar Singh Rathore
%A Shivesh Gupta
%T A Novel Approach to Priority based Focused Crawler
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 19
%P 22-25
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The web continues to grow at an exponential rate, so fetching relevant information about a specific topic is gaining importance. Web crawlers are programs that traverse the web and fetch web documents in an automated manner. Focused crawlers search for a specific keyword in a web page. Link-based focused crawlers focus on the anchor links of a page and seek out the most relevant links without actually downloading the web page itself. This paper assigns priorities to links so that the most relevant links are displayed to the user first. Insignificant links are avoided, which leads to significant savings in the computational cost of query processing as well as in network and hardware resources.
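The abstract does not give the paper's scoring rule, so the following is only a minimal sketch of the general idea of link prioritization, not the authors' method. It assumes a hypothetical score (the fraction of topic terms that appear in a link's anchor text or URL), a hypothetical threshold for discarding insignificant links, and a simple visited-URL check. The names `score_link`, `prioritize_links`, `topic_terms`, and `threshold` are illustrative and do not come from the paper.

```python
import heapq
import re


def score_link(anchor_text, url, topic_terms):
    """Hypothetical relevance score: fraction of topic terms found in the
    anchor text or URL. Topic terms are expected in lowercase."""
    if not topic_terms:
        return 0.0
    tokens = set(re.findall(r"[a-z0-9]+", (anchor_text + " " + url).lower()))
    hits = sum(1 for term in topic_terms if term in tokens)
    return hits / len(topic_terms)


def prioritize_links(links, topic_terms, visited=None, threshold=0.0):
    """Return (score, url) pairs, most relevant first.

    links: iterable of (anchor_text, url) pairs taken from an already fetched page.
    visited: URLs that were already crawled and are skipped.
    threshold: links scoring at or below this value are discarded (assumed cutoff).
    """
    seen = set(visited or ())
    heap = []
    for order, (anchor_text, url) in enumerate(links):
        if url in seen:  # visited-URL check, also removes duplicates on the page
            continue
        seen.add(url)
        score = score_link(anchor_text, url, topic_terms)
        if score <= threshold:  # insignificant link, never queued
            continue
        # heapq is a min-heap, so the score is negated; 'order' breaks ties
        # in favour of links that appeared earlier on the page.
        heapq.heappush(heap, (-score, order, url))

    ranked = []
    while heap:
        neg_score, _, url = heapq.heappop(heap)
        ranked.append((-neg_score, url))
    return ranked


if __name__ == "__main__":
    page_links = [
        ("Focused crawler tutorial", "http://example.com/a"),
        ("Cooking recipes", "http://example.com/b"),
        ("Web crawler basics", "http://example.com/c"),
    ]
    for score, url in prioritize_links(page_links, ["focused", "crawler"]):
        print(f"{score:.2f} {url}")
```

Because scoring uses only the anchor text and URL, no target page is downloaded before it is ranked, which is the source of the savings the abstract describes. A real crawler would likely replace the term-overlap score with a stronger relevance measure.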

Index Terms

Computer Science
Information Sciences

Keywords

Visited URL Test, Content Matching Test