Focused Crawler based on Efficient Page Rank Algorithm

Anand Ratna; Divya; Akshay Sawhney

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 116 - Number 7

Year of Publication: 2015

DOI: 10.5120/20351-2540

Anand Ratna, Divya, Akshay Sawhney. Focused Crawler based on Efficient Page Rank Algorithm. International Journal of Computer Applications. 116, 7 (April 2015), 37-40. DOI=10.5120/20351-2540

@article{ 10.5120/20351-2540,
author = { Anand Ratna and Divya and Akshay Sawhney },
title = { Focused Crawler based on Efficient Page Rank Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { April 2015 },
volume = { 116 },
number = { 7 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 37-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume116/number7/20351-2540/ },
doi = { 10.5120/20351-2540 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

Abstract

The WWW is growing rapidly and its content is dynamic, so an efficient search mechanism is essential. A vast number of pages are added every day, which poses exceptional scaling challenges for general-purpose crawlers and search engines and makes fetching information about a specific topic increasingly important. This paper describes a web crawling approach based on best-first search. Instead of collecting and indexing all available web documents in order to answer all possible queries, a focused crawler chooses the links that are likely to be most relevant to the crawl and avoids irrelevant links in a document. This leads to significant savings in hardware and network resources and also helps keep the crawl more up to date. To accomplish such goal-directed crawling, the approach selects the top K most relevant documents for a given query and then expands the most promising link, chosen according to its link score, to circumvent irrelevant regions of the web.
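
The best-first strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy in-memory web `TOY_WEB`, the helper names `cosine_relevance`, `best_first_crawl` and `top_k`, and the rule that a link inherits the relevance score of the page it was found on are all assumptions made for illustration. Relevance here is a plain term-frequency cosine similarity, standing in for the TF-IDF and Page Rank scoring named in the keywords.

```python
import heapq
import math
from collections import Counter

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("web crawler search index", ["a", "b"]),
    "a": ("focused crawler topic relevance crawler", ["c"]),
    "b": ("cooking recipes pasta", ["d"]),
    "c": ("topic specific crawler ranking", []),
    "d": ("pasta sauce tomato", []),
}


def cosine_relevance(query, text):
    """Cosine similarity between raw term-frequency vectors."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def best_first_crawl(web, seed, query, max_pages):
    """Visit up to max_pages pages, always expanding the best-scored link."""
    # heapq is a min-heap, so scores are negated to pop the largest first.
    frontier = [(-1.0, seed)]
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        score = cosine_relevance(query, text)
        visited.append((url, score))
        for link in links:
            if link not in seen and link in web:
                seen.add(link)
                # Assumption: a link's score is its parent page's relevance.
                heapq.heappush(frontier, (-score, link))
    return visited


def top_k(visited, k):
    """Return the k visited pages with the highest relevance score."""
    return sorted(visited, key=lambda page: page[1], reverse=True)[:k]


if __name__ == "__main__":
    pages = best_first_crawl(TOY_WEB, "seed", "focused crawler", max_pages=3)
    print(pages)
    print(top_k(pages, 1))
```

With a budget of three pages and the query "focused crawler", this sketch visits `seed`, `a` and `c` and never fetches the off-topic pages `b` and `d`, which is the resource saving the abstract refers to. A real crawler would replace the dictionary lookup with HTTP fetching and would likely use a richer link score.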

Index Terms

Computer Science

Information Sciences

Keywords

Focused web crawler, TF-IDF, Relevancy calculation, Page Rank