An Approach for Identifying URLs Based on Division Score and Link Score in Focused Crawler

Debashis Hati; Amritesh Kumar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 2 - Number 3

Year of Publication: 2010

Authors: Debashis Hati, Amritesh Kumar

10.5120/643-899

Debashis Hati, Amritesh Kumar. An Approach for Identifying URLs Based on Division Score and Link Score in Focused Crawler. International Journal of Computer Applications. 2, 3 (May 2010), 48-53. DOI=10.5120/643-899

@article{ 10.5120/643-899,
author = { Debashis Hati and Amritesh Kumar },
title = { An Approach for Identifying URLs Based on Division Score and Link Score in Focused Crawler },
journal = { International Journal of Computer Applications },
issue_date = { May 2010 },
volume = { 2 },
number = { 3 },
month = { May },
year = { 2010 },
issn = { 0975-8887 },
pages = { 48-53 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume2/number3/643-899/ },
doi = { 10.5120/643-899 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T19:49:54.505610+05:30

%A Debashis Hati

%A Amritesh Kumar

%T An Approach for Identifying URLs Based on Division Score and Link Score in Focused Crawler

%J International Journal of Computer Applications

%@ 0975-8887

%V 2

%N 3

%P 48-53

%D 2010

%I Foundation of Computer Science (FCS), NY, USA

Abstract

The rapid growth of the World Wide Web (WWW) poses unprecedented scaling challenges for general-purpose crawlers. A crawler is a program that traverses the Web and retrieves pages by following hyperlinks. Keeping search engine indices current through exhaustive crawling is rapidly becoming impossible as the Web grows. A focused crawler, the crawler of a special-purpose search engine, offers a potential solution: it selectively seeks out pages relevant to a pre-defined set of topics instead of exploring all regions of the Web. In our proposed approach, we calculate a link score for each URL from two components: the average relevancy score of its parent pages and its division score. The parent relevancy is used on the premise that a parent page is related to its child pages, since authors typically place detailed information on the child page. The division score counts how many topic keywords appear in the division of the page that contains the link. The link score is then compared with a threshold value: if it is greater than or equal to the threshold, the link is considered relevant; otherwise, it is discarded. Among the relevant links, the focused crawler fetches the one with the highest link score first.
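The scoring and selection steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two components are combined, so the sketch simply adds the average parent relevancy to a division score normalised by the number of topic keywords. The names Link, division_score, link_score, build_frontier and crawl_order, and the whitespace-based keyword matching, are all hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Link:
    url: str
    division_text: str               # text of the page division containing the link
    parent_relevancies: List[float]  # relevancy scores of the pages that link to this URL


def division_score(division_text: str, topic_keywords: Set[str]) -> float:
    """Fraction of topic keywords found in the link's division (assumed normalisation)."""
    if not topic_keywords:
        return 0.0
    words = set(division_text.lower().split())
    return len(words & topic_keywords) / len(topic_keywords)


def link_score(link: Link, topic_keywords: Set[str]) -> float:
    """Average parent relevancy plus division score (the sum is an assumption)."""
    parents = link.parent_relevancies
    avg_parent = sum(parents) / len(parents) if parents else 0.0
    return avg_parent + division_score(link.division_text, topic_keywords)


def build_frontier(links: List[Link], topic_keywords: Set[str],
                   threshold: float) -> List[Tuple[float, str]]:
    """Keep links whose score is >= threshold; discard the rest."""
    heap: List[Tuple[float, str]] = []
    for link in links:
        score = link_score(link, topic_keywords)
        if score >= threshold:
            # heapq is a min-heap, so store the negated score to pop the highest first.
            heapq.heappush(heap, (-score, link.url))
    return heap


def crawl_order(heap: List[Tuple[float, str]]) -> List[str]:
    """Return URLs in the order they would be fetched: highest link score first."""
    frontier = list(heap)
    heapq.heapify(frontier)
    order = []
    while frontier:
        _, url = heapq.heappop(frontier)
        order.append(url)
    return order


if __name__ == "__main__":
    keywords = {"focused", "crawler", "link", "score"}
    candidates = [
        Link("http://example.org/a", "focused crawler link score", [0.8, 0.6]),
        Link("http://example.org/b", "contact us privacy", [0.2]),
        Link("http://example.org/c", "crawler tutorial", [0.5]),
    ]
    print(crawl_order(build_frontier(candidates, keywords, threshold=0.5)))
```

A priority queue is used here because the abstract says the highest-scoring link is fetched first; any other weighting of the two components could be substituted in link_score without changing the rest of the sketch.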

Index Terms

Computer Science

Information Sciences

Keywords

Crawler, Focused crawler, Division score, Link score