Research Article

A Frame Work for Topical Collections Make with Focused and Accelerated Focused Crawlers

by Saturi Rajesh, D.raju, P.ajay Kumar, P.srikanth
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 117 - Number 8
Year of Publication: 2015
Authors: Saturi Rajesh, D.raju, P.ajay Kumar, P.srikanth
10.5120/20573-2974

Saturi Rajesh, D.raju, P.ajay Kumar, P.srikanth. A Frame Work for Topical Collections Make with Focused and Accelerated Focused Crawlers. International Journal of Computer Applications. 117, 8 (May 2015), 13-20. DOI=10.5120/20573-2974

@article{ 10.5120/20573-2974,
author = { Saturi Rajesh and D.raju and P.ajay Kumar and P.srikanth },
title = { A Frame Work for Topical Collections Make with Focused and Accelerated Focused Crawlers },
journal = { International Journal of Computer Applications },
issue_date = { May 2015 },
volume = { 117 },
number = { 8 },
month = { May },
year = { 2015 },
issn = { 0975-8887 },
pages = { 13-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume117/number8/20573-2974/ },
doi = { 10.5120/20573-2974 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:58:47.991986+05:30
%A Saturi Rajesh
%A D.raju
%A P.ajay Kumar
%A P.srikanth
%T A Frame Work for Topical Collections Make with Focused and Accelerated Focused Crawlers
%J International Journal of Computer Applications
%@ 0975-8887
%V 117
%N 8
%P 13-20
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In the personalized search domain, focused crawlers, an alternative to general-purpose crawlers, are receiving increasing attention. The goal of these crawlers is to selectively seek out pages that are relevant to a predefined set of topics or themes. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources and helps keep the crawl more up to date. This paper presents and compares two focused crawlers: a traditional focused crawler and an accelerated focused crawler. The accelerated focused crawler takes offline lessons from the traditional focused crawler. It emulates a human surfer by trying to predict the relevance of an 'HREF' target page based on the words around the link on the source page. In these experiments, the topics are specified using exemplary documents, and a naive Bayesian classifier is used to guide the crawlers. The crawlers were evaluated for different numbers of pages crawled, for different numbers of features gathered at different distances from the link, and with different feature selection methods.
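
The link-scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (NaiveBayes, context_tokens, rank_links), the window size, and the Laplace smoothing are all assumptions made for illustration. The sketch collects the words within a fixed distance of a link, scores them with a two-class naive Bayes model, and orders the crawl frontier so the most promising links are visited first. In an accelerated crawler, the training labels would presumably come from the traditional focused crawler's judgments of already-fetched target pages; here they are supplied by hand.

```python
import heapq
import math
from collections import Counter


def context_tokens(words, link_index, window=3):
    """Return lowercase tokens within `window` positions of the link word."""
    lo = max(0, link_index - window)
    hi = min(len(words), link_index + window + 1)
    return [w.lower() for w in words[lo:hi]]


class NaiveBayes:
    """Two-class multinomial naive Bayes with Laplace smoothing (illustrative)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, tokens, relevant):
        """Add one labelled link context to the model."""
        self.word_counts[relevant].update(tokens)
        self.doc_counts[relevant] += 1
        self.vocab.update(tokens)

    def log_score(self, tokens, label):
        """Unnormalized log posterior of `label` given the tokens."""
        total_docs = sum(self.doc_counts.values())
        prior = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        # The extra +1 reserves probability mass for unseen words.
        denom = total_words + len(self.vocab) + 1
        counts = self.word_counts[label]
        return prior + sum(math.log((counts[t] + 1) / denom) for t in tokens)

    def p_relevant(self, tokens):
        """Posterior probability that the link target is relevant."""
        a = self.log_score(tokens, True)
        b = self.log_score(tokens, False)
        m = max(a, b)  # subtract the max for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        return ea / (ea + eb)


def rank_links(classifier, links):
    """Order (url, tokens) pairs from most to least promising."""
    heap = [(-classifier.p_relevant(tokens), url) for url, tokens in links]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Hand-labelled toy contexts; the URLs below are placeholders.
nb = NaiveBayes()
nb.train(["focused", "crawler", "topic"], True)
nb.train(["cheap", "flights", "deals"], False)

frontier = [
    ("http://example.org/ads", ["cheap", "deals"]),
    ("http://example.org/crawling", ["crawler", "topic"]),
]
print(rank_links(nb, frontier))
```

The window parameter corresponds loosely to the "distance from the link" that the evaluation varies; a larger window gathers more features per link at the cost of more noise.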

Index Terms

Computer Science
Information Sciences

Keywords

Focused Crawler, World Wide Web, Accelerated Focused Crawlers