Research Article

Improving Focused Crawling With Genetic Algorithms

by Chain Singh, Ashish Kr. Luhach, Amitesh Kumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 4
Year of Publication: 2013
Authors: Chain Singh, Ashish Kr. Luhach, Amitesh Kumar
10.5120/11075-5996

Chain Singh, Ashish Kr. Luhach, Amitesh Kumar. Improving Focused Crawling With Genetic Algorithms. International Journal of Computer Applications. 66, 4 (March 2013), 40-43. DOI=10.5120/11075-5996

@article{ 10.5120/11075-5996,
author = { Chain Singh, Ashish Kr. Luhach, Amitesh Kumar },
title = { Improving Focused Crawling With Genetic Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { March 2013 },
volume = { 66 },
number = { 4 },
month = { March },
year = { 2013 },
issn = { 0975-8887 },
pages = { 40-43 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume66/number4/11075-5996/ },
doi = { 10.5120/11075-5996 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:21:29.539108+05:30
%A Chain Singh
%A Ashish Kr. Luhach
%A Amitesh Kumar
%T Improving Focused Crawling With Genetic Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 66
%N 4
%P 40-43
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The Web, which contains a large amount of useful information and resources, is expanding rapidly. Web crawlers are among the most crucial components of search engines, and optimizing them can greatly improve search efficiency. Focused crawlers selectively retrieve Web documents relevant to a specific domain in order to build collections for domain-specific search engines. In this paper, we combine a genetic algorithm with focused crawling to improve crawling performance: the genetic algorithm is used to expand the crawler's initial keywords. The results show that our approach builds domain-specific collections of higher quality than traditional focused crawling techniques.
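The abstract does not specify the chromosome encoding, operators, or fitness function, so the following is only a hypothetical sketch of GA-based keyword expansion, not the authors' method. It assumes each chromosome is a bit vector over a candidate vocabulary (for example, terms harvested from seed pages), and that fitness rewards Jaccard similarity to known-relevant pages while penalizing similarity to known-irrelevant ones. The names `expand_keywords`, `fitness`, and `tournament`, and all parameter values, are illustrative assumptions.

```python
import random


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fitness(bits: list[int], vocab: list[str],
            relevant: list[set[str]], irrelevant: list[set[str]]) -> float:
    """Hypothetical fitness: mean similarity of the keyword set to relevant
    pages minus its mean similarity to irrelevant pages."""
    if not relevant:
        raise ValueError("at least one relevant page is required")
    keywords = {w for w, bit in zip(vocab, bits) if bit}
    if not keywords:
        return -1.0  # an empty keyword set is useless to the crawler
    pos = sum(jaccard(keywords, d) for d in relevant) / len(relevant)
    neg = (sum(jaccard(keywords, d) for d in irrelevant) / len(irrelevant)
           if irrelevant else 0.0)
    return pos - neg


def tournament(pop: list[list[int]], scores: list[float],
               rng: random.Random, k: int = 3) -> list[int]:
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), min(k, len(pop)))
    return pop[max(contenders, key=lambda i: scores[i])]


def expand_keywords(seed_keywords: set[str], vocab: list[str],
                    relevant: list[set[str]], irrelevant: list[set[str]],
                    pop_size: int = 30, generations: int = 40,
                    p_cross: float = 0.8, p_mut: float = 0.05,
                    seed: int = 0) -> tuple[set[str], float]:
    """Evolve a keyword set starting from the seed keywords (assumed scheme)."""
    rng = random.Random(seed)
    n = len(vocab)
    seed_bits = [1 if w in seed_keywords else 0 for w in vocab]
    # The unchanged seed set is one individual; the others keep the seed
    # terms and randomly switch on extra candidate terms.
    pop = [seed_bits[:]] + [
        [b or int(rng.random() < 0.3) for b in seed_bits]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        scores = [fitness(ind, vocab, relevant, irrelevant) for ind in pop]
        elite = pop[max(range(len(pop)), key=scores.__getitem__)]
        new_pop = [elite[:]]  # elitism: the best individual always survives
        while len(new_pop) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            if n > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation adds or drops individual keywords.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
    scores = [fitness(ind, vocab, relevant, irrelevant) for ind in pop]
    best_i = max(range(len(pop)), key=scores.__getitem__)
    best = {w for w, b in zip(vocab, pop[best_i]) if b}
    return best, scores[best_i]
```

Because elitism carries the best chromosome into every generation and the seed set is in the initial population, the returned set always scores at least as well as the seed keywords under this particular fitness. A focused crawler could then use the expanded set when scoring candidate pages or links; how the paper itself feeds the keywords back into the crawler is not stated in the abstract.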

Index Terms

Computer Science
Information Sciences

Keywords

Crawling, Focused crawling, Genetic Algorithm, Web crawler