Research Article

Concept based Focused Crawling using Ontology

by S.Thenmalar, T. V. Geetha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 26 - Number 7
Year of Publication: 2011
10.5120/3115-4282

S. Thenmalar, T. V. Geetha. Concept based Focused Crawling using Ontology. International Journal of Computer Applications. 26, 7 (July 2011), 29-32. DOI=10.5120/3115-4282

@article{ 10.5120/3115-4282,
author = { S. Thenmalar and T. V. Geetha },
title = { Concept based Focused Crawling using Ontology },
journal = { International Journal of Computer Applications },
issue_date = { July 2011 },
volume = { 26 },
number = { 7 },
month = { July },
year = { 2011 },
issn = { 0975-8887 },
pages = { 29-32 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume26/number7/3115-4282/ },
doi = { 10.5120/3115-4282 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:12:10.481056+05:30
%A S.Thenmalar
%A T. V. Geetha
%T Concept based Focused Crawling using Ontology
%J International Journal of Computer Applications
%@ 0975-8887
%V 26
%N 7
%P 29-32
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Building a web crawler that downloads only relevant pages remains a major challenge in information retrieval. Rather than visiting all web pages, a focused crawler visits only the sections of the web that contain relevant pages and tries to skip irrelevant sections. Existing ontology-based web crawlers estimate the semantic content of a URL using a domain-dependent ontology, which in turn supports the methods used to prioritize the URL queue. The crawler maintains a queue of the URLs it has seen at each level of the crawl and selects the next URL to visit from this queue based on the conceptual rank of the page at that level, obtained from the domain ontology. In this work, however, we represent the topic as an overall conceptual vector, obtained by combining the concept vectors of the individual pages associated with the seed URLs. The conceptual rank is based on comparisons between conceptual vectors at each depth, across depths, and with the overall topic-indicating seed concept vector.
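
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the combination rule or the ranking formula, so this example assumes a toy term-to-concept ontology (`ONTOLOGY`), averaging as the way to combine seed-page vectors, and cosine similarity to the topic vector as the conceptual rank. All function names are hypothetical, and the comparisons at each depth and across depths are left out.

```python
import heapq
import math
from collections import Counter

# Hypothetical toy ontology (an assumption): surface term -> concept label.
ONTOLOGY = {
    "crawler": "WebCrawling",
    "spider": "WebCrawling",
    "ontology": "KnowledgeRepresentation",
    "taxonomy": "KnowledgeRepresentation",
    "ranking": "InformationRetrieval",
    "retrieval": "InformationRetrieval",
}


def concept_vector(text):
    """Count how often each ontology concept is mentioned in the text."""
    counts = Counter()
    for token in text.lower().split():
        concept = ONTOLOGY.get(token.strip(".,;:()"))
        if concept:
            counts[concept] += 1
    return dict(counts)


def combine(vectors):
    """Average several concept vectors into one topic vector (assumed rule)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds the weights concept by concept
    n = max(len(vectors), 1)
    return {concept: weight / n for concept, weight in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_urls(topic_vector, candidates):
    """Order candidate URLs by descending similarity to the topic vector.

    `candidates` maps each URL to the text already fetched for it.
    """
    heap = []
    for url, text in candidates.items():
        score = cosine(topic_vector, concept_vector(text))
        heapq.heappush(heap, (-score, url))  # negate: heapq is a min-heap
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

A crawler built this way would call `combine` once on the concept vectors of the seed pages, then use `rank_urls` to pick the next URL from the queue at each level. Extending the score to also compare vectors at the same depth and across depths, as the abstract describes, would need the formulas from the full paper.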

Index Terms

Computer Science
Information Sciences

Keywords

Focused crawler, Ontology, Conceptual vector