International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 26 - Number 7
Year of Publication: 2011
Authors: S. Thenmalar, T. V. Geetha
DOI: 10.5120/3115-4282
S. Thenmalar, T. V. Geetha. Concept based Focused Crawling using Ontology. International Journal of Computer Applications. 26, 7 (July 2011), 29-32. DOI=10.5120/3115-4282
Building a web crawler that downloads only relevant pages remains a major challenge in the field of information retrieval systems. Rather than visiting all web pages, a focused crawler visits only the sections of the web that contain relevant pages and tries to skip irrelevant sections. Existing ontology-based web crawlers estimate the semantic content of a URL using a domain-dependent ontology, which in turn supports the methods used to prioritize the URL queue. The crawler maintains a queue of the URLs it has seen at each level of the crawl, and selects the next URL to visit from this queue according to the conceptual rank of the page at that level, as obtained from the domain ontology. In this work, however, we represent the topic as an overall conceptual vector, obtained by combining the concept vectors of the individual pages associated with the seed URLs. The conceptual rank is based on comparisons between conceptual vectors at each depth, across depths, and with the overall seed concept vector that represents the topic.
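The idea of combining per-page concept vectors into one topic vector and ranking queued URLs against it can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `ONTOLOGY` term-to-concept map, the use of summed concept counts to combine seed pages, the use of cosine similarity as the comparison, and the helper names `concept_vector`, `combine`, `cosine` and `rank_frontier`. The sketch covers only the comparison with the seed vector, not the per-depth and cross-depth comparisons.

```python
import heapq
import math
from collections import Counter

# Hypothetical toy ontology mapping surface terms to concepts (illustrative only).
ONTOLOGY = {
    "crawler": "WebCrawling",
    "spider": "WebCrawling",
    "ontology": "KnowledgeRepresentation",
    "concept": "KnowledgeRepresentation",
    "retrieval": "InformationRetrieval",
    "search": "InformationRetrieval",
}


def concept_vector(text):
    """Count how often each ontology concept is mentioned in a page's text."""
    counts = Counter()
    for token in text.lower().split():
        concept = ONTOLOGY.get(token.strip(".,;:"))
        if concept:
            counts[concept] += 1
    return counts


def combine(vectors):
    """Sum per-page concept vectors into one overall topic vector (assumed scheme)."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return total


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_frontier(seed_vector, candidates):
    """Order candidate URLs by descending similarity to the seed topic vector.

    `candidates` maps each URL to the text already fetched for that page.
    """
    heap = [(-cosine(seed_vector, concept_vector(text)), url)
            for url, text in candidates.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

To use the sketch, build the seed vector with `combine(concept_vector(p) for p in seed_pages)` and pass it to `rank_frontier` together with a dictionary of candidate URLs and their page text; pages that share no concepts with the seeds fall to the end of the ordering.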