National Workshop-Cum-Conference on Recent Trends in Mathematics and Computing 2011
Foundation of Computer Science USA
RTMC - Number 5
May 2012
Authors: Rajni Mehta, Upasana
Rajni Mehta, Upasana. InfoHRDS: Information Domain Linked With Hypertext Resource Discovery System. National Workshop-Cum-Conference on Recent Trends in Mathematics and Computing 2011. RTMC, 5 (May 2012), 1-5.
The World Wide Web is a system of interlinked hypertext documents accessed over the Internet. Web pages may contain text, images, videos, and other multimedia, and users navigate between them using hyperlinks. About a million new pages go online each day, and it is impossible for major search engines to update their collections fast enough to keep up with such rapid growth. Web mining is the application of data mining techniques to extract knowledge from web data, including web documents and the hyperlinks between them. To address the growth problem, domain-specific search engines were introduced, which maintain Web collections for one or several related domains. These engines use focused crawlers to selectively retrieve Web pages relevant to particular domains, building specialized Web collections that are smaller and provide search results with high precision. In this paper, we introduce a global focused crawling approach that is beneficial for extracting more relevant data.
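
The general idea of focused crawling described above can be illustrated with a minimal sketch. This is not the authors' InfoHRDS method, which the abstract does not detail. It is a generic best-first crawl over a small in-memory link graph, and everything in it is an assumption made for illustration: the names `relevance` and `focused_crawl`, the toy `pages` graph, the keyword-fraction scoring rule, and the `threshold` value.

```python
import heapq

def relevance(text, domain_terms):
    """Toy score (an assumption): fraction of words that are domain terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in domain_terms for w in words) / len(words)

def focused_crawl(pages, seed, domain_terms, threshold=0.2, limit=10):
    """Best-first crawl over `pages`, a dict mapping url -> (text, outlinks)."""
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier = [(-1.0, seed)]
    seen = {seed}
    collected = []
    while frontier and len(collected) < limit:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue
        text, links = pages[url]
        score = relevance(text, domain_terms)
        if score < threshold:
            continue  # off-topic page: neither kept nor expanded
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return collected

# Hypothetical four-page web used only for this example.
pages = {
    "a": ("web mining crawler search", ["b", "c"]),
    "b": ("cooking recipes pasta sauce", ["d"]),
    "c": ("focused crawler web pages", ["d"]),
    "d": ("search engine web index", []),
}
terms = {"web", "crawler", "search", "mining"}
result = focused_crawl(pages, "a", terms)
print(result)
```

In this toy graph the off-topic page "b" is fetched but discarded, so its outlinks are never followed from it. That pruning is what keeps a domain-specific collection small compared with an exhaustive crawl. A real crawler would additionally need HTTP fetching, politeness rules, and a stronger relevance model.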