International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 52 - Number 19
Year of Publication: 2012
Authors: Vidushi Singhal, Sachin Sharma
DOI: 10.5120/8309-1827
Vidushi Singhal, Sachin Sharma. Crawling the Web Surface Databases. International Journal of Computer Applications. 52, 19 (August 2012), 15-22. DOI=10.5120/8309-1827
The World Wide Web is growing at a rapid rate. A web crawler is a computer program that browses the World Wide Web autonomously. As of February 2007, the web contained 29 billion pages. One of the most important uses of a web crawler is to index web pages and keep the stored copies up to date, so that a search engine can serve end-user queries. Because the web is dynamic in nature, stored pages must be updated constantly. In this paper, we put forward an efficient technique to refresh a page stored in a web repository. We propose two methods that refresh a page by comparing page structure. The first method compares the structure of pages using the tags they contain. The second method builds a document tree for each page and compares the trees.
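The two structure comparisons named in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithms, so this is not the authors' implementation: it assumes that two pages have "the same structure" when their tag sequences (first method) or their nested tag trees (second method) are exactly equal, and it ignores text and attributes. The names `StructureParser`, `same_tag_sequence`, and `same_document_tree` are hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StructureParser(HTMLParser):
    """Collects a flat tag sequence and a nested tag tree; text is ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []                  # method 1: start tags in document order
        self.root = ("#root", [])       # method 2: (tag, children) nodes
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        node = (tag, [])
        self._stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


def parse_structure(html):
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return parser


def same_tag_sequence(old_html, new_html):
    """Assumed form of the first method: compare the tags used, in order."""
    return parse_structure(old_html).tags == parse_structure(new_html).tags


def same_document_tree(old_html, new_html):
    """Assumed form of the second method: compare the nested tag trees."""
    return parse_structure(old_html).root == parse_structure(new_html).root


if __name__ == "__main__":
    stored = "<html><body><p>Hello</p><p>World</p></body></html>"
    fetched = "<html><body><p>Hello</p></body><p>World</p></html>"
    # The tag sequence is unchanged, but the second <p> moved out of <body>.
    print(same_tag_sequence(stored, fetched))
    print(same_document_tree(stored, fetched))
```

Under these assumptions, a crawler would refresh the stored copy whenever either check reports a difference. The tree comparison is the stricter of the two, since it also detects a change in nesting that leaves the flat tag sequence untouched. A practical system might replace exact equality with a similarity threshold.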