International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 11 - Number 7
Year of Publication: 2010
Authors: Shekhar Mishra, Anurag Jain, Dr. A.K. Sachan
DOI: 10.5120/1593-2140
Shekhar Mishra, Anurag Jain, Dr. A.K. Sachan. Article: Smart Approach to Reduce the Web Crawling Traffic of Existing System using HTML based Update File at Web Server. International Journal of Computer Applications. 11, 7 (December 2010), 34-38. DOI=10.5120/1593-2140
A web crawler is used to download information from the web. Web pages change without notice, so a crawler must frequently revisit websites to check for updates. It is estimated that 40% of present internet traffic is due to web crawling. In this paper we propose a file, maintained at the web server, that lists the URLs of the pages of a website that have been updated. The format of the file is based on HTML. The crawler visits only this UPDATE file and need not revisit the full website to learn what has changed. The scheme can easily be implemented on today’s systems with little modification to the web application and the web crawler. We test the proposed method in a simulator, using a website of 13 pages for the experiment. The experimental results show that the scheme is very promising.
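The idea of crawling from an update file can be illustrated with a minimal sketch. The abstract does not specify the layout of the UPDATE file, so everything concrete here is an assumption made for illustration: the file is taken to be an HTML page of `<a>` links, each carrying a hypothetical `data-lastmod` attribute with an ISO timestamp, and `urls_to_recrawl` is a hypothetical helper name. Under those assumptions, the crawler parses the single file and refetches only the pages modified since its last visit.

```python
from datetime import datetime
from html.parser import HTMLParser


class UpdateFileParser(HTMLParser):
    """Collects (url, last-modified) pairs from <a> tags in an update file.

    The data-lastmod attribute is a hypothetical convention, not the
    format defined in the paper.
    """

    def __init__(self):
        super().__init__()
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        stamp = attr.get("data-lastmod")
        if href and stamp:
            self.entries.append((href, datetime.fromisoformat(stamp)))


def urls_to_recrawl(update_html, last_visit):
    """Return the URLs listed as modified after the crawler's last visit."""
    parser = UpdateFileParser()
    parser.feed(update_html)
    parser.close()
    return [url for url, modified in parser.entries if modified > last_visit]


# Made-up sample content standing in for a server's update file.
SAMPLE = """
<html><body>
<a href="/index.html" data-lastmod="2010-12-01T10:00:00">index</a>
<a href="/news.html" data-lastmod="2010-12-05T09:30:00">news</a>
<a href="/about.html" data-lastmod="2010-11-20T08:00:00">about</a>
</body></html>
"""

changed = urls_to_recrawl(SAMPLE, datetime(2010, 11, 30))
print(changed)
```

With the sample above and a last visit on 30 November 2010, only `/index.html` and `/news.html` would be refetched; the unchanged page costs the crawler no request beyond the update file itself.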