International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 14, Number 4
Year of Publication: 2011
Authors: Shruti Sharma, A. K. Sharma, J. P. Gupta
DOI: 10.5120/1846-2476
Shruti Sharma, A. K. Sharma, J. P. Gupta. A Novel Architecture of a Parallel Web Crawler. International Journal of Computer Applications 14, 4 (January 2011), 38-42. DOI=10.5120/1846-2476
Due to the explosion in the size of the WWW [1,4,5], it has become essential to parallelize the crawling process. In this paper we present an architecture for a parallel crawler that consists of multiple crawling processes, called C-procs, which can run on a network of workstations. The proposed crawler is scalable and resilient against system crashes and other events. The aim of this architecture is to crawl the current set of publicly indexable web pages efficiently and effectively, maximizing the download rate while minimizing the overhead from parallelization.
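The abstract does not say how work is divided among the C-procs, so the following is only a minimal sketch of one common way to organize a parallel crawler, not the authors' design. It assumes static partitioning of the URL space by hashing each URL's host, so every URL has exactly one owning C-proc and no page is downloaded twice. Threads stand in for processes on separate workstations, and the hypothetical in-memory graph `WEB` replaces real HTTP fetching so the sketch runs offline. The names `WEB`, `owner`, `CProc`, and `parallel_crawl` are all illustrative.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Hypothetical in-memory "web": page URL -> outgoing links.
# Stands in for HTTP fetching and link extraction.
WEB = {
    "http://a.com/": ["http://a.com/x", "http://b.org/"],
    "http://a.com/x": ["http://c.net/", "http://a.com/"],
    "http://b.org/": ["http://b.org/y", "http://a.com/x"],
    "http://b.org/y": [],
    "http://c.net/": ["http://b.org/y", "http://d.io/"],
    "http://d.io/": [],
}


def owner(url: str, n_procs: int) -> int:
    """Assumed scheme: assign a URL to a C-proc by a stable hash of its host."""
    host = urlparse(url).netloc
    return zlib.crc32(host.encode()) % n_procs


class CProc:
    """One crawling process that owns a slice of the URL space."""

    def __init__(self, pid: int):
        self.pid = pid
        self.frontier: list[str] = []
        self.seen: set[str] = set()
        self.downloaded: list[str] = []

    def enqueue(self, url: str) -> None:
        # Only URLs owned by this C-proc arrive here, so a local seen-set
        # is enough to prevent duplicate downloads across all C-procs.
        if url not in self.seen:
            self.seen.add(url)
            self.frontier.append(url)

    def crawl_batch(self) -> list[str]:
        """'Download' every URL in the current frontier; return discovered links."""
        batch, self.frontier = self.frontier, []
        found: list[str] = []
        for url in batch:
            self.downloaded.append(url)
            found.extend(WEB.get(url, []))
        return found


def parallel_crawl(seeds: list[str], n_procs: int = 3) -> list[CProc]:
    procs = [CProc(i) for i in range(n_procs)]
    for url in seeds:
        procs[owner(url, n_procs)].enqueue(url)
    with ThreadPoolExecutor(max_workers=n_procs) as pool:
        while any(p.frontier for p in procs):
            # Each C-proc crawls its own frontier concurrently; each thread
            # touches only its own CProc, so no locking is needed here.
            results = list(pool.map(CProc.crawl_batch, procs))
            # Back in the coordinator: route each link to its owning C-proc.
            for links in results:
                for link in links:
                    procs[owner(link, n_procs)].enqueue(link)
    return procs


if __name__ == "__main__":
    for p in parallel_crawl(["http://a.com/"]):
        print(p.pid, p.downloaded)
```

Partitioning by host keeps all pages of a site on one C-proc, which limits the cross-process URL traffic that counts as parallelization overhead; a real system would exchange those URLs over the network rather than through a shared coordinator loop.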