Research Article

Article:A Clickstream-based Focused Trend Parallel Web Crawler

by F. Ahmadi-Abkenari, Ali Selamat
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 9 - Number 5
Year of Publication: 2010
Authors: F. Ahmadi-Abkenari, Ali Selamat
10.5120/1385-1866

F. Ahmadi-Abkenari, Ali Selamat. Article:A Clickstream-based Focused Trend Parallel Web Crawler. International Journal of Computer Applications. 9, 5 (November 2010), 1-8. DOI=10.5120/1385-1866

@article{ 10.5120/1385-1866,
author = { F. Ahmadi-Abkenari, Ali Selamat },
title = { Article:A Clickstream-based Focused Trend Parallel Web Crawler },
journal = { International Journal of Computer Applications },
issue_date = { November 2010 },
volume = { 9 },
number = { 5 },
month = { November },
year = { 2010 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume9/number5/1385-1866/ },
doi = { 10.5120/1385-1866 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:57:49.489796+05:30
%A F. Ahmadi-Abkenari
%A Ali Selamat
%T Article:A Clickstream-based Focused Trend Parallel Web Crawler
%J International Journal of Computer Applications
%@ 0975-8887
%V 9
%N 5
%P 1-8
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapidly growing size of the World Wide Web creates many obstacles for general-purpose, single-process crawlers, including incorrect answers among search results and poor scalability. As a result, better heuristics are needed to deliver more accurate search results in a timely manner. Link-dependent Web page importance metrics are not a complete solution within a search engine's architecture: employing them within a parallel crawler imposes considerable overhead on the overall search system, and such metrics cannot cover authorized Web content in the dark net or authorized fresh pages. This paper proposes applying a link-independent Web page importance metric to govern the priority rule within the crawl frontier. To this end, it proposes a modest weighted architecture for a focused, structured parallel Web crawler (the CFP crawler), in which credit is assigned to URLs in the crawl frontier according to a clickstream-based prioritizing algorithm.
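To make the idea of a clickstream-governed crawl frontier concrete, the following is a minimal sketch under stated assumptions; it is not the authors' CFP algorithm, whose scoring formula and weighted architecture are not described in the abstract. It assumes a hypothetical log of `(session_id, url)` pairs and uses each URL's share of logged visits as a stand-in, link-independent importance score. URLs are split among parallel crawler processes by hashing their host name, a common partitioning scheme chosen here for illustration only. The names `clickstream_scores`, `ClickstreamFrontier`, `partition_for`, and `build_partitioned_frontiers` are hypothetical.

```python
import heapq
import zlib
from collections import Counter
from urllib.parse import urlsplit


def clickstream_scores(log_entries):
    """Score each URL by its share of logged visits.

    Assumption: log_entries is an iterable of (session_id, url) pairs;
    this normalized visit count is a stand-in, not the paper's metric.
    """
    visits = Counter(url for _, url in log_entries)
    total = sum(visits.values())
    return {url: n / total for url, n in visits.items()} if total else {}


class ClickstreamFrontier:
    """Crawl frontier that always yields the highest-scoring unseen URL."""

    def __init__(self, scores, default=0.0):
        self._scores = scores
        self._default = default  # score for URLs absent from the clickstream
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url):
        if url in self._seen:  # never enqueue the same URL twice
            return
        self._seen.add(url)
        score = self._scores.get(url, self._default)
        # heapq is a min-heap, so negate the score to pop the maximum first
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


def partition_for(url, num_crawlers):
    """Assign a URL to a crawler process by a stable hash of its host.

    zlib.crc32 is used instead of hash() because str hashing is
    randomized per Python process, which would break the assignment.
    """
    host = urlsplit(url).netloc
    return zlib.crc32(host.encode("utf-8")) % num_crawlers


def build_partitioned_frontiers(urls, scores, num_crawlers):
    """Give each parallel crawler its own clickstream-ordered frontier."""
    frontiers = [ClickstreamFrontier(scores) for _ in range(num_crawlers)]
    for url in urls:
        frontiers[partition_for(url, num_crawlers)].push(url)
    return frontiers


if __name__ == "__main__":
    log = [
        ("s1", "http://a.example/x"),
        ("s2", "http://a.example/x"),
        ("s1", "http://b.example/y"),
    ]
    frontier = ClickstreamFrontier(clickstream_scores(log))
    for u in ["http://b.example/y", "http://c.example/z", "http://a.example/x"]:
        frontier.push(u)
    print([frontier.pop() for _ in range(len(frontier))])
```

With this toy log, `http://a.example/x` (two visits) is fetched before `http://b.example/y` (one visit), and the unlogged `http://c.example/z` comes last. Because the score depends only on usage data, no link graph has to be exchanged between crawler processes, which is the kind of overhead the abstract attributes to link-dependent metrics.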

Index Terms

Computer Science
Information Sciences

Keywords

Clickstream analysis, Focused crawlers, Parallel crawlers, Web data management, Web page importance metrics