Research Article

A Forecasting Model for the Pages Crawled by Search Engine Crawlers at a Web Site

by Jeeva Jose, P. Sojan Lal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 68 - Number 13
Year of Publication: 2013
Authors: Jeeva Jose, P. Sojan Lal
10.5120/11639-7122

Jeeva Jose, P. Sojan Lal. A Forecasting Model for the Pages Crawled by Search Engine Crawlers at a Web Site. International Journal of Computer Applications. 68, 13 (April 2013), 19-24. DOI=10.5120/11639-7122

@article{ 10.5120/11639-7122,
author = { Jeeva Jose and P. Sojan Lal },
title = { A Forecasting Model for the Pages Crawled by Search Engine Crawlers at a Web Site },
journal = { International Journal of Computer Applications },
issue_date = { April 2013 },
volume = { 68 },
number = { 13 },
month = { April },
year = { 2013 },
issn = { 0975-8887 },
pages = { 19-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume68/number13/11639-7122/ },
doi = { 10.5120/11639-7122 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:27:44.093204+05:30
%A Jeeva Jose
%A P. Sojan Lal
%T A Forecasting Model for the Pages Crawled by Search Engine Crawlers at a Web Site
%J International Journal of Computer Applications
%@ 0975-8887
%V 68
%N 13
%P 19-24
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The World Wide Web is growing rapidly in both the number of web sites and the number of users. Without search engines, web sites would not be visible to users. Different search engine crawlers behave differently when they access a web site. The number of visits and pages crawled by search engines can help characterize crawler behavior and the resulting server load. This paper proposes a time series forecasting model for predicting the number of pages crawled by search engines. The model's predictions were compared with the actual values, and the model was found to be feasible.
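
The abstract does not say which time series method the authors used, so the following is only an illustrative sketch of the general idea: fit a model to past crawl counts, forecast ahead, and compare the forecast with actual values. It assumes a least-squares linear trend, which may differ from the paper's model. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def fit_linear_trend(counts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of counts[t] ~ intercept + slope * t, t = 0..n-1.

    Assumption: a linear trend stands in for the paper's (unspecified) model.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(counts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(counts))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast(counts: Sequence[float], steps: int) -> List[float]:
    """Extrapolate the fitted trend for the next `steps` periods."""
    intercept, slope = fit_linear_trend(counts)
    n = len(counts)
    return [intercept + slope * (n + k) for k in range(steps)]


def mean_absolute_error(actual: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average absolute gap between actual and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical pages-crawled counts per period (not data from the paper).
    history = [120, 135, 128, 150, 162, 158]
    held_out = [170, 181]
    predicted = forecast(history, steps=len(held_out))
    print("forecast:", predicted)
    print("MAE:", mean_absolute_error(held_out, predicted))
```

In practice the counts would be extracted from web server logs per crawler and per period, and the error measure would indicate whether the chosen model is adequate for the site in question.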

Index Terms

Computer Science
Information Sciences

Keywords

Web sites, Web logs, Search engines, Crawlers