Research Article

A Novel Web Crawler Algorithm on Query based Approach with Increases Efficiency

by S S Vishwakarma, A Jain, A K Sachan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 1
Year of Publication: 2012
DOI: 10.5120/6874-8983

S S Vishwakarma, A Jain, A K Sachan. A Novel Web Crawler Algorithm on Query based Approach with Increases Efficiency. International Journal of Computer Applications. 46, 1 (May 2012), 34-37. DOI=10.5120/6874-8983

@article{ 10.5120/6874-8983,
author = { S S Vishwakarma and A Jain and A K Sachan },
title = { A Novel Web Crawler Algorithm on Query based Approach with Increases Efficiency },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 1 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 34-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume46/number1/6874-8983/ },
doi = { 10.5120/6874-8983 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:38:39.600475+05:30
%A S S Vishwakarma
%A A Jain
%A A K Sachan
%T A Novel Web Crawler Algorithm on Query based Approach with Increases Efficiency
%J International Journal of Computer Applications
%@ 0975-8887
%V 46
%N 1
%P 34-37
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A web crawler is a computer program that downloads data from the World Wide Web on behalf of a search engine. Web content is changed or updated rapidly and without notice, so a crawler must repeatedly search the web for new or updated information. Approximately 40% of web traffic is generated by web crawlers. This paper proposes a solution to this network traffic problem: a query-based approach to web crawling that uses a filter. The proposed approach solves the problem of crawlers revisiting web pages.
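
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a query-based filter could cut revisits, not the authors' implementation. It assumes the server can answer a query of the form "which URLs changed after time T?"; the names `Site`, `modified_since`, and `QueryCrawler` are hypothetical and the site is simulated in memory rather than reached over HTTP.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for a web server: URL -> (last_modified, body)."""
    pages: dict[str, tuple[int, str]]

    def modified_since(self, timestamp: int) -> list[str]:
        # Assumed query endpoint: lists the URLs changed after `timestamp`.
        return sorted(u for u, (mtime, _) in self.pages.items() if mtime > timestamp)

    def fetch(self, url: str) -> str:
        # Simulates downloading one page.
        return self.pages[url][1]


@dataclass
class QueryCrawler:
    last_visit: int = -1  # time of the previous crawl; -1 means "never crawled"
    store: dict[str, str] = field(default_factory=dict)
    downloads: int = 0

    def crawl(self, site: Site, now: int) -> list[str]:
        # Filter step: query for changed URLs instead of refetching every page.
        changed = site.modified_since(self.last_visit)
        for url in changed:
            self.store[url] = site.fetch(url)
            self.downloads += 1
        self.last_visit = now
        return changed


site = Site({"/a": (1, "A v1"), "/b": (2, "B v1"), "/c": (3, "C v1")})
crawler = QueryCrawler()
first = crawler.crawl(site, now=10)   # first visit: every page is new
site.pages["/b"] = (12, "B v2")       # only /b is updated afterwards
second = crawler.crawl(site, now=20)  # second visit: only /b is downloaded
```

In this toy run the second crawl downloads one page instead of three, which is the kind of saving a revisit filter aims for. How the real system obtains the list of changed pages is an assumption here.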

Index Terms

Computer Science
Information Sciences

Keywords

Web Search Engine, Web Crawler, Web Crawling Traffic, HTTP GET Request