Research Article

Restructuring robots.txt for better Information Retrieval

by Bhavin M. Jasani, C. K. Kumbharana
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 120 - Number 9
Year of Publication: 2015
Authors: Bhavin M. Jasani, C. K. Kumbharana
10.5120/21258-4115

Bhavin M. Jasani, C. K. Kumbharana. Restructuring robots.txt for better Information Retrieval. International Journal of Computer Applications 120, 9 (June 2015), 35-40. DOI=10.5120/21258-4115

@article{ 10.5120/21258-4115,
author = { Bhavin M. Jasani, C. K. Kumbharana },
title = { Restructuring robots.txt for better Information Retrieval },
journal = { International Journal of Computer Applications },
issue_date = { June 2015 },
volume = { 120 },
number = { 9 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 35-40 },
numpages = { 6 },
url = { https://ijcaonline.org/archives/volume120/number9/21258-4115/ },
doi = { 10.5120/21258-4115 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Bhavin M. Jasani
%A C. K. Kumbharana
%T Restructuring robots.txt for better Information Retrieval
%J International Journal of Computer Applications
%@ 0975-8887
%V 120
%N 9
%P 35-40
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Nowadays, the users of the WWW are not only humans. There are other visitors, such as web crawlers and robots, generated by search engines and other information retrievers. Far fewer visitors reach a website directly than through search engines or other links. To collect information from a website, search engines use crawlers or robots to access it. There must be an access mechanism or protocol for such robots that restricts them from accessing unwanted content of the website. robots.txt is a partial mechanism for this purpose, but it is not fully functional. This paper proposes enhancements to make full use of the functionality of the robots.txt file.
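As a sketch of the access mechanism the abstract describes, a well-behaved crawler can honor a site's robots.txt rules using Python's standard `urllib.robotparser`; the robots.txt content below is an illustrative example, not taken from the paper.

```python
import urllib.robotparser

# Illustrative robots.txt content: disallow one directory, allow another.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""

# Parse the rules as a crawler would before fetching any page.
rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A crawler checks each URL path against the rules for its user agent.
print(rp.can_fetch("AnyBot", "/public/index.html"))   # allowed -> True
print(rp.can_fetch("AnyBot", "/private/data.html"))   # disallowed -> False
```

Paths with no matching rule are allowed by default, which is one reason the standard robots.txt mechanism is only a partial access control: it relies entirely on crawler cooperation.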

Index Terms

Computer Science
Information Sciences

Keywords

Crawling agents, robots, spammer, harvesters, User Agent tag, Directive Overriding, Web Crawling, Web Tree, Web Spamming, Crawling, Querying