Available Challenges and Guidelines in the Field of Deep Web and Intensive Crawling

Yasin Ezatdoost; Ali Tourani; Amir Seyed Danesh

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 77 - Number 1

Year of Publication: 2013

Authors: Yasin Ezatdoost, Ali Tourani, Amir Seyed Danesh

10.5120/13355-0948

Yasin Ezatdoost, Ali Tourani, Amir Seyed Danesh. Available Challenges and Guidelines in the Field of Deep Web and Intensive Crawling. International Journal of Computer Applications. 77, 1 (September 2013), 1-5. DOI=10.5120/13355-0948

@article{ 10.5120/13355-0948,

author = { Yasin Ezatdoost and Ali Tourani and Amir Seyed Danesh },

title = { Available Challenges and Guidelines in the Field of Deep Web and Intensive Crawling },

journal = { International Journal of Computer Applications },

issue_date = { September 2013 },

volume = { 77 },

number = { 1 },

month = { September },

year = { 2013 },

issn = { 0975-8887 },

pages = { 1-5 },

numpages = {5},

url = { https://ijcaonline.org/archives/volume77/number1/13355-0948/ },

doi = { 10.5120/13355-0948 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T21:49:05.686948+05:30

%A Yasin Ezatdoost

%A Ali Tourani

%A Amir Seyed Danesh

%T Available Challenges and Guidelines in the Field of Deep Web and Intensive Crawling

%J International Journal of Computer Applications

%@ 0975-8887

%V 77

%N 1

%P 1-5

%D 2013

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Today a great deal of information is available on the Web, and much of it can be reached only through search interfaces. A web crawler is an automated program that browses the Web independently. A crawler starts from a "seed URL" and then follows the links found on each page. This approach confronts many existing crawlers with fundamental difficulties. Identifying the search interface and choosing an appropriate query, on the one hand, and retrieving the documents the Web returns as results, on the other, are issues that intensify the challenges facing web crawlers. This paper investigates the existing challenges and guidelines in the field of the deep web and intensive crawling.
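The seed-URL process described above amounts to a graph traversal. The following is a minimal Python sketch, not the authors' method: it assumes breadth-first order and replaces HTTP fetching and HTML parsing with a hypothetical in-memory `links` dictionary, so the function `crawl` and its `max_pages` limit are illustrative names only.

```python
from collections import deque


def crawl(seed, links, max_pages=100):
    """Breadth-first crawl starting from a seed URL.

    Assumption: `links` is a hypothetical in-memory link graph mapping each
    URL to the URLs found on that page. A real crawler would fetch each page
    over HTTP and extract its links from the HTML instead.
    Returns the URLs in the order they were visited.
    """
    frontier = deque([seed])  # URLs waiting to be visited (FIFO gives breadth-first order)
    seen = {seed}             # URLs already queued, so no page is queued twice
    visited = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Trace every link on the current page and queue the unseen ones.
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                frontier.append(out)

    return visited
```

For example, with `web = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": []}`, `crawl("a", web)` returns `["a", "b", "c", "d"]`. The sketch also illustrates why the deep web is hard to reach: a results page that exists only after a query is submitted through a search form never appears as a value in `links`, so a purely link-following crawler cannot visit it.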

Index Terms

Computer Science

Information Sciences

Keywords

Intensive crawler, search engine, genetic algorithm, deep web