Web Crawlers for Searching Hidden Pages: A Survey

K. F. Bharati; P. Premchand; A. Govardhan

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 64 - Number 14

Year of Publication: 2013

10.5120/10706-5649

K. F. Bharati, P. Premchand, A. Govardhan. Web Crawlers for Searching Hidden Pages: A Survey. International Journal of Computer Applications. 64, 14 (February 2013), 42-46. DOI=10.5120/10706-5649

@article{10.5120/10706-5649,
  author = {K. F. Bharati and P. Premchand and A. Govardhan},
  title = {Web Crawlers for Searching Hidden Pages: A Survey},
  journal = {International Journal of Computer Applications},
  issue_date = {February 2013},
  volume = {64},
  number = {14},
  month = {February},
  year = {2013},
  issn = {0975-8887},
  pages = {42-46},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume64/number14/10706-5649/},
  doi = {10.5120/10706-5649},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:16:29.293426+05:30
%A K. F. Bharati
%A P. Premchand
%A A. Govardhan
%T Web Crawlers for Searching Hidden Pages: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 64
%N 14
%P 42-46
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Many researchers have addressed the need for a dynamic, proven web crawler model that serves the many commerce, research and e-commerce establishments on the web that rely largely on search engines. The web architecture as a whole is shifting from a traditional to a semantic one. Web crawlers, on the other hand, are prone to skipping large numbers of pages without searching them and are incapable of capturing hidden pages. Several research problems in information retrieval also remain far from optimized, such as helping users analyze a problem to determine their information needs. This paper presents an analytical survey of several proven web crawlers capable of searching hidden pages. It also discusses the prospects and constraints of these methods and ways to enhance them further.

Index Terms

Computer Science

Information Sciences

Keywords

Web crawler, Hidden pages, search, search optimization