Priority based Semantic Web Crawler

Jaytrilok Choudhary; Devshri Roy

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 81 - Number 15

Year of Publication: 2013

Authors: Jaytrilok Choudhary, Devshri Roy

10.5120/14197-2372

Jaytrilok Choudhary, Devshri Roy. Priority based Semantic Web Crawler. International Journal of Computer Applications. 81, 15 (November 2013), 10-13. DOI=10.5120/14197-2372

@article{ 10.5120/14197-2372,

author = { Jaytrilok Choudhary, Devshri Roy },

title = { Priority based Semantic Web Crawler },

journal = { International Journal of Computer Applications },

issue_date = { November 2013 },

volume = { 81 },

number = { 15 },

month = { November },

year = { 2013 },

issn = { 0975-8887 },

pages = { 10-13 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume81/number15/14197-2372/ },

doi = { 10.5120/14197-2372 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

Abstract

The Internet holds billions of web pages, linked to one another by URLs (Uniform Resource Locators). A web crawler is the core module of a search engine that gathers these documents from the WWW. Most web pages on the Internet are dynamic and change periodically, so the crawler must revisit them to keep the search engine's database up to date. This paper proposes a priority-based semantic web crawling algorithm. An ontology is used to obtain the semantics of a web page during crawling. The algorithm starts from an initial seed URL. The page at that URL is downloaded and its semantic score with respect to the given topic is calculated. The semantic score of an unvisited URL is calculated from three components: the semantic similarity of its anchor text to the topic, the semantic similarity of its web page to the topic, and the semantic scores of its parent pages. URLs and their semantic scores are stored in a priority queue instead of a simple queue, so the queue always returns the highest-priority URL to crawl next. The reported overall performance gain is 88% over a simple crawler, 28% over a focused crawler, and 6% over a priority-based focused crawler.
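The crawl loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the `similarity` function is a plain word-overlap (Jaccard) stand-in for the ontology-based semantic score, the weights in `url_priority` are invented because the abstract does not say how the three components are combined, and the `WEB` dictionary is a hypothetical in-memory link graph used in place of real downloads.

```python
import heapq
import re


def similarity(text, topic_terms):
    """Stand-in for the ontology-based semantic score (assumption):
    Jaccard overlap between the words of `text` and the topic terms."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if not words or not topic_terms:
        return 0.0
    return len(words & topic_terms) / len(words | topic_terms)


def url_priority(anchor_sim, page_sim, parent_scores, weights=(0.4, 0.4, 0.2)):
    """Combine the three components named in the abstract.
    The weighted sum and the weights themselves are assumptions."""
    w_anchor, w_page, w_parent = weights
    parent_avg = sum(parent_scores) / len(parent_scores) if parent_scores else 0.0
    return w_anchor * anchor_sim + w_page * page_sim + w_parent * parent_avg


def crawl(seed, web, topic_terms, max_pages=10):
    """Visit pages in priority order.
    `web` maps url -> (page_text, [(anchor_text, target_url), ...])."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-1.0, seed)]
    parent_scores = {}  # url -> scores of visited pages that link to it
    seen = set()
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in seen or url not in web:
            continue  # skip stale duplicates and unknown URLs
        seen.add(url)
        text, links = web[url]
        page_score = similarity(text, topic_terms)
        visited.append((url, page_score))
        for anchor, target in links:
            if target in seen:
                continue
            parent_scores.setdefault(target, []).append(page_score)
            target_text = web.get(target, ("", []))[0]
            score = url_priority(
                similarity(anchor, topic_terms),
                similarity(target_text, topic_terms),
                parent_scores[target],
            )
            heapq.heappush(frontier, (-score, target))
    return visited


# Hypothetical toy link graph and topic, for illustration only.
TOPIC = {"semantic", "web", "crawler", "ontology"}
WEB = {
    "seed": ("web portal", [("ontology crawler", "a"), ("cooking recipes", "b")]),
    "a": ("semantic web crawler ontology", [("semantic web", "c")]),
    "b": ("pasta recipes", []),
    "c": ("semantic web ontology notes", []),
}
order = [url for url, _ in crawl("seed", WEB, TOPIC)]
print(order)
```

In this toy graph the off-topic page `b` is discovered before `c` but crawled after it, which is the behaviour a priority queue gives and a simple FIFO queue would not. A URL can be pushed more than once with different scores; the `seen` check discards the stale entries when they are popped.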

Index Terms

Computer Science

Information Sciences

Keywords

Priority, ontology, semantic similarity, downloader, search engine