Research Article

Ontology based Web Page Topic Identification

by Abhishek Singh Rathore, Devshri Roy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 85 - Number 6
Year of Publication: 2014
Authors: Abhishek Singh Rathore, Devshri Roy
10.5120/14849-3211

Abhishek Singh Rathore, Devshri Roy. Ontology based Web Page Topic Identification. International Journal of Computer Applications. 85, 6 (January 2014), 35-40. DOI=10.5120/14849-3211

@article{ 10.5120/14849-3211,
author = { Abhishek Singh Rathore, Devshri Roy },
title = { Ontology based Web Page Topic Identification },
journal = { International Journal of Computer Applications },
issue_date = { January 2014 },
volume = { 85 },
number = { 6 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 35-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume85/number6/14849-3211/ },
doi = { 10.5120/14849-3211 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:01:48.850083+05:30
%A Abhishek Singh Rathore
%A Devshri Roy
%T Ontology based Web Page Topic Identification
%J International Journal of Computer Applications
%@ 0975-8887
%V 85
%N 6
%P 35-40
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the emergence of the web, considerable research effort has gone into Web Mining. This paper proposes an automatic approach to identifying the topic of a web page. The contribution of this research is an ontology-based approach to topic identification that can provide better results. Keywords are extracted from basic HTML tags and from the co-occurrence of words in the text, rather than by calculating the frequency of each term that exists in the page. A domain ontology is developed to map the topics of documents. Keywords are mapped to the ontology using Levenshtein edit distance to extract the topic of the web page. The results could benefit search engines by enabling faster tagging of web pages.
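
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the set of HTML tags, the toy ontology, the distance threshold, and the function names are all assumptions made for illustration, and the word co-occurrence step is omitted for brevity. It only shows the general idea of collecting keywords from selected tags and matching them to ontology terms by Levenshtein edit distance.

```python
from html.parser import HTMLParser

# Tags treated as topic-bearing. This choice is an assumption,
# not the list used in the paper.
KEY_TAGS = {"title", "h1", "h2", "h3", "b", "strong"}


class TagKeywordParser(HTMLParser):
    """Collect lowercase alphabetic words that appear inside KEY_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of KEY_TAGS currently open
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag in KEY_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in KEY_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.keywords.extend(w.lower() for w in data.split() if w.isalpha())


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical toy ontology: topic -> concept terms.
ONTOLOGY = {
    "databases": ["database", "query", "index", "transaction"],
    "networking": ["network", "router", "protocol", "packet"],
}


def identify_topic(html, max_distance=2):
    """Return the topic whose terms match the most keywords, or None."""
    parser = TagKeywordParser()
    parser.feed(html)
    scores = {topic: 0 for topic in ONTOLOGY}
    for word in parser.keywords:
        for topic, terms in ONTOLOGY.items():
            # A keyword counts once per topic if it is close to any term.
            if any(levenshtein(word, t) <= max_distance for t in terms):
                scores[topic] += 1
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


page = (
    "<html><head><title>Database Queries</title></head>"
    "<body><h1>Indexes and transactions</h1><p>router router</p></body></html>"
)
print(identify_topic(page))
```

In this sketch the repeated word "router" in the paragraph body is ignored, because only text inside the selected tags contributes keywords. A fixed threshold such as `max_distance=2` tolerates simple plurals but can produce false matches on very short words, so a length-relative threshold may be a better choice in practice.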

Index Terms

Computer Science
Information Sciences

Keywords

DOM, Word co-occurrence, Ontology, Topic Identification