Research Article

Web Content Mining Techniques: A Survey

by Faustina Johnson, Santosh Kumar Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 47 - Number 11
Year of Publication: 2012
Authors: Faustina Johnson, Santosh Kumar Gupta
10.5120/7236-0266

Faustina Johnson, Santosh Kumar Gupta. Web Content Mining Techniques: A Survey. International Journal of Computer Applications. 47, 11 (June 2012), 44-50. DOI=10.5120/7236-0266

@article{ 10.5120/7236-0266,
author = { Faustina Johnson and Santosh Kumar Gupta },
title = { Web Content Mining Techniques: A Survey },
journal = { International Journal of Computer Applications },
issue_date = { June 2012 },
volume = { 47 },
number = { 11 },
month = { June },
year = { 2012 },
issn = { 0975-8887 },
pages = { 44-50 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume47/number11/7236-0266/ },
doi = { 10.5120/7236-0266 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:41:38.428975+05:30
%A Faustina Johnson
%A Santosh Kumar Gupta
%T Web Content Mining Techniques: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 47
%N 11
%P 44-50
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The quest for knowledge has led to new discoveries and inventions. With the emergence of the World Wide Web, the Web became a hub for these discoveries and inventions, and web browsers became the tool that put this information at our fingertips. Over the years, the World Wide Web became overloaded with information, and it became hard to retrieve data matching a given need. Web mining came to the rescue. Web content mining is a subdivision of web mining. This paper surveys different techniques and patterns of content mining and the areas that content mining has influenced. The web contains structured, unstructured, semi-structured, and multimedia data. This survey focuses on how to apply content mining to each of these data types. It also points out how web content mining can be utilized in web usage mining.

Index Terms

Computer Science
Information Sciences

Keywords

Web Content Mining, Web Usage Mining, Structured Data, Unstructured Data, Semi-structured Data, Multimedia Data