Research Article

Article:Realization of Framework for Web Content Extraction and Classification

by Ganesh D. Puri, Prof. Y.C. Kulkarni
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 32 - Number 6
Year of Publication: 2011
Authors: Ganesh D. Puri, Prof. Y.C. Kulkarni
10.5120/3908-5486

Ganesh D. Puri, Prof. Y.C. Kulkarni. Article:Realization of Framework for Web Content Extraction and Classification. International Journal of Computer Applications. 32, 6 (October 2011), 22-26. DOI=10.5120/3908-5486

@article{ 10.5120/3908-5486,
author = { Ganesh D. Puri and Prof. Y.C. Kulkarni },
title = { Article:Realization of Framework for Web Content Extraction and Classification },
journal = { International Journal of Computer Applications },
issue_date = { October 2011 },
volume = { 32 },
number = { 6 },
month = { October },
year = { 2011 },
issn = { 0975-8887 },
pages = { 22-26 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume32/number6/3908-5486/ },
doi = { 10.5120/3908-5486 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:18:28.700309+05:30
%A Ganesh D. Puri
%A Prof. Y.C. Kulkarni
%T Article:Realization of Framework for Web Content Extraction and Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 32
%N 6
%P 22-26
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Web content extraction and classification can be viewed as a combination of different methods. Web pages today carry a great deal of information in addition to their main content, so the main task is to extract the content that interests the user. Text mining is a technique that helps users find useful information in large collections of digital text documents on the Web or in databases. A good text mining model should therefore retrieve the information that meets the user's needs within a reasonable time. A first step in any Web-based text mining effort is to collect a significant number of Web mentions of a subject. The challenge is then not only to find all occurrences of the subject but also to keep only those that have the desired meaning. The system described in this paper extracts the main content of a Web page and classifies it, using the vector space model for classification.
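
The abstract names the vector space model as the classification method but gives no detail here. The following is only a minimal sketch of how such a classifier is commonly built, not the authors' implementation: it assumes TF-IDF term weighting, one summed centroid vector per class, and cosine similarity as the matching score. The function names, the category labels, and the tiny training set are all hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(token_lists):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Build a sparse TF-IDF vector; terms never seen in training are skipped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_centroids(labeled_texts):
    """Sum the TF-IDF vectors of each class into one centroid per label.

    The sum is not divided by the class size because cosine similarity
    does not depend on vector length.
    """
    tokenized = [(label, tokenize(text)) for label, text in labeled_texts]
    idf = build_idf([tokens for _, tokens in tokenized])
    centroids = {}
    for label, tokens in tokenized:
        centroid = centroids.setdefault(label, Counter())
        for term, weight in tfidf_vector(tokens, idf).items():
            centroid[term] += weight
    return idf, centroids


def classify(text, idf, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = tfidf_vector(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training data standing in for extracted main content.
TRAINING = [
    ("sports", "the team won the football match with a late goal"),
    ("sports", "the player scored in the cricket match"),
    ("technology", "the new processor improves computer performance"),
    ("technology", "software update fixes a security bug in the browser"),
]

idf, centroids = train_centroids(TRAINING)
print(classify("the striker scored a goal in the match", idf, centroids))  # prints: sports
```

In a pipeline like the one the abstract outlines, the input to `classify` would be the main content extracted from a Web page rather than a hand-written sentence. Stop-word removal and stemming are left out to keep the sketch short.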

Index Terms

Computer Science
Information Sciences

Keywords

MVC Architecture, VSM model, Text Mining, Extraction, Classification