Article:Realization of Framework for Web Content Extraction and Classification

Ganesh D. Puri; Prof. Y.C. Kulkarni

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 32 - Number 6

Year of Publication: 2011

Authors: Ganesh D. Puri, Prof. Y.C. Kulkarni

10.5120/3908-5486

Ganesh D. Puri, Prof. Y.C. Kulkarni. Article:Realization of Framework for Web Content Extraction and Classification. International Journal of Computer Applications. 32, 6 (October 2011), 22-26. DOI=10.5120/3908-5486

@article{ 10.5120/3908-5486,

author = { Ganesh D. Puri, Prof. Y.C. Kulkarni },

title = { Article:Realization of Framework for Web Content Extraction and Classification },

journal = { International Journal of Computer Applications },

issue_date = { October 2011 },

volume = { 32 },

number = { 6 },

month = { October },

year = { 2011 },

issn = { 0975-8887 },

pages = { 22-26 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume32/number6/3908-5486/ },

doi = { 10.5120/3908-5486 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T20:18:28.700309+05:30

%A Ganesh D. Puri

%A Prof. Y.C. Kulkarni

%T Article:Realization of Framework for Web Content Extraction and Classification

%J International Journal of Computer Applications

%@ 0975-8887

%V 32

%N 6

%P 22-26

%D 2011

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Web content extraction and classification can be viewed as a combination of different methods. Today a web page carries a great deal of information in addition to its main content, so extracting the content that is of interest to the user is the main task. Text mining is a technique that helps users find useful information in large collections of digital text documents on the Web or in databases. It is therefore crucial that a good text mining model retrieve the information that meets the user's needs within a reasonably short time. A first step in any Web-based text mining effort is to collect a significant number of Web mentions of a subject. The challenge is thus not only to find all occurrences of the subject, but also to keep only those that have the desired meaning. The system described in this paper is capable of extracting the main content of a page and classifying it. The vector space model is used for classification.
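The abstract names the vector space model as the classification method but gives no detail. The following is a minimal sketch of one common way such a classifier can be built, not the authors' implementation: each text becomes a TF-IDF weighted term vector, and a new text is assigned to the class whose summed training vector has the highest cosine similarity to it. The TF-IDF weighting, the centroid comparison, the function names, and the toy training sentences are all assumptions made for illustration, and the input is taken to be main content that has already been extracted from the page.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for each training term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Map a text to a sparse {term: tf * idf} vector; unseen terms are dropped."""
    counts = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_centroids(labeled_docs, idf):
    """Sum the TF-IDF vectors of each class into one vector per label.

    Cosine similarity ignores vector length, so the sum serves as a centroid.
    """
    centroids = {}
    for text, label in labeled_docs:
        centroid = centroids.setdefault(label, Counter())
        for term, weight in tfidf_vector(text, idf).items():
            centroid[term] += weight
    return centroids


def classify(text, centroids, idf):
    """Return the label whose centroid is most similar to the text."""
    vector = tfidf_vector(text, idf)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical training data: (extracted main content, class label).
TRAINING = [
    ("The team won the football match with a late goal", "sports"),
    ("The striker scored twice in the cup final", "sports"),
    ("The new processor doubles laptop battery life", "technology"),
    ("Developers released a software update for the browser", "technology"),
]

idf = build_idf([text for text, _ in TRAINING])
centroids = train_centroids(TRAINING, idf)

print(classify("The goal came late in the match", centroids, idf))
print(classify("A software update improves laptop battery life", centroids, idf))
```

Comparing against one vector per class keeps the sketch short; a nearest-neighbour comparison against individual training documents would fit the same vector representation.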

Index Terms

Computer Science

Information Sciences

Keywords

MVC Architecture, VSM model, Text Mining, Extraction, Classification