Research Article

Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree

by Khaing Wah Wah Linn
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 17
Year of Publication: 2014
Authors: Khaing Wah Wah Linn
DOI: 10.5120/19006-0547

Khaing Wah Wah Linn. Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree. International Journal of Computer Applications. 108, 17 (December 2014), 34-37. DOI=10.5120/19006-0547

@article{ 10.5120/19006-0547,
author = { Khaing Wah Wah Linn },
title = { Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 17 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 34-37 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume108/number17/19006-0547/ },
doi = { 10.5120/19006-0547 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Khaing Wah Wah Linn
%T Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 17
%P 34-37
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The World Wide Web (WWW) serves as a huge, widely distributed global information service, and a large amount of data has been accumulated and stored on it. Information on the web is usually presented in Hypertext Markup Language (HTML) to make it easier for humans to perceive. Web pages usually contain various contents, some relevant and some irrelevant to the main topic. Irrelevant contents are called noise. A web page usually contains a considerable amount of noise that is unrelated to its main information, such as navigation bars, advertisements, and related articles. Noise on web pages makes it difficult to mine their main content. This paper proposes web page segmentation using a Gomory-Hu tree based Vision-based Page Segmentation (VIPS) algorithm.
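As a rough illustration of how a Gomory-Hu tree can group page blocks, the sketch below builds the tree with Gusfield's algorithm over a small similarity graph and then removes the lightest tree edges to form clusters. It is not the paper's implementation: the abstract does not specify the procedure, so the block similarity weights, the function names (`min_cut`, `gomory_hu_tree`, `cluster_blocks`), and the rule of dropping the k-1 lightest edges are all assumptions made for this example.

```python
from collections import deque

def min_cut(cap, s, t):
    """Edmonds-Karp max flow on an undirected capacity matrix.
    Returns (cut value, set of nodes on the s side of a minimum cut)."""
    n = len(cap)
    res = [row[:] for row in cap]          # residual capacities
    flow = 0
    while True:
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:     # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if prev[v] == -1 and res[u][v] > 0:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:                  # no path left: reached nodes form the s side
            return flow, {v for v in range(n) if prev[v] != -1}
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, res[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:                      # push flow along the path
            u = prev[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck

def gomory_hu_tree(cap):
    """Gusfield's algorithm. Node 0 is the root; for every other node v the
    tree edge (v, parent[v]) carries weight[v]."""
    n = len(cap)
    parent, weight = [0] * n, [0] * n
    for s in range(1, n):
        t = parent[s]
        value, side = min_cut(cap, s, t)
        weight[s] = value
        for v in range(n):                 # re-hang neighbours of t that fell on s's side
            if v != s and v in side and parent[v] == t:
                parent[v] = s
        if parent[t] in side:              # t's own parent is on s's side: swap roles
            parent[s], parent[t] = parent[t], s
            weight[s], weight[t] = weight[t], value
    return parent, weight

def cluster_blocks(parent, weight, k):
    """Drop the k-1 lightest tree edges and return the resulting components."""
    n = len(parent)
    edges = sorted((weight[v], v, parent[v]) for v in range(1, n))
    root = list(range(n))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x
    for _, u, v in edges[k - 1:]:
        root[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical visual-similarity weights between five page blocks:
# 0 = navigation bar, 1 = advertisement, 2 = title, 3 and 4 = body paragraphs.
similarity = [
    [0, 5, 0, 0, 1],
    [5, 0, 1, 0, 0],
    [0, 1, 0, 6, 6],
    [0, 0, 6, 0, 7],
    [1, 0, 6, 7, 0],
]
parent, weight = gomory_hu_tree(similarity)
print(cluster_blocks(parent, weight, 2))   # [[0, 1], [2, 3, 4]]
```

In this toy graph the noise blocks (0, 1) separate from the content blocks (2, 3, 4) because the only light tree edge is the one joining the two groups. A real system would first have to derive the similarity weights from the visual blocks that VIPS produces.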

Index Terms

Computer Science
Information Sciences

Keywords

Web Page Segmentation, Vision-based Page Segmentation, Gomory-Hu tree