International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 17
Year of Publication: 2014
Authors: Khaing Wah Wah Linn
DOI: 10.5120/19006-0547
Khaing Wah Wah Linn. Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree. International Journal of Computer Applications. 108, 17 (December 2014), 34-37. DOI=10.5120/19006-0547
The World Wide Web (WWW) serves as a huge, widely distributed global information service. A huge amount of data has been accumulated and stored on the Web. Information on the Web is usually presented via Hypertext Markup Language (HTML) to make it easier for humans to perceive. Web pages usually contain various kinds of content, some relevant and some irrelevant to the main topic. Irrelevant content is called noise. A web page usually contains a considerable amount of noise that is unrelated to its main information, such as navigation bars, advertisements, and links to related articles. Noise on web pages makes it harder to mine the main content of those pages. This paper proposes web page segmentation using a Gomory-Hu tree based Vision-based Page Segmentation (VIPS) algorithm.
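The abstract does not describe how the Gomory-Hu tree is built or applied, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that visual blocks are graph nodes and that edge weights are hypothetical similarity scores between blocks. It builds a Gomory-Hu tree with Gusfield's simplified construction (using a plain Edmonds-Karp max-flow as the min-cut routine) and then splits the page at the weakest tree edge. All function names, the block graph, and its weights are illustrative assumptions.

```python
from collections import deque


def build_capacity(n, weighted_edges):
    """Symmetric capacity matrix for an undirected weighted graph."""
    cap = [[0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        cap[u][v] += w
        cap[v][u] += w
    return cap


def min_cut(n, cap, s, t):
    """Edmonds-Karp max flow; returns (cut value, nodes on s's side of the cut)."""
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if prev[v] == -1 and res[u][v] > 0:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:
            # No augmenting path left: visited nodes form s's side of a min cut.
            return flow, {v for v in range(n) if prev[v] != -1}
        push = float("inf")
        v = t
        while v != s:  # find the bottleneck on the path
            push = min(push, res[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:  # update residual capacities along the path
            u = prev[v]
            res[u][v] -= push
            res[v][u] += push
            v = u
        flow += push


def gomory_hu_tree(n, cap):
    """Gusfield's construction: n - 1 min-cut computations, no graph contraction."""
    parent = [0] * n
    tree = []
    for s in range(1, n):
        t = parent[s]
        value, side = min_cut(n, cap, s, t)
        tree.append((s, t, value))
        for v in range(s + 1, n):
            if v in side and parent[v] == t:
                parent[v] = s
    return tree


def split_on_weakest_edge(n, tree):
    """Remove the lightest tree edge and return the two resulting node groups."""
    weakest = min(tree, key=lambda e: e[2])
    group = list(range(n))

    def find(x):
        while group[x] != x:
            x = group[x]
        return x

    for edge in tree:
        if edge is not weakest:
            group[find(edge[0])] = find(edge[1])
    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


# Hypothetical page: blocks 0-2 are article text, blocks 3-4 are navigation/ads.
BLOCK_EDGES = [(0, 1, 5), (1, 2, 5), (0, 2, 5), (3, 4, 4), (2, 3, 1)]
tree = gomory_hu_tree(5, build_capacity(5, BLOCK_EDGES))
segments = split_on_weakest_edge(5, tree)
print(tree)
print(segments)
```

In this toy graph the weakest tree edge corresponds to the single weak link between the article blocks and the navigation/advertisement blocks, so cutting it separates the two groups. How the paper actually weights edges and chooses which segment holds the main content is not stated in the abstract.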