International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 98 - Number 9
Year of Publication: 2014
Authors: Kopal Maheshwari, Namrata Tapaswi
DOI: 10.5120/17215-7448
Kopal Maheshwari, Namrata Tapaswi. Design and Implementation of Hidden based Web Retrieval using Innovative Vision-based Segmentation. International Journal of Computer Applications. 98, 9 (July 2014), 42-47. DOI=10.5120/17215-7448
We integrate information extracted from conference websites to obtain clean, high-quality academic data. This work makes the following contributions. First, we propose a novel vision-based page segmentation algorithm, which uses the DOM tree to compensate for the information loss of classical vision-based segmentation algorithms. Second, we transform the difficult problem of extracting conference Web data into a classification problem, and classify text blocks into predefined categories according to vision, key words, text and content information. Third, we improve the classification quality by post-processing. Our experimental results on real-world datasets show that our method is highly effective and efficient for extracting academic information from conference pages.