National Technical Symposium on Advancements in Computing Technologies
Foundation of Computer Science USA
NTSACT - Number 5
August 2011
Authors: Snehal M. Shewale, Trupti S. Patil
Snehal M. Shewale, Trupti S. Patil. Vide: A Vision-based Approach for Deep Web Data Extraction. National Technical Symposium on Advancements in Computing Technologies. NTSACT, 5 (August 2011), 34-40.
The data available on the Web is voluminous and heterogeneous. The deep Web contains orders of magnitude more information, and more valuable information, than the surface Web. Deep Web contents are accessed by submitting queries to Web databases, and the returned data records are wrapped in dynamically generated Web pages. A large number of techniques have been proposed for extracting structured data from such pages, but all of them are Web-page-programming-language-dependent. In this paper we review ViDE, a novel vision-based approach that is Web-page-programming-language-independent. ViDE uses the visual features of deep Web pages to perform deep Web data extraction, which comprises data record extraction and data item extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.