Research Article

HWPDE: Novel Approach for Data Extraction from Structured Web Pages

by Manpreet Singh Sehgal, Anuradha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 50 - Number 8
Year of Publication: 2012
Authors: Manpreet Singh Sehgal, Anuradha
10.5120/7791-0897

Manpreet Singh Sehgal, Anuradha. HWPDE: Novel Approach for Data Extraction from Structured Web Pages. International Journal of Computer Applications. 50, 8 (July 2012), 22-27. DOI=10.5120/7791-0897

@article{ 10.5120/7791-0897,
author = { Manpreet Singh Sehgal and Anuradha },
title = { HWPDE: Novel Approach for Data Extraction from Structured Web Pages },
journal = { International Journal of Computer Applications },
issue_date = { July 2012 },
volume = { 50 },
number = { 8 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 22-27 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume50/number8/7791-0897/ },
doi = { 10.5120/7791-0897 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:47:46.250914+05:30
%A Manpreet Singh Sehgal
%A Anuradha
%T HWPDE: Novel Approach for Data Extraction from Structured Web Pages
%J International Journal of Computer Applications
%@ 0975-8887
%V 50
%N 8
%P 22-27
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Diving into the World Wide Web to fetch precious stones (relevant information) is a tedious task given the limitations of current diving equipment (current browsers). While much work is being carried out to improve the quality of that equipment, a related area of research is to devise new approaches for mining. This paper describes a novel approach to extracting web data from hidden websites so that it can be offered as a free service to users, giving them a better experience when searching for relevant data. In the proposed method, a crawler extracts the relevant data (information) contained in the web pages of hidden websites and stores it in a local database, building a large repository of structured, indexed, and ultimately relevant data. Such extracted data has the potential to satisfy end users who are starved of relevant information.
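
The extract-and-store pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the HWPDE algorithm itself, which the abstract does not specify. It assumes the result page is a plain HTML table whose first row holds the column names, and every name in it (TableRowParser, extract_records, store_records, SAMPLE_PAGE, the records table and its title and price columns) is hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Turn a table into a list of dicts keyed by the header row (assumed first)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def store_records(conn, records):
    """Store records in a local table and index it for lookup by title."""
    # Hypothetical schema: a real extractor would derive columns from the page.
    conn.execute("CREATE TABLE IF NOT EXISTS records (title TEXT, price TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_records_title ON records (title)")
    conn.executemany(
        "INSERT INTO records (title, price) VALUES (:title, :price)", records
    )
    conn.commit()


# Stand-in for a result page returned by a hidden-web source (invented data).
SAMPLE_PAGE = """
<table>
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Web Mining</td><td>40</td></tr>
  <tr><td>Data Extraction</td><td>35</td></tr>
</table>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_records(conn, extract_records(SAMPLE_PAGE))
    print(conn.execute("SELECT title, price FROM records ORDER BY title").fetchall())
```

The sketch uses only the Python standard library and an in-memory SQLite database. A real crawler would also need to submit queries to the hidden-web search form and cope with pages whose records are not laid out as a simple table.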

Index Terms

Computer Science
Information Sciences

Keywords

Hidden Web, Web Page Extraction, Web Page Service