Web Data Extraction

Research Article

Published in April 2012 by P. A. Chaudhari, R. L. Paikrao

Emerging Trends in Computer Science and Information Technology (ETCSIT2012)

Foundation of Computer Science USA

ETCSIT - Number 4

April 2012

Authors: P. A. Chaudhari, R. L. Paikrao

P. A. Chaudhari, R. L. Paikrao. Web Data Extraction. Emerging Trends in Computer Science and Information Technology (ETCSIT2012). ETCSIT, 4 (April 2012), 13-17.

@article{
author = { P. A. Chaudhari and R. L. Paikrao },
title = { Web Data Extraction },
journal = { Emerging Trends in Computer Science and Information Technology (ETCSIT2012) },
issue_date = { April 2012 },
volume = { ETCSIT },
number = { 4 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 13-17 },
numpages = { 5 },
url = { /proceedings/etcsit/number4/5984-1027/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article

%1 Emerging Trends in Computer Science and Information Technology (ETCSIT2012)

%A P. A. Chaudhari

%A R. L. Paikrao

%T Web Data Extraction

%J Emerging Trends in Computer Science and Information Technology (ETCSIT2012)

%@ 0975-8887

%V ETCSIT

%N 4

%P 13-17

%D 2012

%I International Journal of Computer Applications

Abstract

The Web is a huge reservoir of information, and the data available on it is abundant and highly diverse. To find specific information, a user has to go through many web pages, filter the data, and download the related documents and files, which is time-consuming. Web pages are published as unstructured HTML, so there is a need to convert them into a structured format such as XML or XHTML. We propose an approach for implementing web data extraction and building a Mashup from HTML web pages. Building a Mashup involves five stages: Data Retrieval, Data Source Modeling, Data Cleaning/Filtering, Data Integration, and Data Visualization. The data modeling stage builds a Document Object Model (DOM) tree with the help of an HTML parser. Algorithms and rules are then used to analyze the HTML tags and extract the data. Furthermore, our application lets users perform this task without writing a script or program, and without any knowledge of computer programming. The approach manages multiple servers and ensures that our website always has the latest data. The resulting Mashup supports the decision-making process, which is a basic requirement for success in the corporate world.
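The data modeling, extraction, and conversion steps described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the names Node, DomBuilder, extract_table, and to_xml, the sample page, and the rule "treat the first table row as field names" are all assumptions made for illustration. It uses only the Python standard library to build a small DOM-like tree from HTML, apply one tag-based extraction rule, trim whitespace as a stand-in for the cleaning stage, and emit the result as XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""

    def find_all(self, tag):
        """Yield every descendant with the given tag name, in document order."""
        for child in self.children:
            if child.tag == tag:
                yield child
            yield from child.find_all(tag)


class DomBuilder(HTMLParser):
    """Data source modeling: turn a stream of HTML tags into a tree of Nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def text_content(node):
    """Cleaning: collect trimmed text (own text first, then descendants' text)."""
    parts = [node.text] + [text_content(child) for child in node.children]
    return " ".join(p.strip() for p in parts if p.strip())


def extract_table(html):
    """Extraction rule (assumed): first table row holds field names, the rest are records."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    rows = []
    for tr in builder.root.find_all("tr"):
        cells = [text_content(c) for c in tr.children if c.tag in ("td", "th")]
        if cells:
            rows.append(cells)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_xml(records, root_tag="records", item_tag="record"):
    """Convert the extracted records into a structured XML string."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for name, value in record.items():
            field = ET.SubElement(item, "field", name=name)
            field.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample page used only to exercise the sketch.
SAMPLE = """
<html><body>
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td> 9.99 </td></tr>
  <tr><td><b>Gadget</b></td><td>19.50</td></tr>
</table>
</body></html>
"""

records = extract_table(SAMPLE)
print(records)
print(to_xml(records))
```

Field names are stored in a name attribute rather than used as XML tag names, since table headers taken from real pages are often not valid XML element names. Data retrieval, integration across several sources, and visualization are left out of the sketch.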

Index Terms

Computer Science

Information Sciences

Keywords

Web Data Extraction, Making Mashup, Mashup Stages, HTML, XML, DOM Tree