Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface

S. Jeyalatha; B. Vijayakumar; Munawwar Firoz

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 22 - Number 4

Year of Publication: 2011

Authors: S. Jeyalatha, B. Vijayakumar, Munawwar Firoz

DOI: 10.5120/2575-3551

S. Jeyalatha, B. Vijayakumar, Munawwar Firoz. Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface. International Journal of Computer Applications. 22, 4 (May 2011), 1-6. DOI=10.5120/2575-3551

@article{10.5120/2575-3551,
  author = {S. Jeyalatha and B. Vijayakumar and Munawwar Firoz},
  title = {Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface},
  journal = {International Journal of Computer Applications},
  issue_date = {May 2011},
  volume = {22},
  number = {4},
  month = {May},
  year = {2011},
  issn = {0975-8887},
  pages = {1-6},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume22/number4/2575-3551/},
  doi = {10.5120/2575-3551},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:09:01.356195+05:30
%A S. Jeyalatha
%A B. Vijayakumar
%A Munawwar Firoz
%T Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface
%J International Journal of Computer Applications
%@ 0975-8887
%V 22
%N 4
%P 1-6
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper deals with Web content mining. While browsing the Web, a user has to go through many pages, filter the data, and download related documents and files. This searching and downloading is time-consuming. Some search queries also call for a specific option, such as limiting the search to a few links. To reduce the time users spend on these tasks, a Web extraction and storage tool has been designed and implemented in Java that automates the downloading task for a given user query. A test scenario is presented with various keywords. The work can be useful to Web users, faculty, students, and Web administrators in a university environment.
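
The abstract does not reproduce the tool's source, so the following is only a minimal sketch of the kind of task it describes: pick out the links on a page that match a user keyword, stop after a small number of links, and save each one locally. The class name `LinkExtractor`, the methods `extractLinks` and `download`, and the regular-expression approach are all assumptions made for illustration and are not taken from the paper; a production tool would normally use a proper HTML parser rather than a regex.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the authors' implementation.
final class LinkExtractor {

    // Simplified pattern for <a ... href="URL" ...>anchor text</a>.
    // Assumption: quoted href values and non-nested anchors.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns at most maxLinks distinct URLs whose address or anchor text
    // contains the keyword (case-insensitive), in document order.
    static List<String> extractLinks(String html, String keyword, int maxLinks) {
        List<String> result = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        Matcher m = ANCHOR.matcher(html);
        while (result.size() < maxLinks && m.find()) {
            String url = m.group(1);
            String text = m.group(2);
            boolean matches = url.toLowerCase(Locale.ROOT).contains(needle)
                    || text.toLowerCase(Locale.ROOT).contains(needle);
            if (matches && !result.contains(url)) {
                result.add(url);
            }
        }
        return result;
    }

    // Saves the resource behind the URL to the target path, overwriting any
    // existing file. Needs network access, so it is not exercised in main.
    static void download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.org/web-mining.pdf\">Survey</a>"
                + "<a href=\"http://example.org/about.html\">About</a>"
                + "<a href='http://example.org/notes.html'>Content Mining notes</a>";
        // Limit the search to two links, as a user option might request.
        System.out.println(extractLinks(html, "mining", 2));
    }
}
```

Keeping the link limit as a parameter mirrors the "limit the search to a few links" option mentioned in the abstract, and separating extraction from downloading lets the filtering logic be checked without network access.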

Index Terms

Computer Science

Information Sciences

Keywords

Content Mining, Data Extraction, HTML, Web Data Retrieval, Web Information Extraction