Research Article

Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface

by S. Jeyalatha, B. Vijayakumar, Munawwar Firoz
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 22 - Number 4
Year of Publication: 2011
10.5120/2575-3551

S. Jeyalatha, B. Vijayakumar, Munawwar Firoz. Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface. International Journal of Computer Applications. 22, 4 (May 2011), 1-6. DOI=10.5120/2575-3551

@article{ 10.5120/2575-3551,
author = { S. Jeyalatha and B. Vijayakumar and Munawwar Firoz },
title = { Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface },
journal = { International Journal of Computer Applications },
issue_date = { May 2011 },
volume = { 22 },
number = { 4 },
month = { May },
year = { 2011 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume22/number4/2575-3551/ },
doi = { 10.5120/2575-3551 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:09:01.356195+05:30
%A S. Jeyalatha
%A B. Vijayakumar
%A Munawwar Firoz
%T Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface
%J International Journal of Computer Applications
%@ 0975-8887
%V 22
%N 4
%P 1-6
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper deals with Web Content Mining. While browsing the web, a user has to go through many pages, filter the data, and download related documents and files. This searching and downloading is time-consuming. Sometimes a search query also calls for specific options, such as limiting the search to a few links. To reduce the time users spend on this, a web extraction and storage tool has been designed and implemented in Java that automates the downloading task for a given user query. Test scenarios with various keywords are presented. The present work can be useful to web users, faculty, students, and web administrators in a university environment.
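The link-selection step described in the abstract (match a user keyword, limit the search to a few links) might be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `LinkFilter`, the method `selectLinks`, and the regex-based anchor matching are all hypothetical, and a real tool would more likely parse the page with an HTML parser than with a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from the paper.
class LinkFilter {
    // Simplified pattern for href="..." or href='...' inside an <a> tag.
    // Assumption: the input is reasonably well-formed HTML.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns at most maxLinks distinct link targets whose URL contains
    // the keyword (case-insensitive), in document order.
    static List<String> selectLinks(String html, String keyword, int maxLinks) {
        List<String> result = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        Matcher m = HREF.matcher(html);
        while (result.size() < maxLinks && m.find()) {
            String url = m.group(2);
            boolean matches = url.toLowerCase(Locale.ROOT).contains(needle);
            if (matches && !result.contains(url)) {
                result.add(url);
            }
        }
        return result;
    }
}
```

A call such as `LinkFilter.selectLinks(pageHtml, "java", 5)` would return up to five matching URLs, which a download step could then fetch and store. Matching the keyword against the URL only is a simplifying assumption; matching against the anchor text or page content is an equally plausible design.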

Index Terms

Computer Science
Information Sciences

Keywords

Content Mining, Data Extraction, HTML, Web Data Retrieval, Web Information Extraction