Research Article

Learning based Clustering for the Automatic Annotations from Web Databases

by Richa Saxena, Sushil Kumar Chaturvedi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 113 - Number 7
Year of Publication: 2015
Authors: Richa Saxena, Sushil Kumar Chaturvedi
DOI: 10.5120/19838-1692

Richa Saxena, Sushil Kumar Chaturvedi. Learning based Clustering for the Automatic Annotations from Web Databases. International Journal of Computer Applications. 113, 7 (March 2015), 18-23. DOI=10.5120/19838-1692

@article{ 10.5120/19838-1692,
author = { Richa Saxena and Sushil Kumar Chaturvedi },
title = { Learning based Clustering for the Automatic Annotations from Web Databases },
journal = { International Journal of Computer Applications },
issue_date = { March 2015 },
volume = { 113 },
number = { 7 },
month = { March },
year = { 2015 },
issn = { 0975-8887 },
pages = { 18-23 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume113/number7/19838-1692/ },
doi = { 10.5120/19838-1692 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:50:19.918320+05:30
%A Richa Saxena
%A Sushil Kumar Chaturvedi
%T Learning based Clustering for the Automatic Annotations from Web Databases
%J International Journal of Computer Applications
%@ 0975-8887
%V 113
%N 7
%P 18-23
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid growth of internet use has enabled knowledge extraction from web databases and the HTML pages associated with them. Various techniques have been implemented for annotating the search results returned by web databases, but they have known problems, such as the alignment problem and the difficulty of splitting composite text nodes when there are no explicit separators. This paper identifies these problems and proposes a technique that overcomes them using a supervised learning algorithm such as the support vector machine (SVM). The technique applies SVM-based clustering and labeling to search records in order to retrieve text nodes and data units, with the aim of producing more informative annotations of search results from web databases. The proposed methodology is compared with the existing methodology for annotating search records. The result analysis shows that the proposed method achieves higher precision and recall than the existing approach, as well as high accuracy in predicting annotated search records from web databases.
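
The abstract reports precision, recall, and accuracy but does not state how they are computed. As a minimal sketch, assuming the standard binary-classification definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), accuracy = (TP + TN) / N), the three measures could be calculated as follows. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def precision_recall_accuracy(predicted, actual, positive=True):
    """Return (precision, recall, accuracy) for two equal-length label lists.

    Assumes the standard definitions; the paper's exact evaluation
    procedure is not given in the abstract.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical toy data: True means "data unit annotated with the correct label".
actual = [True, True, True, False, False, True, False, False]
predicted = [True, True, False, False, True, True, False, False]

precision, recall, accuracy = precision_recall_accuracy(predicted, actual)
print(precision, recall, accuracy)
```

With these toy labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all three measures come out to 0.75.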

Index Terms

Computer Science
Information Sciences

Keywords

Annotations, Wrapper, Semantic Model, HTML Tags, NLP, Ontology, UIUC