Hybrid Approach for Annotating Unstructured Document

Meghana.h.j; Pushpa Ravikumar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 120 - Number 13

Year of Publication: 2015

Authors: Meghana.h.j, Pushpa Ravikumar

DOI: 10.5120/21291-4270

Meghana.h.j, Pushpa Ravikumar. Hybrid Approach for Annotating Unstructured Document. International Journal of Computer Applications. 120, 13 (June 2015), 38-41. DOI=10.5120/21291-4270

@article{10.5120/21291-4270,
  author = {Meghana.h.j and Pushpa Ravikumar},
  title = {Hybrid Approach for Annotating Unstructured Document},
  journal = {International Journal of Computer Applications},
  issue_date = {June 2015},
  volume = {120},
  number = {13},
  month = {June},
  year = {2015},
  issn = {0975-8887},
  pages = {38--41},
  numpages = {4},
  url = {https://ijcaonline.org/archives/volume120/number13/21291-4270/},
  doi = {10.5120/21291-4270},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%A Meghana.h.j
%A Pushpa Ravikumar
%T Hybrid Approach for Annotating Unstructured Document
%J International Journal of Computer Applications
%@ 0975-8887
%V 120
%N 13
%P 38-41
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Annotation is the process of adding information to a document so that the information can later be extracted. Many organizations today generate large amounts of data, most of it in textual form. Such collections of text documents contain a large amount of structured information that remains hidden inside the unstructured text. Information extraction algorithms are costly because they always operate on top of the raw text, and they do not by themselves provide the necessary structured information. In this paper, we present a method that generates structured attributes by identifying the documents that contain the information of interest; this information is later useful for querying the database. The major contribution of this paper is an algorithm that identifies the structured attributes present in a document by combining both the query workload and the content of the text document. Our experimental results show that this technique gives better results than methods that rely only on the content of the document or only on the query workload.
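The abstract describes ranking candidate attributes for a document by combining evidence from the document's text with evidence from the query workload. The Python sketch below illustrates that idea under stated assumptions: the functions `content_value` and `query_value`, the weight `alpha`, the cue-term dictionary, and the toy data are all hypothetical, and the weighted sum is a simple stand-in rather than the authors' actual algorithm. Reading the keywords CV and QV as a content-based score and a query-workload-based score is likewise an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens as a set; punctuation is dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def content_value(doc_text, attribute_terms):
    """Hypothetical content score (CV): share of each attribute's cue terms
    that occur in the document text. A simplified stand-in for a classifier."""
    tokens = tokenize(doc_text)
    return {attr: sum(t in tokens for t in terms) / max(len(terms), 1)
            for attr, terms in attribute_terms.items()}


def query_value(query_workload, attributes):
    """Hypothetical workload score (QV): share of past queries that
    reference each attribute (each query is a list of attribute names)."""
    counts = Counter(a for query in query_workload for a in set(query))
    n = max(len(query_workload), 1)
    return {attr: counts[attr] / n for attr in attributes}


def suggest_attributes(doc_text, attribute_terms, query_workload, alpha=0.5, k=2):
    """Rank attributes by alpha * CV + (1 - alpha) * QV and return the top k.
    alpha=1.0 is content-only; alpha=0.0 is workload-only (assumed weighting)."""
    cv = content_value(doc_text, attribute_terms)
    qv = query_value(query_workload, attribute_terms)
    score = {a: alpha * cv[a] + (1 - alpha) * qv[a] for a in attribute_terms}
    # Descending score; ties broken alphabetically so the output is deterministic.
    return sorted(score, key=lambda a: (-score[a], a))[:k]


# Toy inputs, invented for illustration only (not data from the paper).
ATTRIBUTE_TERMS = {
    "location": ["city", "street", "located"],
    "damage": ["damage", "destroyed", "collapsed"],
    "casualties": ["injured", "killed", "victims"],
}
DOC = "Storm damage reported: two houses collapsed in the city, no victims."
WORKLOAD = [["location"], ["location", "damage"], ["location"], ["casualties"]]

print(suggest_attributes(DOC, ATTRIBUTE_TERMS, WORKLOAD))             # ['location', 'damage']
print(suggest_attributes(DOC, ATTRIBUTE_TERMS, WORKLOAD, alpha=1.0))  # ['damage', 'casualties']
print(suggest_attributes(DOC, ATTRIBUTE_TERMS, WORKLOAD, alpha=0.0))  # ['location', 'casualties']
```

In this toy run the combined ranking differs from both single-source rankings: the content-only score favours `damage`, the workload-only score favours `location`, and the hybrid keeps both. This only shows that the two sources interact; it makes no claim about which ranking is better, which the paper evaluates experimentally.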

Index Terms

Computer Science

Information Sciences

Keywords

Annotation, CADS, form, CV and QV