A Method for Measuring Semantic Similarity of Documents

Andreia Dal Ponte Novelli; Jose Maria Parente De Oliveira

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 60 - Number 7

Year of Publication: 2012

Authors: Andreia Dal Ponte Novelli, Jose Maria Parente De Oliveira

DOI: 10.5120/9703-4151

Andreia Dal Ponte Novelli, Jose Maria Parente De Oliveira. A Method for Measuring Semantic Similarity of Documents. International Journal of Computer Applications. 60, 7 (December 2012), 17-22. DOI=10.5120/9703-4151

@article{ 10.5120/9703-4151,
author = { Andreia Dal Ponte Novelli and Jose Maria Parente De Oliveira },
title = { A Method for Measuring Semantic Similarity of Documents },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 60 },
number = { 7 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 17-22 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume60/number7/9703-4151/ },
doi = { 10.5120/9703-4151 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:05:54.348975+05:30
%A Andreia Dal Ponte Novelli
%A Jose Maria Parente De Oliveira
%T A Method for Measuring Semantic Similarity of Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 7
%P 17-22
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the growing number of documents available in local and Web repositories, comparison methods must analyze large document sets of differing types and terminologies and return a response that contains as few documents as possible while carrying as much content useful to the user as possible. For large document sets in which each document may contain many pages, it is infeasible to compute similarity over entire documents, so solutions are needed that analyze only a few meaningful terms, in summary form. This article presents TextSSimily, a method that compares documents semantically using only short text (a text summary) for the comparison. Semantics is used to improve the set of responses, and summaries are used to reduce the time needed to obtain results for large document sets.
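The idea of comparing short summaries rather than whole documents can be illustrated with a minimal sketch. This is not the TextSSimily method, whose details are not given in the abstract. It is a purely lexical stand-in that assumes a "summary" is simply the k most frequent non-stop-word terms and that similarity is the cosine of the two term-count vectors. The semantic component is not reproduced. The names `summarize`, `cosine`, `summary_similarity`, and the tiny `STOP_WORDS` set are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on", "with"}


def summarize(text: str, k: int = 5) -> Counter:
    """Return the k most frequent non-stop-word terms with their counts.

    This stands in for a real summarizer; it is an assumption, not the paper's method.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return Counter(dict(counts.most_common(k)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty summary shares nothing with any other summary
    return dot / (norm_a * norm_b)


def summary_similarity(doc_a: str, doc_b: str, k: int = 5) -> float:
    """Compare two documents using only their k-term summaries."""
    return cosine(summarize(doc_a, k), summarize(doc_b, k))
```

Because only k terms per document enter the comparison, the cost of each pairwise comparison is bounded by k regardless of document length, which is the time saving the abstract attributes to summaries. A semantic method would additionally need to match related terms that differ lexically, which this sketch does not attempt.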

Index Terms

Computer Science

Information Sciences

Keywords

Semantic Similarity, Comparison by Similarity, Short Text Comparison