Research Article

Effect of Pronoun Resolution on Document Similarity

by Atul Kumar, Sudip Sanyal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 16
Year of Publication: 2010
10.5120/341-519

Atul Kumar, Sudip Sanyal. Effect of Pronoun Resolution on Document Similarity. International Journal of Computer Applications. 1, 16 (February 2010), 60-64. DOI=10.5120/341-519

@article{ 10.5120/341-519,
author = { Atul Kumar and Sudip Sanyal },
title = { Effect of Pronoun Resolution on Document Similarity },
journal = { International Journal of Computer Applications },
issue_date = { February 2010 },
volume = { 1 },
number = { 16 },
month = { February },
year = { 2010 },
issn = { 0975-8887 },
pages = { 60-64 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume1/number16/341-519/ },
doi = { 10.5120/341-519 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Atul Kumar
%A Sudip Sanyal
%T Effect of Pronoun Resolution on Document Similarity
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 16
%P 60-64
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a novel study of the effect of pronoun resolution on the measurement of document similarity. We examine this effect within the framework of the Vector Space Model (VSM) and Probabilistic Latent Semantic Analysis (PLSA). For this purpose we developed a benchmark corpus of documents whose similarity scores were assigned by human judges. We measured the inter-document similarity of these documents using VSM and PLSA. We then performed pronoun resolution on the documents and recalculated the similarity with both methods. Finally, we computed the correlation coefficient between each set of scores and the human-generated scores. The correlation coefficients clearly demonstrate substantial and consistent improvements in the similarity scores after pronoun resolution.
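The VSM half of the pipeline described in the abstract can be sketched in a few lines of Python. The sketch scores a document pair, resolves pronouns, scores the pair again, and correlates scores with human judgments. This is a minimal sketch under stated assumptions, not the authors' implementation:

- Documents are raw term-frequency vectors compared by cosine similarity.
- `resolve_pronouns_naive` is a hypothetical toy that replaces each pronoun with the most recently mentioned entity from a caller-supplied set.
- The correlation coefficient is assumed to be Pearson's, since the abstract does not name one.

The PLSA side of the study is not sketched. The example sentences are invented for illustration.

```python
import math
import re
from collections import Counter

# Pronouns the toy resolver substitutes (an assumed list, not taken from the paper).
PRONOUNS = {"he", "she", "him", "her", "it", "they", "them"}


def tokenize(text):
    """Lowercase word tokens; punctuation and digits are dropped."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def resolve_pronouns_naive(tokens, entities):
    """Hypothetical toy resolver: replace each pronoun with the most recently
    seen token from `entities`. A real system would use a proper
    anaphora-resolution algorithm; this only illustrates the effect."""
    resolved, last_entity = [], None
    for tok in tokens:
        if tok in entities:
            last_entity = tok
        if tok in PRONOUNS and last_entity is not None:
            resolved.append(last_entity)
        else:
            resolved.append(tok)
    return resolved


def cosine(tokens_a, tokens_b):
    """VSM similarity: cosine of raw term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pearson(xs, ys):
    """Pearson correlation between model scores and human scores (assumed metric)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant score lists")
    return cov / (sx * sy)


if __name__ == "__main__":
    entities = {"einstein"}  # hypothetical antecedent list for the toy resolver
    doc_a = tokenize("Einstein developed relativity. He won the Nobel prize.")
    doc_b = tokenize("Einstein won the Nobel prize for the photoelectric effect.")
    before = cosine(doc_a, doc_b)
    after = cosine(
        resolve_pronouns_naive(doc_a, entities),
        resolve_pronouns_naive(doc_b, entities),
    )
    print(f"cosine before resolution: {before:.3f}")
    print(f"cosine after resolution:  {after:.3f}")
```

In this toy pair, replacing "He" with "Einstein" turns a term unique to one document into a term both documents share. The cosine score therefore rises. In the study's setup, the scores for every document pair would be computed both before and after resolution. Each set of scores would then be correlated, here via `pearson`, with the human-assigned scores.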

Index Terms

Computer Science
Information Sciences

Keywords

Document Similarity, Pronoun Resolution, Information Retrieval, Statistical Algorithm