Research Article

Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure

by Manjira Sinha, Tirthankar Dasgupta, Abhik Jana, Anupam Basu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Number 5
Year of Publication: 2014
Authors: Manjira Sinha, Tirthankar Dasgupta, Abhik Jana, Anupam Basu
DOI: 10.5120/16588-6297

Manjira Sinha, Tirthankar Dasgupta, Abhik Jana, Anupam Basu. Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure. International Journal of Computer Applications. 95, 5 (June 2014), 8-16. DOI=10.5120/16588-6297

@article{ 10.5120/16588-6297,
author = { Manjira Sinha and Tirthankar Dasgupta and Abhik Jana and Anupam Basu },
title = { Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure },
journal = { International Journal of Computer Applications },
issue_date = { June 2014 },
volume = { 95 },
number = { 5 },
month = { June },
year = { 2014 },
issn = { 0975-8887 },
pages = { 8--16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume95/number5/16588-6297/ },
doi = { 10.5120/16588-6297 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:18:36.787781+05:30
%A Manjira Sinha
%A Tirthankar Dasgupta
%A Abhik Jana
%A Anupam Basu
%T Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure
%J International Journal of Computer Applications
%@ 0975-8887
%V 95
%N 5
%P 8-16
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, we propose a hierarchically organized semantic lexicon for Bangla and a graph-based edge-weighting approach for measuring the semantic similarity between two Bangla words. We have also developed a graphical user interface that displays the lexical organization. The proposed lexical structure contains only relations based on semantic association. Each word is annotated with its frequency over five Bangla corpora and with further details, such as whether the word is mythological, whether it can be used as a verb, and, if so, which word must be appended to it to form the verb. The lexicon can be used in applications such as categorization and the semantic web, and in natural language processing tasks such as document clustering, word sense disambiguation, machine translation, information retrieval, text comprehension, and question answering.
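To make the idea of a hierarchical lexicon with weighted edges concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the weighting scheme, so everything specific here is an assumption: the class names `LexiconEntry` and `SemanticLexicon`, the category / concept / sub-concept levels (borrowed from the keywords), the edge weights, the toy romanized words and frequencies, and the choice of shortest weighted path mapped to `1 / (1 + distance)` as the similarity score.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LexiconEntry:
    """Per-word details of the kind the abstract lists (field names are hypothetical)."""
    word: str
    corpus_frequency: int = 0             # frequency over the corpora
    is_mythological: bool = False         # whether the word is mythological
    verb_auxiliary: Optional[str] = None  # word appended to use the entry as a verb


class SemanticLexicon:
    """Undirected weighted graph over hierarchy nodes and words."""

    def __init__(self) -> None:
        self.entries: Dict[str, LexiconEntry] = {}
        self.edges: Dict[str, Dict[str, float]] = {}

    def add_edge(self, a: str, b: str, weight: float) -> None:
        self.edges.setdefault(a, {})[b] = weight
        self.edges.setdefault(b, {})[a] = weight

    def add_entry(self, entry: LexiconEntry, parent: str, weight: float) -> None:
        self.entries[entry.word] = entry
        self.add_edge(entry.word, parent, weight)

    def distance(self, source: str, target: str) -> float:
        """Shortest weighted path (Dijkstra); math.inf if no path exists."""
        best = {source: 0.0}
        queue = [(0.0, source)]
        while queue:
            dist, node = heapq.heappop(queue)
            if node == target:
                return dist
            if dist > best.get(node, math.inf):
                continue  # stale queue item
            for neighbor, weight in self.edges.get(node, {}).items():
                candidate = dist + weight
                if candidate < best.get(neighbor, math.inf):
                    best[neighbor] = candidate
                    heapq.heappush(queue, (candidate, neighbor))
        return math.inf

    def similarity(self, a: str, b: str) -> float:
        """Assumed mapping from distance to a score in [0, 1]."""
        return 1.0 / (1.0 + self.distance(a, b))


# Toy data: weights and frequencies are illustrative, not taken from the paper.
lex = SemanticLexicon()
lex.add_edge("nature", "celestial_body", 4.0)  # category -> concept
lex.add_edge("celestial_body", "sun", 2.0)     # concept -> sub-concept
lex.add_edge("celestial_body", "moon", 2.0)
lex.add_entry(LexiconEntry("surya", corpus_frequency=120), parent="sun", weight=1.0)
lex.add_entry(LexiconEntry("rabi", corpus_frequency=45), parent="sun", weight=1.0)
lex.add_entry(LexiconEntry("chandra", corpus_frequency=80), parent="moon", weight=1.0)

print(lex.similarity("surya", "rabi"))     # same sub-concept: higher score
print(lex.similarity("surya", "chandra"))  # sibling sub-concepts: lower score
```

In this sketch, edges higher in the hierarchy carry larger weights, so two words under the same sub-concept score as more similar than words that meet only at the concept level. A different weighting, for example one that also uses the stored corpus frequencies, would slot into `add_edge` without changing the search.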

Index Terms

Computer Science
Information Sciences

Keywords

Bangla SynNet Semantic Similarity Category Concept Sub-concept Cluster