Research Article

A Survey of Tools and Techniques for Multiword Expression Detection

by Ujwala P. Mahajan, Ajay S. Patil, Nita V. Patil
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 32
Year of Publication: 2024
10.5120/ijca2024923851

Ujwala P. Mahajan, Ajay S. Patil, Nita V. Patil. A Survey of Tools and Techniques for Multiword Expression Detection. International Journal of Computer Applications. 186, 32 (Aug 2024), 11-18. DOI=10.5120/ijca2024923851

@article{ 10.5120/ijca2024923851,
author = { Ujwala P. Mahajan and Ajay S. Patil and Nita V. Patil },
title = { A Survey of Tools and Techniques for Multiword Expression Detection },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2024 },
volume = { 186 },
number = { 32 },
month = { Aug },
year = { 2024 },
issn = { 0975-8887 },
pages = { 11-18 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number32/a-survey-of-tools-and-techniques-for-multiword-expression-detection/ },
doi = { 10.5120/ijca2024923851 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-08-05T23:36:31.985754+05:30
%A Ujwala P. Mahajan
%A Ajay S. Patil
%A Nita V. Patil
%T A Survey of Tools and Techniques for Multiword Expression Detection
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 32
%P 11-18
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Multiword expressions (MWEs) find application in almost all NLP tasks, such as machine translation, information retrieval, and question answering. MWEs are lexical items that can be decomposed into single words and that display lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasy. MWEs in Marathi are quite varied, and many are of types not encountered in English. This paper surveys MWE research on Indian and foreign languages. It reports the study and observations on the approaches, techniques, and features required to implement an MWE detection system for various languages.

Index Terms

Computer Science
Information Sciences

Keywords

MWEs, Machine Translation, Information Retrieval, Multiword Expressions