Research Article

Robust Rule-based Approach in Arabic Processing

by Riadh Ouersighni
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 93 - Number 12
Year of Publication: 2014
Authors: Riadh Ouersighni
10.5120/16269-6001

Riadh Ouersighni. Robust Rule-based Approach in Arabic Processing. International Journal of Computer Applications. 93, 12 (May 2014), 31-37. DOI=10.5120/16269-6001

@article{ 10.5120/16269-6001,
author = { Riadh Ouersighni },
title = { Robust Rule-based Approach in Arabic Processing },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 93 },
number = { 12 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 31-37 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume93/number12/16269-6001/ },
doi = { 10.5120/16269-6001 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:15:35.486484+05:30
%A Riadh Ouersighni
%T Robust Rule-based Approach in Arabic Processing
%J International Journal of Computer Applications
%@ 0975-8887
%V 93
%N 12
%P 31-37
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A parsing system is a key element of many computer applications such as information retrieval, knowledge extraction and machine translation. This paper presents a robust, large-scale system for parsing Arabic sentences. In practice, the system can analyze real-world sentences thanks to the wide coverage of its linguistic knowledge, which was developed within the DIINAR-MBC European project. The parser is designed to be robust against difficult input that cannot be parsed correctly by the system's standard grammar rules, whether that input is extra-grammatical, ill-formed or unexpected. Most systems take an algorithmic approach to robustness, extending the parsing program with heuristics that handle defective input. This study instead adopts a grammar-based approach: robust rules are introduced into the grammar itself, and constraints are relaxed where necessary. The parser was evaluated on real-world sentences with encouraging results, reaching 95% coverage.
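The abstract's central idea, putting robustness into the grammar rather than into the parsing program, can be illustrated with a minimal sketch. Everything in it is hypothetical: the toy lexicon (Buckwalter-style transliteration), the rule names, and the choice of noun-adjective gender agreement as the relaxed constraint are assumptions for illustration, not the actual rules or formalism of the parser described here. A standard rule enforces full agreement; a robust rule with the same pattern drops the gender constraint and marks its analysis as robust. "Coverage" is assumed to mean the fraction of inputs that receive at least one analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, str]

# Hypothetical toy lexicon in Buckwalter-style transliteration: word -> features.
LEXICON: Dict[str, Features] = {
    "kitAb":    {"cat": "N",   "gender": "m", "number": "sg"},  # book
    "madrasap": {"cat": "N",   "gender": "f", "number": "sg"},  # school
    "jadiyd":   {"cat": "ADJ", "gender": "m", "number": "sg"},  # new (masc.)
    "jadiydap": {"cat": "ADJ", "gender": "f", "number": "sg"},  # new (fem.)
}


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: Tuple[str, ...]                      # sequence of categories
    constraint: Callable[[List[Features]], bool]  # feature check on the matched words
    robust: bool = False                          # True if the rule relaxes a constraint


def full_agreement(f: List[Features]) -> bool:
    # Standard rule: the adjective agrees with the noun in gender and number.
    return f[0]["gender"] == f[1]["gender"] and f[0]["number"] == f[1]["number"]


def number_agreement(f: List[Features]) -> bool:
    # Relaxed (hypothetical) rule: gender is dropped, number must still agree.
    return f[0]["number"] == f[1]["number"]


RULES = [
    Rule("NP -> N ADJ", ("N", "ADJ"), full_agreement),
    Rule("NP -> N ADJ [gender relaxed]", ("N", "ADJ"), number_agreement, robust=True),
]
STRICT_RULES = [r for r in RULES if not r.robust]


def parse_np(words: List[str], rules: List[Rule] = RULES) -> Optional[Tuple[str, bool]]:
    """Return (rule name, used a robust rule) or None if no rule applies."""
    if any(w not in LEXICON for w in words):
        return None
    feats = [LEXICON[w] for w in words]
    cats = tuple(f["cat"] for f in feats)
    # Standard rules are tried first; robust rules fire only as a fallback,
    # so well-formed input never receives a relaxed analysis.
    for robust_pass in (False, True):
        for rule in rules:
            if rule.robust == robust_pass and rule.pattern == cats and rule.constraint(feats):
                return rule.name, rule.robust
    return None


def coverage(sentences: List[List[str]], rules: List[Rule] = RULES) -> float:
    """Fraction of inputs that receive at least one analysis."""
    return sum(parse_np(s, rules) is not None for s in sentences) / len(sentences)


inputs = [["kitAb", "jadiyd"], ["kitAb", "jadiydap"], ["jadiyd", "kitAb"]]
print(parse_np(inputs[1]))             # ('NP -> N ADJ [gender relaxed]', True)
print(coverage(inputs, STRICT_RULES))  # 0.3333333333333333
print(coverage(inputs))                # 0.6666666666666666
```

In this toy setting the strict grammar covers one of the three inputs, and adding the relaxed rule raises that to two, while the adjective-noun order is still rejected. Keeping the relaxation inside the rule set, rather than in the parsing loop, is what distinguishes this from the heuristic, algorithmic approach; the robust flag records where a constraint was relaxed.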

Index Terms

Computer Science
Information Sciences

Keywords

Morphological analysis, Lexicon, Parsing, Formal grammar, Arabic language