Research Article

An Ontology-based Summarization System for Arabic Documents (OSSAD)

by Ibrahim Imam, Nihal Nounou, Alaa Hamouda, Hebat Allah Abdul Khalek
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 74 - Number 17
Year of Publication: 2013
10.5120/12980-0237

Ibrahim Imam, Nihal Nounou, Alaa Hamouda, Hebat Allah Abdul Khalek. An Ontology-based Summarization System for Arabic Documents (OSSAD). International Journal of Computer Applications. 74, 17 (July 2013), 38-43. DOI=10.5120/12980-0237

@article{10.5120/12980-0237,
author = {Ibrahim Imam and Nihal Nounou and Alaa Hamouda and Hebat Allah Abdul Khalek},
title = {An Ontology-based Summarization System for Arabic Documents ({OSSAD})},
journal = {International Journal of Computer Applications},
issue_date = {July 2013},
volume = {74},
number = {17},
month = {July},
year = {2013},
issn = {0975-8887},
pages = {38--43},
numpages = {6},
url = {https://ijcaonline.org/archives/volume74/number17/12980-0237/},
doi = {10.5120/12980-0237},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Ibrahim Imam
%A Nihal Nounou
%A Alaa Hamouda
%A Hebat Allah Abdul Khalek
%T An Ontology-based Summarization System for Arabic Documents (OSSAD)
%J International Journal of Computer Applications
%@ 0975-8887
%V 74
%N 17
%P 38-43
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the growth of web resources and the huge amount of information available, automatic summarization systems have become a necessity. Because summarization is needed most when searching for information on the web, where the user targets a particular domain of interest through a query, domain-based summaries are the most useful. Although there is considerable research on domain-based summarization for English, there is little for Arabic, owing to the shortage of existing Arabic knowledge bases. This paper introduces OSSAD, an Ontology-based Summarization System for Arabic Documents. Domain knowledge is extracted from an Arabic corpus and represented as topic-related concepts/keywords and the lexical relations among them. The user's query is first expanded using the Arabic WordNet and then further expanded with the domain-specific knowledge base. For summarization, the C4.5 decision tree algorithm is used, trained on a set of features extracted from the original documents. The Essex Arabic Summaries Corpus (EASC) was used as the test dataset. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) was used to compare OSSAD summaries with human summaries and with the output of other automatic summarization systems; the proposed approach showed promising results.
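The two-stage query expansion described in the abstract can be illustrated with a minimal sketch. It is not the OSSAD implementation: `WORDNET_SYNONYMS` and `DOMAIN_KB` are hypothetical in-memory dictionaries standing in for Arabic WordNet lookups and for the extracted domain knowledge base, and English placeholder terms are used for readability. The stage order follows the abstract (thesaurus synonyms first, then domain knowledge); applying the domain lookup to every term produced by the first stage is an assumption.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for Arabic WordNet: term -> synonyms.
WORDNET_SYNONYMS: Dict[str, Set[str]] = {
    "economy": {"finance"},
    "market": {"exchange"},
}

# Hypothetical stand-in for the extracted domain knowledge base:
# concept -> related domain keywords.
DOMAIN_KB: Dict[str, Set[str]] = {
    "economy": {"inflation", "gdp"},
    "finance": {"bank"},
}


def expand_query(query_terms: List[str]) -> List[str]:
    """Expand a query in two stages: thesaurus synonyms, then domain keywords."""
    # Stage 1: append synonyms of each original query term (no duplicates).
    stage1: List[str] = list(query_terms)
    for term in query_terms:
        for syn in sorted(WORDNET_SYNONYMS.get(term, set())):
            if syn not in stage1:
                stage1.append(syn)

    # Stage 2: append domain keywords related to any term from stage 1.
    expanded: List[str] = list(stage1)
    for term in stage1:
        for kw in sorted(DOMAIN_KB.get(term, set())):
            if kw not in expanded:
                expanded.append(kw)
    return expanded
```

For example, `expand_query(["economy", "market"])` returns `["economy", "market", "finance", "exchange", "gdp", "inflation", "bank"]`: the keyword `bank` is reached only through the synonym `finance`, which is the point of running the thesaurus stage first.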
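C4.5 chooses each split by gain ratio, that is, information gain divided by split information. The sketch below computes this criterion for one discrete sentence feature, assuming sentences are labeled 1 if they belong to the reference summary and 0 otherwise. The feature values are hypothetical (the abstract does not list OSSAD's features), and the rest of C4.5, such as threshold search on continuous attributes and pruning, is omitted.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio of splitting `labels` on the discrete values of `feature`."""
    n = len(labels)
    # Partition the labels by the feature value of each sentence.
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)

    # Information gain: parent entropy minus size-weighted child entropy.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information penalizes features with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0
```

With `feature = ["high", "high", "low", "low"]` and `labels = [1, 1, 0, 0]`, the feature separates the classes perfectly and the gain ratio is 1.0; with `["a", "b", "a", "b"]` against the same labels, the feature carries no information and the gain ratio is 0.0.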
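ROUGE scores a system summary by its n-gram overlap with human reference summaries. The sketch below computes ROUGE-N recall against a single reference, using whitespace tokenization and clipped n-gram counts. It is a simplified illustration: the official ROUGE toolkit adds options such as stemming, stopword removal and multi-reference scoring, and the abstract does not say which ROUGE variants were used to evaluate OSSAD.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter '&' keeps the minimum count of each shared n-gram (clipping).
    overlap = sum((cand & ref).values())
    return overlap / total
```

For example, comparing the candidate `"the cat lay on the mat"` with the reference `"the cat sat on the mat"` gives a unigram recall of 5/6 (about 0.833) and, with `n=2`, a bigram recall of 0.6, because only three of the five reference bigrams survive the substituted word.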

Index Terms

Computer Science
Information Sciences

Keywords

Arabic text summarization, Knowledge-based summarization, Query expansion, Ontology extraction from text, Arabic WordNet