Research Article

Amharic Text Chunker using Conditional Random Fields

by Birhan Hailu, Birchiko Achamyeleh, Gebeyehu Belay
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 30
Year of Publication: 2021
Authors: Birhan Hailu, Birchiko Achamyeleh, Gebeyehu Belay
10.5120/ijca2021921694

Birhan Hailu, Birchiko Achamyeleh, Gebeyehu Belay. Amharic Text Chunker using Conditional Random Fields. International Journal of Computer Applications. 183, 30 (Oct 2021), 59-63. DOI=10.5120/ijca2021921694

@article{ 10.5120/ijca2021921694,
author = { Birhan Hailu and Birchiko Achamyeleh and Gebeyehu Belay },
title = { Amharic Text Chunker using Conditional Random Fields },
journal = { International Journal of Computer Applications },
issue_date = { Oct 2021 },
volume = { 183 },
number = { 30 },
month = { Oct },
year = { 2021 },
issn = { 0975-8887 },
pages = { 59-63 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume183/number30/32125-2021921694/ },
doi = { 10.5120/ijca2021921694 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:18:23.916438+05:30
%A Birhan Hailu
%A Birchiko Achamyeleh
%A Gebeyehu Belay
%T Amharic Text Chunker using Conditional Random Fields
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 30
%P 59-63
%D 2021
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper introduces an Amharic text chunker based on conditional random fields (CRFs). To find the optimal feature set for the chunker, the researchers conducted experiments under different scenarios until a promising result was obtained. Sentences were collected from Amharic grammar books, news articles, magazines, and news from the Walta Information Center (WIC) to build the training and testing datasets. These sentences were analyzed and tagged manually and used as the corpus for model training and testing. The entire dataset was chunk-tagged manually and approved by linguistic professionals. The IOB2 chunk specification was used to identify phrase boundaries. Across all experiments, the best overall accuracy was 97.26%, obtained with a window of two words on each side together with the POS tag of each token; the worst was 84.57%, obtained with a window of only one word on each side.
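
The two ingredients named in the abstract, IOB2 chunk tags and window-based features with POS tags, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `to_iob2` and `window_features`, the feature-key format, the `<PAD>` marker, the chunk labels (`NP`, `VP`), and the placeholder tokens are all assumptions made for illustration, and the paper's actual feature templates may differ.

```python
from typing import Dict, List, Tuple


def to_iob2(n_tokens: int, chunks: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) chunk spans into IOB2 tags.

    In IOB2, the first token of every chunk is tagged B-<label>, the
    remaining tokens of the chunk I-<label>, and tokens outside any chunk O.
    """
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def window_features(
    tokens: List[str], pos_tags: List[str], i: int, window: int = 2
) -> Dict[str, str]:
    """Collect word and POS features for token i and its neighbours.

    `window` is the number of tokens taken on each side; positions that
    fall outside the sentence get a hypothetical "<PAD>" value.
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j] if inside else "<PAD>"
        feats[f"pos[{offset:+d}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


if __name__ == "__main__":
    # Placeholder tokens and POS tags, not real Amharic data.
    tokens = ["w1", "w2", "w3", "w4"]
    pos_tags = ["ADJ", "N", "ADV", "V"]
    print(to_iob2(len(tokens), [(0, 2, "NP"), (3, 4, "VP")]))
    print(window_features(tokens, pos_tags, 1, window=2))
```

One feature dictionary per token, paired with its IOB2 tag, is the usual input shape for CRF toolkits; setting `window=1` or `window=2` corresponds to the one-word and two-word context settings compared in the abstract.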

Index Terms

Computer Science
Information Sciences

Keywords

Amharic text chunker, base phrase chunker, conditional random fields, clause boundary identification