Research Article

Predicting Human Assessment of Machine Translation Quality by Combining Automatic Evaluation Metrics using Binary Classifiers

by Michael Paul, Andrew Finch, Eiichiro Sumita
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 59 - Number 10
Year of Publication: 2012
Authors: Michael Paul, Andrew Finch, Eiichiro Sumita
10.5120/9581-4062

Michael Paul, Andrew Finch, Eiichiro Sumita. Predicting Human Assessment of Machine Translation Quality by Combining Automatic Evaluation Metrics using Binary Classifiers. International Journal of Computer Applications. 59, 10 (December 2012), 1-7. DOI=10.5120/9581-4062

@article{ 10.5120/9581-4062,
author = { Michael Paul and Andrew Finch and Eiichiro Sumita },
title = { Predicting Human Assessment of Machine Translation Quality by Combining Automatic Evaluation Metrics using Binary Classifiers },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 59 },
number = { 10 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume59/number10/9581-4062/ },
doi = { 10.5120/9581-4062 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:03:47.573075+05:30
%A Michael Paul
%A Andrew Finch
%A Eiichiro Sumita
%T Predicting Human Assessment of Machine Translation Quality by Combining Automatic Evaluation Metrics using Binary Classifiers
%J International Journal of Computer Applications
%@ 0975-8887
%V 59
%N 10
%P 1-7
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a method to predict human assessments of machine translation (MT) quality based on a combination of binary classifiers using a coding matrix. The multiclass categorization problem is reduced to a set of binary problems that are solved using standard classification learning algorithms trained on the results of multiple automatic evaluation metrics. Experimental results using a large-scale human-annotated evaluation corpus show that the decomposition into binary classifiers achieves higher classification accuracies than solving the multiclass categorization problem directly. In addition, the proposed method achieves a higher sentence-level correlation with human judgments than standard automatic evaluation measures.
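
The reduction from a multiclass problem to binary problems through a coding matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-vs-all and all-pairs matrices, and the simplified Hamming-distance decoding (zero entries are ignored) are assumptions chosen for illustration. Each row of the matrix is the codeword of one quality grade, and each column defines one binary problem.

```python
from itertools import combinations


def one_vs_all_matrix(num_classes):
    """Row r is the codeword of class r: +1 in column r, -1 elsewhere."""
    return [[1 if r == c else -1 for c in range(num_classes)]
            for r in range(num_classes)]


def all_pairs_matrix(num_classes):
    """One column per class pair (a, b): +1 for a, -1 for b, 0 otherwise."""
    pairs = list(combinations(range(num_classes), 2))
    return [[1 if r == a else -1 if r == b else 0 for (a, b) in pairs]
            for r in range(num_classes)]


def binary_labels(matrix, column, labels):
    """Relabel multiclass training labels for the binary problem `column`.

    Classes with a 0 entry do not take part in that problem, so their
    examples are marked None and would be skipped during training.
    """
    return [matrix[y][column] or None for y in labels]


def decode(matrix, predictions):
    """Return the class whose codeword disagrees least with the predictions.

    `predictions` holds one +1/-1 output per binary classifier (assumed to
    come from classifiers trained on automatic metric scores). Zero entries
    of a codeword are ignored, which is a simplification.
    """
    def distance(codeword):
        return sum(1 for bit, p in zip(codeword, predictions)
                   if bit != 0 and bit != p)
    return min(range(len(matrix)), key=lambda r: distance(matrix[r]))
```

With three classes, `decode(one_vs_all_matrix(3), [-1, 1, -1])` selects class 1, because only the second binary classifier voted for its own class. Ties are broken in favor of the lowest class index, which is another simplification of this sketch.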

Index Terms

Computer Science
Information Sciences

Keywords

Evaluation Metric Combination
Human Assessment Prediction