Statistical Machine Translation from Indonesian to Regional Languages in Indonesia

Dewi Soyusiawaty; Bella Okta Sari Miranda

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 184 - Number 49

Year of Publication: 2023

Authors: Dewi Soyusiawaty, Bella Okta Sari Miranda

10.5120/ijca2023922603

Dewi Soyusiawaty, Bella Okta Sari Miranda. Statistical Machine Translation from Indonesian to Regional Languages in Indonesia. International Journal of Computer Applications. 184, 49 (Mar 2023), 18-23. DOI=10.5120/ijca2023922603

@article{10.5120/ijca2023922603,
  author = {Dewi Soyusiawaty and Bella Okta Sari Miranda},
  title = {Statistical Machine Translation from Indonesian to Regional Languages in Indonesia},
  journal = {International Journal of Computer Applications},
  issue_date = {Mar 2023},
  volume = {184},
  number = {49},
  month = {Mar},
  year = {2023},
  issn = {0975-8887},
  pages = {18-23},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume184/number49/32634-2023922603/},
  doi = {10.5120/ijca2023922603},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-07T01:24:22.259478+05:30

%A Dewi Soyusiawaty

%A Bella Okta Sari Miranda

%T Statistical Machine Translation from Indonesian to Regional Languages in Indonesia

%J International Journal of Computer Applications

%@ 0975-8887

%V 184

%N 49

%P 18-23

%D 2023

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Indonesia currently has 617 regional languages, of which 15 have been declared extinct and 139 are endangered. Computer-based tools such as digital dictionaries and machine translation systems offer a way to preserve regional languages digitally, in step with current technological developments. A digital dictionary can translate a regional language into Indonesian word by word, but this approach is not effective when done manually. An alternative is to build a machine translation application, which can be either dictionary-based or based on parallel corpus data. Statistical Machine Translation (SMT) is a machine translation approach in which translations are generated from a statistical model whose parameters are estimated from the analysis of a parallel corpus. Several factors influence the quality of SMT output; the most fundamental are the size and the quality of the parallel corpus used to build the translation model and the language model. This study aims to determine the role of the parallel corpus in improving SMT accuracy, particularly for regional languages in Indonesia. The research data is a parallel corpus of 3000 sentence pairs. The results show that optimizing the parallel corpus increases translation accuracy. In addition, testing with simple (single-clause) sentences yields higher accuracy than testing with compound sentences. Testing with 3000 optimized parallel corpus pairs increased accuracy by 11.4% compared with testing with 3000 random parallel corpus pairs.
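To illustrate what it means for model parameters to be estimated from a parallel corpus, the following is a minimal sketch of IBM Model 1 expectation-maximization, a standard way to learn word translation probabilities from sentence pairs. It is not taken from the paper, which does not describe its toolkit or training procedure in the abstract. The three Indonesian-Javanese sentence pairs are invented toy data (the study itself concerns Bengkulu Malay), and the names `PAIRS`, `train_ibm1`, and `best_translation` are hypothetical.

```python
from collections import defaultdict

# Toy Indonesian -> Javanese sentence pairs, invented for illustration only.
# They are NOT from the paper's 3000-pair corpus.
PAIRS = [
    ("rumah besar".split(), "omah gedhe".split()),
    ("rumah kecil".split(), "omah cilik".split()),
    ("buku besar".split(), "buku gedhe".split()),
]


def train_ibm1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with IBM Model 1 EM (no NULL word)."""
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform table; keys are (target_word, source_word).
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: distribute each target word over the source words
        # of the same sentence, in proportion to the current table.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize expected counts into probabilities.
        t = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


TABLE = train_ibm1(PAIRS)
for word in ["rumah", "besar", "kecil", "buku"]:
    print(word, "->", best_translation(TABLE, word))
```

Even on three sentence pairs, the co-occurrence pattern is enough for the table to favor the intended word pairings, which gives some intuition for why both the quantity and the cleanliness of the parallel corpus affect an SMT system: noisy or misaligned pairs would spread probability mass over wrong candidates.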

Index Terms

Computer Science

Information Sciences

Keywords

Regional languages, Bengkulu Malay, BLEU, Parallel Corpus, Statistical Machine Translation