Research Article

Bi-Gram based Probabilistic Language Model for Template Messaging

by Rina Damdoo
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 18
Year of Publication: 2013
Authors: Rina Damdoo
DOI: 10.5120/11182-6266

Rina Damdoo. Bi-Gram based Probabilistic Language Model for Template Messaging. International Journal of Computer Applications. 66, 18 (March 2013), 11-17. DOI=10.5120/11182-6266

@article{ 10.5120/11182-6266,
author = { Rina Damdoo },
title = { Bi-Gram based Probabilistic Language Model for Template Messaging },
journal = { International Journal of Computer Applications },
issue_date = { March 2013 },
volume = { 66 },
number = { 18 },
month = { March },
year = { 2013 },
issn = { 0975-8887 },
pages = { 11-17 },
numpages = { 7 },
url = { https://ijcaonline.org/archives/volume66/number18/11182-6266/ },
doi = { 10.5120/11182-6266 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Rina Damdoo
%T Bi-Gram based Probabilistic Language Model for Template Messaging
%J International Journal of Computer Applications
%@ 0975-8887
%V 66
%N 18
%P 11-17
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This work reports the benefits of Statistical Machine Translation (SMT) in the template messaging domain. SMT has become a practical technology thanks to significant increases in the computational power and storage capacity of computers and to the availability of large volumes of bilingual data. Through SMT, sentences written with misspelled words, short forms, and chat slang can be corrected. The problem of machine translation is to automatically produce a target-language sentence (e.g., long-form English) from a given source-language sentence (e.g., a short-form message). SMS Lingo, also known as chat slang, is a language used by young people for instant messaging and for chatting on social networking websites; such terms often originate with the purpose of saving keystrokes. This work presents a pioneering step in designing a Bi-gram based back-off decoder for template messages. Among the different machine translation approaches, the probabilistic N-gram-based system has proved comparable to state-of-the-art phrase-based systems. In an N-gram Language Model (LM), a window of N words is used to capture the context of a word; in this work, a Bi-gram LM is used. First, the LM is trained on a bilingual, parallel, word-aligned corpus to obtain Probability Distribution Tables (a Bi-gram PDT and a Uni-gram PDT). A back-off decoder is then employed together with these PDTs to translate template messages into full-form text. The idea behind this work is to treat text normalization as a translation task handled by the Bi-gram-based system. The main goal of the project is to analyze how the performance of the Language Model improves as the size of the bilingual corpus increases. This work offers researchers a lead into the field of N-gram probabilistic machine translation and human-computer interaction. It helps users combine multiple languages with a larger vocabulary, serves as a useful tool for small devices such as mobile phones, and saves time for those who cannot operate keys efficiently. Machine learning and translation systems, dictionary and textbook preparation, patent and reference searches, and various information retrieval systems are the main applications of this work.
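
The back-off idea described above can be illustrated with a minimal sketch. The toy probability tables (unigram_pdt, bigram_pdt), the short-form lexicon, the back-off weight, and the greedy left-to-right decoding are illustrative assumptions rather than the paper's actual PDTs or decoder; the sketch only shows how a bi-gram score falls back to a discounted uni-gram estimate when a word pair was never seen in the training corpus.

# Minimal illustrative sketch (not the paper's implementation): a bi-gram
# back-off scorer that expands SMS short forms into long-form words using
# two probability distribution tables (PDTs). All table contents, the
# lexicon, and the back-off weight are assumed toy values.

unigram_pdt = {"how": 0.03, "are": 0.04, "you": 0.05, "your": 0.02, "later": 0.01}
bigram_pdt = {("how", "are"): 0.30, ("are", "you"): 0.40, ("see", "you"): 0.25}

# Toy lexicon mapping SMS short forms to candidate long-form words.
lexicon = {"hw": ["how"], "r": ["are", "your"], "u": ["you"], "l8r": ["later"]}

BACKOFF_WEIGHT = 0.4  # assumed discount applied when backing off to the uni-gram PDT

def score(prev, word):
    """Bi-gram probability of `word` given `prev`, backing off to the uni-gram PDT."""
    if (prev, word) in bigram_pdt:
        return bigram_pdt[(prev, word)]
    return BACKOFF_WEIGHT * unigram_pdt.get(word, 1e-6)

def normalize(message):
    """Greedy left-to-right expansion of an SMS message into long-form text."""
    prev = "<s>"  # sentence-start marker
    output = []
    for token in message.lower().split():
        candidates = lexicon.get(token, [token])  # unknown tokens pass through unchanged
        best = max(candidates, key=lambda w: score(prev, w))
        output.append(best)
        prev = best
    return " ".join(output)

print(normalize("hw r u"))  # -> "how are you"

A full decoder would search over all candidate sequences (for example with Viterbi or beam search) rather than choosing greedily, but the back-off logic in score() reflects the same principle of consulting the Bi-gram PDT first and the Uni-gram PDT only when needed.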

Index Terms

Computer Science
Information Sciences

Keywords

SMS Lingo, Bi-grams, Template messaging