Research Article

Bi-Gram based Probabilistic Language Model for Template Messaging

by Rina Damdoo
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 18
Year of Publication: 2013
Authors: Rina Damdoo
DOI: 10.5120/11182-6266

Rina Damdoo. Bi-Gram based Probabilistic Language Model for Template Messaging. International Journal of Computer Applications. 66, 18 (March 2013), 11-17. DOI=10.5120/11182-6266

@article{ 10.5120/11182-6266,
author = { Rina Damdoo },
title = { Bi-Gram based Probabilistic Language Model for Template Messaging },
journal = { International Journal of Computer Applications },
issue_date = { March 2013 },
volume = { 66 },
number = { 18 },
month = { March },
year = { 2013 },
issn = { 0975-8887 },
pages = { 11-17 },
numpages = { 7 },
url = { https://ijcaonline.org/archives/volume66/number18/11182-6266/ },
doi = { 10.5120/11182-6266 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Rina Damdoo
%T Bi-Gram based Probabilistic Language Model for Template Messaging
%J International Journal of Computer Applications
%@ 0975-8887
%V 66
%N 18
%P 11-17
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This work reports the benefits of Statistical Machine Translation (SMT) in the template messaging domain. SMT has become a practical technology thanks to significant increases in the computational power and storage capacity of computers and to the availability of large volumes of bilingual data. Through SMT, sentences written with misspelled words, short forms, and chat slang can be corrected. The problem of machine translation is to automatically produce a target-language sentence (e.g., long-form English) from a given source-language sentence (e.g., a short-form message). SMS Lingo, also known as chat slang, is a language used by young people for instant messaging and for chatting on social networking websites; such terms often originate with the purpose of saving keystrokes. This work presents a pioneering step in designing a Bi-gram based back-off decoder for template messages. Among the different machine translation approaches, the probabilistic N-gram-based system has proved comparable to state-of-the-art phrase-based systems. In an N-gram Language Model (LM), a window of N words is used to capture the context of a word; in this work, a Bi-gram LM is used. First, the LM is trained on a bilingual, parallel, word-aligned corpus to obtain Probability Distribution Tables (a Bi-gram PDT and a Uni-gram PDT). A back-off decoder is then employed together with these PDTs to translate template messages into full-form text. The idea behind this work is to treat text normalization as a translation task handled by the Bi-gram-based system. The main goal of the project is to analyze how the performance of the Language Model improves as the size of the bilingual corpus increases. This work offers researchers a lead into the field of N-gram probabilistic machine translation and human-computer interaction. It helps users combine multiple languages with a larger vocabulary, serves as a useful tool for small devices such as mobile phones, and saves time for those who cannot operate keys efficiently. Machine learning and translation systems, dictionary and textbook preparation, patent and reference searches, and various information retrieval systems are the main applications of this work.
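
The back-off idea described above can be illustrated with a minimal sketch. The toy probability tables (unigram_pdt, bigram_pdt), the short-form lexicon, the back-off weight, and the greedy left-to-right decoding are illustrative assumptions rather than the paper's actual PDTs or decoder; the sketch only shows how a bi-gram score falls back to a discounted uni-gram estimate when a word pair was never seen in the training corpus.

# Minimal illustrative sketch (not the paper's implementation): a bi-gram
# back-off scorer that expands SMS short forms into long-form words using
# two probability distribution tables (PDTs). All table contents, the
# lexicon, and the back-off weight are assumed toy values.

unigram_pdt = {"how": 0.03, "are": 0.04, "you": 0.05, "your": 0.02, "later": 0.01}
bigram_pdt = {("how", "are"): 0.30, ("are", "you"): 0.40, ("see", "you"): 0.25}

# Toy lexicon mapping SMS short forms to candidate long-form words.
lexicon = {"hw": ["how"], "r": ["are", "your"], "u": ["you"], "l8r": ["later"]}

BACKOFF_WEIGHT = 0.4  # assumed discount applied when backing off to the uni-gram PDT

def score(prev, word):
    """Bi-gram probability of `word` given `prev`, backing off to the uni-gram PDT."""
    if (prev, word) in bigram_pdt:
        return bigram_pdt[(prev, word)]
    return BACKOFF_WEIGHT * unigram_pdt.get(word, 1e-6)

def normalize(message):
    """Greedy left-to-right expansion of an SMS message into long-form text."""
    prev = "<s>"  # sentence-start marker
    output = []
    for token in message.lower().split():
        candidates = lexicon.get(token, [token])  # unknown tokens pass through unchanged
        best = max(candidates, key=lambda w: score(prev, w))
        output.append(best)
        prev = best
    return " ".join(output)

print(normalize("hw r u"))  # -> "how are you"

A full decoder would search over all candidate sequences (for example with Viterbi or beam search) rather than choosing greedily, but the back-off logic in score() reflects the same principle of consulting the Bi-gram PDT first and the Uni-gram PDT only when needed.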

Index Terms

Computer Science
Information Sciences

Keywords

SMS Lingo, Bi-grams, Template messaging