Research Article

Language Identification of Kannada Language using N-Gram

by Deepamala. N, Ramakanth Kumar. P
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 4
Year of Publication: 2012
Authors: Deepamala. N, Ramakanth Kumar. P
DOI: 10.5120/6896-9245

Deepamala. N, Ramakanth Kumar. P. Language Identification of Kannada Language using N-Gram. International Journal of Computer Applications. 46, 4 (May 2012), 24-28. DOI=10.5120/6896-9245

@article{ 10.5120/6896-9245,
author = { Deepamala. N, Ramakanth Kumar. P },
title = { Language Identification of Kannada Language using N-Gram },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 4 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 24-28 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume46/number4/6896-9245/ },
doi = { 10.5120/6896-9245 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:38:52.999958+05:30
%A Deepamala. N
%A Ramakanth Kumar. P
%T Language Identification of Kannada Language using N-Gram
%J International Journal of Computer Applications
%@ 0975-8887
%V 46
%N 4
%P 24-28
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Language identification is an important pre-processing step for any Natural Language Processing (NLP) task, such as POS tagging, sentence boundary detection, or data mining. Kannada is an Indian language, and a lot of research is being carried out on Kannada language processing. Many online documents, such as websites, combine Kannada and English sentences. In this paper, we present an n-gram method of language identification for documents containing Kannada, Telugu and English sentences. We show how performance can be improved by applying n-gram processing to only the last word of a sentence instead of the complete sentence. This method could also serve as a pre-processing step for the Sentence Boundary Detection discussed in [1].
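The idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact n-gram scoring used, so this sketch assumes the rank-based "out-of-place" profile comparison of Cavnar and Trenkle [6]. The function names, the tiny training samples, and the parameters `n_max` and `top_k` are all hypothetical and chosen only for illustration. Note that the toy samples are in native scripts, where even single characters separate the three languages; the abstract does not state how the input text was encoded.

```python
from collections import Counter

TOP_K = 300  # assumed profile size; also used as the penalty for a missing n-gram


def char_ngrams(text, n_max=3):
    """Character n-grams (n = 1..n_max) of every word, padded with '_'."""
    grams = []
    for word in text.split():
        padded = "_" + word + "_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                grams.append(padded[i:i + n])
    return grams


def build_profile(text, top_k=TOP_K):
    """Map each of the top_k most frequent n-grams to its frequency rank."""
    counts = Counter(char_ngrams(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty=TOP_K):
    """Sum of rank differences; n-grams absent from the language cost max_penalty."""
    total = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            total += abs(rank - lang_profile[gram])
        else:
            total += max_penalty
    return total


def last_word(sentence):
    """Last whitespace-separated word, with trailing sentence punctuation removed."""
    words = sentence.strip().rstrip(".?!").split()
    return words[-1] if words else ""


def identify(sentence, profiles, last_word_only=True):
    """Return the language whose profile is closest to the sentence (or its last word)."""
    text = last_word(sentence) if last_word_only else sentence
    doc_profile = build_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))


# Toy training samples (hypothetical; a real system would use large corpora).
SAMPLES = {
    "kannada": "ನಾನು ಶಾಲೆಗೆ ಹೋಗುತ್ತೇನೆ ಅವನು ಮನೆಗೆ ಬಂದನು",
    "telugu": "నేను బడికి వెళ్తాను అతను ఇంటికి వచ్చాడు",
    "english": "i go to school every day he came home yesterday",
}
PROFILES = {lang: build_profile(text) for lang, text in SAMPLES.items()}

for s in ["She went home.", "ಅವಳು ಮನೆಗೆ ಹೋಗುತ್ತಾಳೆ.", "ఆమె ఇంటికి వెళ్తుంది."]:
    print(s, "->", identify(s, PROFILES))
```

Scoring only the last word means far fewer n-grams are generated and compared per sentence than when the complete sentence is processed, which is where the performance improvement described above would be expected to come from.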

References
  1. Deepamala. N and Ramakanth Kumar. P, "Sentence Boundary Detection in Kannada Language", International Journal of Computer Applications (0975-8887), Volume 39, No. 9, February 2012.
  2. M. C. Padma, P. A. Vijaya, "Global Approach for Script Identification using Wavelet Packet based Features", International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 3, No. 3, September 2010.
  3. M. C. Padma, P. A. Vijaya, "Script Identification from Trilingual Documents using Profile based Features", International Journal of Computer Science and Applications, Technomathematics Research Foundation, Vol. 7, No. 4, pp. 16-33, 2010.
  4. Mallikarjun Hangarge, B. V. Dhandra, "Offline Handwritten Script Identification in Document Images", International Journal of Computer Applications (0975-8887), Volume 4, No. 6, July 2010.
  5. U. Pal and B. B. Chaudhuri, "Multi-Script Line identification from Indian Documents", 7th ICDAR, 2003.
  6. W. B. Cavnar and J. M. Trenkle, "N-gram-based text categorization", In Proceedings of SDAIR-94, the 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 161-175, Las Vegas, Nevada, USA, 1994.
  7. Ted Dunning, "Statistical identification of language", Technical Report MCCS-94-273, Computing Research Lab, New Mexico State University, 1994.
  8. Lena Grothe, Ernesto William De Luca, Andreas Nürnberger, "A Comparative Study on Language Identification Methods", Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, 2008, pp. 980-985.
  9. Yew Choong Chew, Yoshiki Mikami, Robin Lee, "Language Identification of Web Pages Based on Improved N-gram Algorithm", IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 3, No. 1, May 2011.
  10. M. Padro and L. Padro, "Comparing methods for language identification", Proceedings of the XX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, Barcelona, Spain, 2004.
  11. Shiho Nobesawa and Ikuo Tahara, "Language Identification for Person Names Based on Statistical Information", Proceedings of PACLIC 19, the 19th Asia-Pacific Conference on Language, Information and Computation.
  12. Tommi Vatanen, Jaakko J. Väyrynen, Sami Virpioja, "Language Identification of Short Text Segments with N-gram Models", European Language Resources Association, 2010.
  13. B. Ahmed, S. Cha, "Language Identification from Text Using N-gram Based Cumulative Frequency Addition", Proceedings of CSIS 2004, Pace University, May 7th, 2004.
  14. http://software.wise-guys.nl/libtextcat/
Index Terms

Computer Science
Information Sciences

Keywords

N-gram Processing Verb Suffix Language Identification.