Language Identification of Kannada Language using N-Gram

Deepamala. N; Ramakanth Kumar. P

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 46 - Number 4

Year of Publication: 2012

DOI: 10.5120/6896-9245

Deepamala. N, Ramakanth Kumar. P. Language Identification of Kannada Language using N-Gram. International Journal of Computer Applications. 46, 4 (May 2012), 24-28. DOI=10.5120/6896-9245

@article{10.5120/6896-9245,
  author = {{Deepamala. N} and {Ramakanth Kumar. P}},
  title = {Language Identification of Kannada Language using N-Gram},
  journal = {International Journal of Computer Applications},
  issue_date = {May 2012},
  volume = {46},
  number = {4},
  month = {May},
  year = {2012},
  issn = {0975-8887},
  pages = {24-28},
  numpages = {5},
  url = {https://ijcaonline.org/archives/volume46/number4/6896-9245/},
  doi = {10.5120/6896-9245},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%A Deepamala. N
%A Ramakanth Kumar. P
%T Language Identification of Kannada Language using N-Gram
%J International Journal of Computer Applications
%@ 0975-8887
%V 46
%N 4
%P 24-28
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Language identification is an important preprocessing step for Natural Language Processing (NLP) tasks such as POS tagging, sentence boundary detection and data mining. Kannada is an Indian language, and a lot of research is being carried out on Kannada language processing. A large proportion of online documents, such as websites, contain both Kannada and English sentences. In this paper, we present an n-gram method of language identification for documents containing Kannada, Telugu and English sentences. We show how performance can be improved by applying n-gram processing to only the last word of each sentence instead of the complete sentence. This method could also serve as a preprocessing step for the sentence boundary detection discussed in [1].
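The abstract describes labelling each sentence by its n-gram statistics, optionally using only the sentence's last word. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it builds character n-gram rank profiles and compares them with the out-of-place distance of Cavnar and Trenkle (listed in the references), which may differ from the scoring used in this paper. The toy training sentences, the function names (`char_ngrams`, `build_profile`, `out_of_place`, `identify`) and the parameters `n_max=3` and `TOP_K=300` are hypothetical choices for illustration only.

```python
from collections import Counter

# Hypothetical toy training text (NOT the authors' corpus); a real system
# would build each language profile from a much larger body of text.
TRAINING = {
    "kannada": "ನಾನು ಮನೆಗೆ ಹೋಗುತ್ತೇನೆ ಅವನು ಪುಸ್ತಕ ಓದುತ್ತಾನೆ ಅವಳು ಶಾಲೆಗೆ ಬಂದಳು",
    "telugu": "నేను ఇంటికి వెళ్తున్నాను అతను పుస్తకం చదువుతున్నాడు ఆమె బడికి వచ్చింది",
    "english": "I am going home he is reading a book she came to school",
}

TOP_K = 300  # assumed profile size cap; also used as the penalty for unseen n-grams


def char_ngrams(text, n_max=3):
    """Character 1..n_max-grams of each word, with '_' marking word boundaries."""
    grams = []
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def build_profile(text, top_k=TOP_K):
    """Map each of the top_k most frequent n-grams to its rank (0 = most frequent)."""
    ranked = Counter(char_ngrams(text)).most_common(top_k)
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, penalty=TOP_K):
    """Sum of rank differences; an n-gram missing from lang_profile costs `penalty`."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify(sentence, profiles, last_word_only=False):
    """Return the language whose profile is closest to the sentence (or its last word).

    Punctuation is not stripped in this sketch.
    """
    words = sentence.split()
    if not words:
        return None
    text = words[-1] if last_word_only else sentence
    doc_profile = build_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))


PROFILES = {lang: build_profile(text) for lang, text in TRAINING.items()}

if __name__ == "__main__":
    for s in ["ಅವನು ಮನೆಗೆ ಬಂದನು", "అతను ఇంటికి వచ్చాడు", "he came home"]:
        print(s, identify(s, PROFILES), identify(s, PROFILES, last_word_only=True))
```

Restricting the profile to the last word reduces the number of n-grams compared per sentence. Because Kannada sentences are typically verb-final, that word often carries a verb suffix, which is one plausible reason it can be informative on its own.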

References

[1] Deepamala. N and Ramakanth Kumar. P, "Sentence Boundary Detection in Kannada Language," International Journal of Computer Applications (0975-8887), Vol. 39, No. 9, February 2012.
[2] M. C. Padma and P. A. Vijaya, "Global Approach for Script Identification using Wavelet Packet based Features," International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 3, No. 3, September 2010.
[3] M. C. Padma and P. A. Vijaya, "Script Identification from Trilingual Documents using Profile based Features," International Journal of Computer Science and Applications, Technomathematics Research Foundation, Vol. 7, No. 4, pp. 16-33, 2010.
[4] Mallikarjun Hangarge and B. V. Dhandra, "Offline Handwritten Script Identification in Document Images," International Journal of Computer Applications (0975-8887), Vol. 4, No. 6, July 2010.
[5] U. Pal and B. B. Chaudhuri, "Multi-Script Line Identification from Indian Documents," 7th ICDAR, 2003.
[6] W. B. Cavnar and J. M. Trenkle, "N-gram-based text categorization," Proceedings of SDAIR-94, the 3rd Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, Nevada, USA, pp. 161-175, 1994.
[7] Ted Dunning, "Statistical identification of language," Technical Report MCCS-94-273, Computing Research Lab, New Mexico State University, 1994.
[8] Lena Grothe, Ernesto William De Luca and Andreas Nürnberger, "A Comparative Study on Language Identification Methods," Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, pp. 980-985, 2008.
[9] Yew Choong Chew, Yoshiki Mikami and Robin Lee, "Language Identification of Web Pages Based on Improved N-gram Algorithm," IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 3, No. 1, May 2011.
[10] M. Padró and L. Padró, "Comparing methods for language identification," Proceedings of the XX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, Barcelona, Spain, 2004.
[11] Shiho Nobesawa and Ikuo Tahara, "Language Identification for Person Names Based on Statistical Information," Proceedings of PACLIC 19, the 19th Asia-Pacific Conference on Language, Information and Computation.
[12] Tommi Vatanen, Jaakko J. Väyrynen and Sami Virpioja, "Language Identification of Short Text Segments with N-gram Models," European Language Resources Association, 2010.
[13] B. Ahmed and S. Cha, "Language Identification from Text Using N-gram Based Cumulative Frequency Addition," Proceedings of CSIS 2004, Pace University, May 7, 2004.
[14] libtextcat, http://software.wise-guys.nl/libtextcat/

Index Terms

Computer Science

Information Sciences

Keywords

N-gram Processing, Verb Suffix, Language Identification