International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 4
Year of Publication: 2012
Authors: Deepamala. N, Ramakanth Kumar. P
DOI: 10.5120/6896-9245
Deepamala. N, Ramakanth Kumar. P. Language Identification of Kannada Language using N-Gram. International Journal of Computer Applications. 46, 4 (May 2012), 24-28. DOI=10.5120/6896-9245
Language identification is an important pre-processing step for Natural Language Processing (NLP) tasks such as POS tagging, sentence boundary detection, and data mining. Kannada is an Indian language, and a lot of research is being carried out on Kannada language processing. A large share of online documents, such as websites, mix Kannada and English sentences. In this paper, we present an n-gram method of language identification for documents containing Kannada, Telugu, and English sentences. We show that performance improves when n-gram processing is applied only to the last word of each sentence rather than to the complete sentence. This method could also serve as a pre-processing step for the sentence boundary detection approach discussed in [1].
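The idea of scoring only a sentence's last word can be illustrated with a minimal sketch of character n-gram language identification. It assumes trigram profiles built from tiny toy corpora and a simple frequency-overlap score. The names `build_profile`, `identify`, `PROFILES`, the `last_word_only` flag, the toy training strings, and the scoring rule are all hypothetical illustrations, not the authors' implementation; the paper's actual profile construction and distance measure may differ.

```python
from collections import Counter

# Punctuation stripped from words before n-gram extraction (an assumption of this sketch).
PUNCT = ".,!?;:\"'()"


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of one word, padded with a space on each side."""
    padded = f" {word} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(corpus: str, n: int = 3) -> Counter:
    """Count character n-grams over every whitespace-separated word of a training corpus."""
    profile: Counter = Counter()
    for word in corpus.split():
        profile.update(char_ngrams(word.strip(PUNCT), n))
    return profile


def identify(sentence: str, profiles: dict[str, Counter], n: int = 3,
             last_word_only: bool = True) -> str:
    """Return the best-scoring language label, or "unknown" if no n-gram matches."""
    words = [w.strip(PUNCT) for w in sentence.split()]
    words = [w for w in words if w]
    if not words:
        return "unknown"
    # Hypothetical rendering of the abstract's idea: process only the last word.
    target = words[-1:] if last_word_only else words
    grams = [g for w in target for g in char_ngrams(w, n)]

    def score(lang: str) -> float:
        # Frequency-overlap score: matched n-gram counts, normalised by profile size.
        profile = profiles[lang]
        total = sum(profile.values()) or 1
        return sum(profile[g] for g in grams) / total

    best = max(profiles, key=score)
    return best if score(best) > 0 else "unknown"


# Toy profiles for illustration only; real profiles would come from large corpora.
PROFILES = {
    "kannada": build_profile("ಕನ್ನಡ ಭಾಷೆ ನಮಸ್ಕಾರ"),
    "telugu": build_profile("తెలుగు భాష నమస్కారం"),
    "english": build_profile("language identification is an important preprocessing step"),
}

print(identify("Welcome to ಕನ್ನಡ", PROFILES))            # kannada
print(identify("ಕನ್ನಡ ಭಾಷೆ is a language.", PROFILES))   # english
```

In the second call the sentence is code-mixed, and the last-word rule labels it by its final word only. Passing `last_word_only=False` scores every word instead, so for a sentence such as `"ಕನ್ನಡ ಭಾಷೆ ನಮಸ್ಕಾರ language"` the Kannada words dominate under these toy profiles.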