International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 147 - Number 14
Year of Publication: 2016
Authors: Md. Mahfuzur Rahaman
DOI: 10.5120/ijca2016911305
Md. Mahfuzur Rahaman. A Revised Unicode based Sorting Algorithm for Bengali Texts. International Journal of Computer Applications 147, 14 (Aug 2016), 35-40. DOI=10.5120/ijca2016911305
This paper describes a sorting algorithm for Bengali texts, one of the most vital tasks in Bengali Natural Language Processing. Because Unicode is much more preferable than ASCII encoding, Bengali text should be represented in Unicode. However, owing to some distinctive properties of the Bengali language, Bengali texts cannot be sorted correctly by simply following the order of the Unicode character scheme. A few works have addressed this problem, some for ASCII encoding and others for Unicode, but they still have drawbacks, and there is still no standard for sorting Bengali texts. In this paper, we discuss the previous approaches and propose a revised and simpler procedure for sorting Unicode Bengali texts. We use a mapping to simplify the sorting process, and the overall efficiency depends on the efficiency of the underlying sorting algorithm. The method can sort any Unicode Bengali text. It also works for Unicode text in any other language if only the mapping is changed, so the process is both keyboard independent and language independent.
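The abstract does not spell out the mapping itself. As a rough illustration of the general idea only, the following minimal Python sketch translates each character into its rank in an assumed collation order and then compares the resulting rank tuples with an ordinary sort. The order `ASSUMED_ORDER` and the helpers `build_mapping`, `sort_key` and `sort_words` are hypothetical and are not the paper's mapping. The sketch also treats each code point independently and does no Unicode normalization, so it ignores nukta sequences and conjunct-specific rules.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL collation order, for illustration only (NOT the paper's mapping).
VOWELS = "অআইঈউঊঋএঐওঔ"                         # independent vowels
CONSONANTS = "কখগঘঙচছজঝঞটঠডঢণতথদধনপফবভমযরলশষসহ"
# Escapes keep these code points explicit: precomposed ড় ঢ় য়, then khanda ta ৎ.
EXTRA_CONSONANTS = "\u09dc\u09dd\u09df\u09ce"
# Dependent vowel signs া ি ী ু ূ ৃ ে ৈ ো ৌ (precomposed ো and ৌ included).
VOWEL_SIGNS = "\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c7\u09c8\u09cb\u09cc"
# Hasanta, anusvara, visarga, chandrabindu (assumed here to sort last).
OTHER_SIGNS = "\u09cd\u0982\u0983\u0981"

ASSUMED_ORDER = VOWELS + CONSONANTS + EXTRA_CONSONANTS + VOWEL_SIGNS + OTHER_SIGNS


def build_mapping(order: str) -> Dict[str, int]:
    """Map each character of `order` to its position (rank) in that order."""
    return {ch: rank for rank, ch in enumerate(order)}


def sort_key(word: str, mapping: Dict[str, int]) -> Tuple[Tuple[int, int], ...]:
    """Translate a word into a tuple of ranks.

    Mapped characters become (0, rank); unmapped characters fall back to
    (1, code point), so they sort after every mapped character.
    """
    return tuple((0, mapping[ch]) if ch in mapping else (1, ord(ch)) for ch in word)


def sort_words(words: List[str], order: str = ASSUMED_ORDER) -> List[str]:
    """Sort words by the mapped ranks; swapping `order` changes the language."""
    mapping = build_mapping(order)
    return sorted(words, key=lambda w: sort_key(w, mapping))
```

Raw code-point order places the anusvara ং (U+0982) before every vowel and consonant, so plain `sorted(["কং", "কক"])` returns `["কং", "কক"]`. Under the assumed order above, `sort_words(["কং", "কক"])` returns `["কক", "কং"]`. Because all language-specific knowledge lives in the order string, passing a different `order` (for example `sort_words(["b", "a", "c"], order="cba")`) reuses the same code for another script. The heavy lifting is done by Python's built-in `sorted`, which fits the abstract's point that overall efficiency depends on the underlying sorting algorithm.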