International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 30 - Number 7
Year of Publication: 2011
Authors: Tirupathaiah Kommi, Srikanth Jatla
DOI: 10.5120/3653-5105
Tirupathaiah Kommi, Srikanth Jatla. Text Categorization using Distributional Features and Semantic Equivalence. International Journal of Computer Applications. 30, 7 (September 2011), 30-35. DOI=10.5120/3653-5105
In text mining, text categorization, the assignment of predefined categories to text, is widely used. Previous researchers relied on the bag-of-words approach, which assigns a value to each word based only on how frequently it occurs in the document. The drawback of this approach is that it ignores every property of a word other than its count. This paper investigates assigning additional values to a word, known as distributional features. This approach is novel, and the distributional features include the position of the word's first occurrence and the compactness of its appearances. Our experimental results show that text categorization improves with the help of distributional features and semantic equivalence. The results also indicate that distributional features are especially useful when the writing style is casual and the document is long. Semantic equivalence is used to extend the equivalence rough set approach.
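To make the contrast with a plain word count concrete, the following is a minimal Python sketch of how such features might be computed. It is an illustration under stated assumptions, not the authors' implementation: the function name `distributional_features`, the choice to split the document into `num_parts` equal token ranges, and the use of "number of parts containing the word" as a proxy for compactness are all assumptions made for this example.

```python
from collections import defaultdict


def distributional_features(tokens, num_parts=10):
    """Return per-word count, first-occurrence part, and parts covered.

    Hypothetical helper for illustration only. The document is divided
    into `num_parts` roughly equal token ranges (an assumption); a word
    that appears in few parts is treated as more compact.
    """
    if not tokens:
        return {}
    # Never use more parts than there are tokens.
    num_parts = max(1, min(num_parts, len(tokens)))

    counts = defaultdict(int)      # classic bag-of-words count
    parts_seen = defaultdict(set)  # indices of the parts containing the word

    for i, tok in enumerate(tokens):
        part = i * num_parts // len(tokens)  # part index in [0, num_parts)
        counts[tok] += 1
        parts_seen[tok].add(part)

    return {
        word: {
            "count": counts[word],
            "first_part": min(parts_seen[word]),
            "parts_covered": len(parts_seen[word]),
        }
        for word in counts
    }


if __name__ == "__main__":
    feats = distributional_features("a b a c d d".split(), num_parts=3)
    for word in sorted(feats):
        print(word, feats[word])
```

In this toy input, "a" and "d" both occur twice, so a bag-of-words representation cannot tell them apart. The sketch separates them: "a" first appears in part 0 and is spread over two parts, while "d" first appears in part 2 and is confined to a single part.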