International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 50 - Number 11
Year of Publication: 2012
Authors: Putu Wira Buana, Sesaltina Jannet D.r.m., I Ketut Gede Darma Putra
10.5120/7817-1105
Putu Wira Buana, Sesaltina Jannet D.r.m., I Ketut Gede Darma Putra. Combination of K-Nearest Neighbor and K-Means based on Term Re-weighting for Classify Indonesian News. International Journal of Computer Applications. 50, 11 (July 2012), 37-42. DOI=10.5120/7817-1105
K-Nearest Neighbor (KNN) is a widely accepted classification tool, but it uses all training samples during classification, which leads to high computational complexity. To address this problem, this paper proposes combining the traditional KNN algorithm with the K-Means clustering algorithm. After the preprocessing step, each word (term) is first weighted using Term Frequency-Inverse Document Frequency (TF-IDF), which weights a word based on a count of how often it appears in a document. Second, the training samples of each category are clustered with the K-Means algorithm, and all the cluster centers are taken as the new training samples. Third, the modified training samples are used for classification with the KNN algorithm. Finally, the classification is evaluated using precision, recall, and F-measure. The simulation results show that the proposed combination reaches an accuracy of 87% and an average F-measure of 0.8029 at the best k-value of 5, with a computation time of 55 seconds per document.
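The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the smoothed TF-IDF formula, the squared Euclidean distance, the toy English documents, and all function names (fit_idf, vectorize, kmeans, build_reduced_training_set, knn_predict, precision_recall_f1) are assumptions made for illustration, and the paper's term re-weighting scheme and preprocessing are not reproduced.

```python
import math
import random
from collections import Counter

def fit_idf(docs):
    """Inverse document frequency per term (the +1 smoothing is an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / df[term]) + 1.0 for term in df}

def vectorize(doc, idf, vocab):
    """TF-IDF vector over a fixed vocabulary; unknown terms are ignored."""
    tf = Counter(doc)
    return [tf[term] * idf[term] for term in vocab]

def sq_dist(a, b):
    """Squared Euclidean distance (the distance measure is an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means; returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def build_reduced_training_set(vectors, labels, clusters_per_class, seed=0):
    """Cluster each category separately; the centers become the new training samples."""
    centers, center_labels = [], []
    for category in sorted(set(labels)):
        pts = [v for v, lab in zip(vectors, labels) if lab == category]
        k = min(clusters_per_class, len(pts))
        for center in kmeans(pts, k, seed=seed):
            centers.append(center)
            center_labels.append(category)
    return centers, center_labels

def knn_predict(x, centers, center_labels, k):
    """Majority vote among the k nearest cluster centers."""
    ranked = sorted(zip(centers, center_labels), key=lambda cl: sq_dist(x, cl[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def precision_recall_f1(true_labels, predicted, category):
    """Per-category precision, recall, and F-measure."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy, already-tokenized documents (hypothetical data, not from the paper).
train = [
    ("goal match team score".split(), "sport"),
    ("team coach match win".split(), "sport"),
    ("league goal striker team".split(), "sport"),
    ("market stock price bank".split(), "economy"),
    ("bank rate inflation market".split(), "economy"),
    ("stock trade price investor".split(), "economy"),
]
docs = [doc for doc, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
vocab = sorted(idf)
vectors = [vectorize(doc, idf, vocab) for doc in docs]
centers, center_labels = build_reduced_training_set(vectors, labels, clusters_per_class=2)
query = vectorize("team goal win".split(), idf, vocab)
prediction = knn_predict(query, centers, center_labels, k=1)
print(prediction)
```

In this sketch KNN compares a query against four cluster centers rather than all six training documents, which is the source of the reduced computation the abstract describes; on a real corpus the number of clusters per category and the value of k would need to be tuned.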