International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 157, Number 9
Year of Publication: 2017
Authors: Titin Winarti, Jati Kerami, Sunny Arief
DOI: 10.5120/ijca2017912761
Titin Winarti, Jati Kerami, Sunny Arief. Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming. International Journal of Computer Applications. 157, 9 (Jan 2017), 8-13. DOI=10.5120/ijca2017912761
In term-based clustering with the vector space model, the high dimensionality of the vector space, caused by the large number of words used, is a recurring problem. It degrades clustering performance because the distances between points tend toward the same value. Dimensionality can be reduced by decreasing the number of words, which can be done by stemming. In this work, stemming was used as a term selection step to reduce the many terms generated during preprocessing. Applying the Enhanced Confix Stripping stemmer reduced the 199,358 terms obtained from 108 text documents to 5,476 terms after stemming. This reduction speeds up processing and saves storage. The clustering results were evaluated with a confusion matrix, and the experiment showed an increase in accuracy.
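To illustrate how stemming collapses affixed word forms into a single term, and so shrinks the vocabulary that defines the vector space, the following is a minimal Python sketch loosely modeled on the affix groups handled by confix-stripping stemmers for Indonesian. It removes inflectional particles (-lah, -kah, -tah, -pun) and possessive pronouns (-ku, -mu, -nya), then tries one derivational prefix on the word with and without a derivational suffix (-kan, -an, -i), checking a root dictionary at each step. It is not the Enhanced Confix Stripping algorithm used in the paper: the root dictionary `ROOTS`, the word list, and the `stem` and `strip_suffix` functions are hypothetical, only a single prefix is removed, and the recoding rules that ECS applies to prefixes such as me- and pe- are omitted.

```python
# Hypothetical toy root-word dictionary; a real stemmer uses a full Indonesian lexicon.
ROOTS = {"main", "baca", "buku", "makan", "tulis"}

# Simplified affix groups, each tried in the listed order.
PARTICLES = ("lah", "kah", "tah", "pun")    # inflectional particles
POSSESSIVES = ("ku", "mu", "nya")           # possessive pronouns
DERIV_SUFFIXES = ("kan", "an", "i")         # derivational suffixes
# Prefixes are stripped literally; recoding (e.g. menulis -> men + ulis -> tulis) is omitted.
PREFIXES = ("di", "ke", "se", "ber", "ter", "meng", "men", "mem", "me", "pe")


def strip_suffix(word: str, suffixes: tuple[str, ...]) -> str:
    """Remove the first matching suffix, keeping at least 3 characters."""
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 3:
            return word[: -len(s)]
    return word


def stem(word: str) -> str:
    """Return a root from ROOTS, or the lowercased word if none is found."""
    word = word.lower()
    if word in ROOTS:
        return word
    # Strip an inflectional particle, then a possessive pronoun.
    w = strip_suffix(strip_suffix(word, PARTICLES), POSSESSIVES)
    if w in ROOTS:
        return w
    # Try the form without and with a derivational suffix removed, then one prefix.
    for cand in (w, strip_suffix(w, DERIV_SUFFIXES)):
        if cand in ROOTS:
            return cand
        for p in PREFIXES:
            if cand.startswith(p) and cand[len(p):] in ROOTS:
                return cand[len(p):]
    return word  # no root found: keep the original term


tokens = ["bermain", "mainan", "dimainkan", "main", "membaca", "dibaca",
          "bacaan", "bukunya", "buku", "makanlah", "makanan"]
vocab_before = set(tokens)
vocab_after = {stem(t) for t in tokens}
print(len(vocab_before), len(vocab_after))  # prints: 11 4
print(sorted(vocab_after))  # prints: ['baca', 'buku', 'main', 'makan']
```

Here 11 distinct surface forms collapse to 4 roots. Applied to a whole corpus, the same effect shrinks the term list before document vectors are built, which is the kind of reduction the abstract reports.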