Research Article

A Text Clustering Comparison Methodology

by F.M. Kwale, P.W. Wagacha, A. Mwaura
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 139 - Number 13
Year of Publication: 2016
10.5120/ijca2016909515

F.M. Kwale, P.W. Wagacha, A. Mwaura. A Text Clustering Comparison Methodology. International Journal of Computer Applications. 139, 13 (April 2016), 12-19. DOI=10.5120/ijca2016909515

@article{ 10.5120/ijca2016909515,
author = { F.M. Kwale and P.W. Wagacha and A. Mwaura },
title = { A Text Clustering Comparison Methodology },
journal = { International Journal of Computer Applications },
issue_date = { April 2016 },
volume = { 139 },
number = { 13 },
month = { April },
year = { 2016 },
issn = { 0975-8887 },
pages = { 12-19 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume139/number13/24550-2016909515/ },
doi = { 10.5120/ijca2016909515 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A F.M. Kwale
%A P.W. Wagacha
%A A. Mwaura
%T A Text Clustering Comparison Methodology
%J International Journal of Computer Applications
%@ 0975-8887
%V 139
%N 13
%P 12-19
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text clustering is the problem of dividing text documents into groups such that documents in the same group are more similar to one another than to documents in other groups. Although the different algorithms have been compared in an attempt to choose among them, such comparisons have been found to be too limited or otherwise inadequate. Either the researchers (who are usually the authors of one of the algorithms being compared) did not apply a formal comparison methodology, or the comparisons were based on inadequate data, metrics and procedures. Also, the comparisons focus only on the aspects where the authors' own algorithms are superior to the others, and the few competing algorithms appear to be carefully selected because they perform worse on those aspects. Thus, there is still a large gap regarding the most suitable methodology for comparing the algorithms. In this paper, a methodology for fairly comparing text clustering algorithms is proposed.
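
As an illustration only, and not the methodology proposed in the paper, the sketch below shows one ingredient of a like-for-like comparison: every algorithm's output is scored on the same documents, against the same gold-standard classes, with the same metric. The metric used here (purity), the gold labels, and the two algorithm outputs are all assumptions introduced for the example.

```python
from collections import Counter


def purity(predicted, gold):
    """Fraction of documents that belong to the majority gold class of their cluster."""
    if len(predicted) != len(gold):
        raise ValueError("label lists must have the same length")
    if not gold:
        raise ValueError("label lists must be non-empty")
    # Count the gold classes that occur inside each predicted cluster.
    clusters = {}
    for cluster_id, gold_label in zip(predicted, gold):
        clusters.setdefault(cluster_id, Counter())[gold_label] += 1
    # Sum the size of the largest gold class in every cluster.
    majority_total = sum(c.most_common(1)[0][1] for c in clusters.values())
    return majority_total / len(gold)


def compare(outputs, gold):
    """Score every algorithm's cluster assignment against the same gold labels."""
    return {name: purity(labels, gold) for name, labels in outputs.items()}


# Hypothetical gold classes for six documents (assumption for this example).
GOLD = ["sport", "sport", "sport", "politics", "politics", "politics"]

# Hypothetical cluster assignments from two unnamed algorithms on those documents.
OUTPUTS = {
    "algorithm_a": [0, 0, 0, 1, 1, 1],
    "algorithm_b": [0, 0, 1, 1, 1, 0],
}

if __name__ == "__main__":
    for name, score in compare(OUTPUTS, GOLD).items():
        print(f"{name}: purity = {score:.2f}")
```

Purity alone would not be an adequate basis for comparison, since it rewards solutions with many small clusters; it is used here only because it is simple to compute by hand.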

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Text Clustering, Metrics