Research Article

An Efficient Text Clustering Framework

by Francis M. Kwale
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 79 - Number 8
Year of Publication: 2013
Authors: Francis M. Kwale
10.5120/13763-1607

Francis M. Kwale. An Efficient Text Clustering Framework. International Journal of Computer Applications. 79, 8 (October 2013), 30-38. DOI=10.5120/13763-1607

@article{ 10.5120/13763-1607,
author = { Francis M. Kwale },
title = { An Efficient Text Clustering Framework },
journal = { International Journal of Computer Applications },
issue_date = { October 2013 },
volume = { 79 },
number = { 8 },
month = { October },
year = { 2013 },
issn = { 0975-8887 },
pages = { 30-38 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume79/number8/13763-1607/ },
doi = { 10.5120/13763-1607 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:52:30.324969+05:30
%A Francis M. Kwale
%T An Efficient Text Clustering Framework
%J International Journal of Computer Applications
%@ 0975-8887
%V 79
%N 8
%P 30-38
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The amount of data available for analysis, such as web data, is increasing at a dramatic rate. It is therefore important to improve techniques for finding relevant information in this data so as to increase efficiency. One such technique is text clustering, in which text documents are grouped into clusters, for example grouping web search engine results into meaningful groups. Data mining is an area of computer science that can be defined as the extraction of useful information from large structured data. Text mining, on the other hand, is an extension of data mining that deals only with unstructured text data. Text clustering is thus a text mining technique. In this paper, we give an overview of text clustering, including the related areas of text mining, techniques, and application areas. We also propose a framework for text clustering based on the K-Means algorithm. The paper thus gives guidance to text mining researchers on the state of the art of text clustering.
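
As a rough illustration of what K-Means-style text clustering involves, the sketch below groups four toy documents into two clusters. It is not the framework proposed in the paper: the TF-IDF weighting, the cosine similarity measure, the farthest-first seeding, and the names `tfidf_vectors` and `kmeans` are all assumptions made for this example.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """Turn raw strings into unit-length TF-IDF vectors (whitespace tokenisation)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed inverse document frequency keeps every weight positive.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        vectors.append(normalise(vec))
    return vectors


def kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Deterministic farthest-first seeding (an assumption of this sketch):
    # start from the first document, then add the least similar one each time.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors,
                             key=lambda v: max(dot(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: the new centroid is the normalised sum of its members,
        # which points in the same direction as their mean.
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[j] = normalise(total)
    return labels


docs = [
    "apple banana fruit juice",
    "banana fruit smoothie apple",
    "python java code compiler",
    "java code python debugger",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

In this toy run the two fruit documents share one label and the two programming documents share the other. A practical system would add preprocessing such as stop-word removal and stemming, and would need some way of choosing the number of clusters.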

Index Terms

Computer Science
Information Sciences

Keywords

clusters, data mining, structured, text clustering, text mining, unstructured