Research Article

Recent Developments in Text Clustering Techniques

by Saurabh Sharma, Vishal Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 37 - Number 6
Year of Publication: 2012
Authors: Saurabh Sharma, Vishal Gupta
10.5120/4611-6604

Saurabh Sharma, Vishal Gupta. Recent Developments in Text Clustering Techniques. International Journal of Computer Applications. 37, 6 (January 2012), 14-19. DOI=10.5120/4611-6604

@article{ 10.5120/4611-6604,
author = { Saurabh Sharma and Vishal Gupta },
title = { Recent Developments in Text Clustering Techniques },
journal = { International Journal of Computer Applications },
issue_date = { January 2012 },
volume = { 37 },
number = { 6 },
month = { January },
year = { 2012 },
issn = { 0975-8887 },
pages = { 14-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume37/number6/4611-6604/ },
doi = { 10.5120/4611-6604 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:23:36.227447+05:30
%A Saurabh Sharma
%A Vishal Gupta
%T Recent Developments in Text Clustering Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 37
%N 6
%P 14-19
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Efficient extraction of information from text documents is needed to support better business decisions, faster database browsing, and shorter query processing times. Clustering large numbers of text documents into groups, for better management of information, is an area of active research. Recent work in this area has tried a number of different techniques. This paper reviews and discusses text clustering and gives partial coverage of all the major techniques currently in use for the process.

Index Terms

Computer Science
Information Sciences

Keywords

Text clustering, K-means clustering, hierarchical clustering, topic tracing, feature selection, ontology, WordNet, frequent word sequence