Research Article

An Efficient Document Clustering by Optimization Technique for Cluster Optimality

by A. K. Santra, C. Josephine Christy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 43 - Number 16
Year of Publication: 2012
Authors: A. K. Santra, C. Josephine Christy
DOI: 10.5120/6187-8666

A. K. Santra, C. Josephine Christy. An Efficient Document Clustering by Optimization Technique for Cluster Optimality. International Journal of Computer Applications. 43, 16 (April 2012), 15-20. DOI=10.5120/6187-8666

@article{ 10.5120/6187-8666,
author = { A. K. Santra and C. Josephine Christy },
title = { An Efficient Document Clustering by Optimization Technique for Cluster Optimality },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 43 },
number = { 16 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 15-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume43/number16/6187-8666/ },
doi = { 10.5120/6187-8666 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:33:34.948489+05:30
%A A. K. Santra
%A C. Josephine Christy
%T An Efficient Document Clustering by Optimization Technique for Cluster Optimality
%J International Journal of Computer Applications
%@ 0975-8887
%V 43
%N 16
%P 15-20
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Document clustering has become a widely used technique with the growth of the web, which makes fast, high-quality clustering an important problem. Document clustering identifies semantically related groups in an unstructured collection of text documents. Feature selection is significant for the clustering process because isolated or redundant features can mislead the clustering results. Existing work presented an improved niching memetic algorithm and an improved genetic algorithm (GA) for feature selection. To attain more accurate document clustering, more informative features, including optimal conceptual weights, are essential. This paper presents an optimization technique that evaluates cluster optimality for efficient document clustering based on optimized conceptual feature words. The conceptual words (similarity words) are extracted from the feature words by a feature selection process. The importance of cluster words is identified by the optimal conceptual word weight values. Experiments evaluate the proposed optimization technique for efficient document clustering in terms of conceptual word weight, number of conceptual words, and optimal conceptual word weight.
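
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general pipeline it describes: select feature words, weight them, then cluster documents on the weighted features. Everything in it is an assumption made for illustration and not the authors' method: a document-frequency filter stands in for the memetic/GA feature selection (a term seen in only one document is treated as isolated, a term seen in every document as redundant), plain TF-IDF stands in for the conceptual word weight, and a small k-means on cosine similarity, seeded with the first k documents, stands in for the clustering step. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(doc):
    """Lowercase and split on whitespace (no stemming or stop-word list)."""
    return doc.lower().split()


def select_features(docs):
    """Keep terms found in at least two documents but not in all of them."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    features = {term for term, df in doc_freq.items() if 2 <= df < n_docs}
    return features, doc_freq


def weight_vectors(docs, features, doc_freq):
    """Build one sparse {term: TF-IDF weight} vector per document."""
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tokens = tokenize(doc)
        counts = Counter(t for t in tokens if t in features)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def cluster(vectors, k, iterations=10):
    """Simple k-means on cosine similarity, seeded with the first k vectors."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the mean of its member vectors.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                mean = defaultdict(float)
                for member in members:
                    for term, weight in member.items():
                        mean[term] += weight / len(members)
                centroids[c] = dict(mean)
    return labels


docs = [
    "the genetic algorithm optimization search",
    "the image pixel vision camera",
    "the genetic optimization fitness search",
    "the camera image vision lens",
]
features, doc_freq = select_features(docs)
vectors = weight_vectors(docs, features, doc_freq)
labels = cluster(vectors, k=2)
print(sorted(features))
print(labels)  # [0, 1, 0, 1]
```

On this toy corpus the filter drops "the" (present in every document) and the four terms that occur only once, and the two remaining topic vocabularies separate the documents into two groups. Seeding with the first k documents keeps the run deterministic; it is a convenience for the example, not a recommended initialization.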

References
  1. A. K. Santra, C. Josephine Christy, and B. Nagarajan, "Cluster Based Hybrid Niche Memetic and Genetic Algorithm for Text Document Categorization," IJCSI, vol. 8, issue 5, no. 2, pp. 450-456, Sep. 2011.
  2. A. K. Santra and C. Josephine Christy, "Genetic Algorithm and Confusion Matrix for Document Clustering," IJCSI, vol. 9, issue 1, no. 2, pp. 322-328, Sep. 2012.
  3. K. Deep and K. N. Das, "Quadratic Approximation Based Hybrid Genetic Algorithm for Function Optimization," AMC, Elsevier, vol. 203, pp. 86-98, 2008.
  4. Sun Park, Dong Un An, and Choi Im Cheon, "Document Clustering Method Using Weighted Semantic Features and Cluster Similarity," Proc. Third IEEE International Conference on Digital Game and Intelligent Toy Enhanced Learning (DIGITEL), pp. 185-187, 2010.
  5. Wen-Hui Yang, Dao-Qing Dai, and Hong Yan, "Feature Extraction and Uncorrelated Discriminant Analysis for High-Dimensional Data," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 5, May 2008.
  6. Yanjun Li and Congnan Luo, "Text Clustering with Feature Selection by Using Statistical Data," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 5, May 2008.
  7. Huan Liu and Lei Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 4, Apr. 2005.
  8. C. Wei, C. S. Yang, H. W. Hsiao, and T. H. Cheng, "Combining Preference- and Content-Based Approaches for Improving Document Clustering Effectiveness," Information Processing & Management, vol. 42, no. 2, pp. 350-372, 2006.
  9. Renchu Guan, Xiaohu Shi, Maurizio Marchese, Chen Yang, and Yanchun Liang, "Text Clustering with Seeds Affinity Propagation," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 4, Apr. 2011.
  10. Y. J. Li, C. Luo, and S. M. Chung, "Text Clustering with Feature Selection by Using Statistical Data," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 5, pp. 641-652, May 2008.
  11. B. J. Frey and D. Dueck, "Non-Metric Affinity Propagation for Unsupervised Image Categorization," Proc. 11th IEEE Int'l Conf. Computer Vision (ICCV '07), pp. 1-8, Oct. 2007.
  12. L. P. Jing, M. K. Ng, and J. Z. Huang, "An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 8, pp. 1026-1041, Aug. 2007.
  13. Z. H. Zhou and M. Li, "Distributional Features for Text Categorization," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 3, pp. 428-442, Mar. 2009.
  14. F. Pan, X. Zhang, and W. Wang, "CRD: Fast Co-Clustering on Large Data Sets Utilizing Sampling-Based Matrix Decomposition," Proc. ACM SIGMOD, 2008.
  15. Jung-Yi Jiang, Ren-Jia Liou, and Shie-Jue Lee, "A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 3, Mar. 2011.
Index Terms

Computer Science
Information Sciences

Keywords

Document Clustering, Conceptual Words, Cluster Optimality