Research Article

Integrating Swarm Intelligence and Statistical Data for Feature Selection in Text Categorization

by M. Janaki Meena, K.R. Chandran, J. Mary Brinda
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 11
Year of Publication: 2010
10.5120/248-405

M. Janaki Meena, K.R. Chandran, J. Mary Brinda. Integrating Swarm Intelligence and Statistical Data for Feature Selection in Text Categorization. International Journal of Computer Applications. 1, 11 (February 2010), 16-21. DOI=10.5120/248-405

@article{ 10.5120/248-405,
author = { M. Janaki Meena, K.R. Chandran, J. Mary Brinda },
title = { Integrating Swarm Intelligence and Statistical Data for Feature Selection in Text Categorization },
journal = { International Journal of Computer Applications },
issue_date = { February 2010 },
volume = { 1 },
number = { 11 },
month = { February },
year = { 2010 },
issn = { 0975-8887 },
pages = { 16-21 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume1/number11/248-405/ },
doi = { 10.5120/248-405 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:45:56.655227+05:30
%A M. Janaki Meena
%A K.R. Chandran
%A J. Mary Brinda
%T Integrating Swarm Intelligence and Statistical Data for Feature Selection in Text Categorization
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 11
%P 16-21
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Feature selection is the principal step in classification problems with high-dimensional attributes. It can also be viewed as the problem of determining the subset of terms in the training corpus that maximizes the classifier's performance. Most machine learning algorithms suffer degraded performance in high-dimensional feature spaces. In this paper, a novel feature selection method based on Ant Colony Optimization (ACO), a swarm intelligence algorithm, is proposed. ACO is a metaheuristic designed to find high-quality solutions to NP-hard problems. The heuristic information required by the optimization process is obtained through CHIR, a chi-square-based statistical method, which leads to fast convergence. The performance of the classifier using features selected by the proposed method is compared with its performance using features selected by the conventional chi-square and CHIR methods. The proposed algorithm is found to identify a better feature set than the conventional methods.
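The abstract describes the method only at a high level: ants search the space of term subsets, guided by a chi-square-based heuristic. The Python sketch below illustrates that general idea under assumptions that are ours, not the paper's. Each ant picks k terms with probability proportional to τ^α·η^β. The heuristic η is the plain maximum chi-square score over categories, used here as a stand-in for CHIR, whose refinement is not reproduced. Subsets are scored by the training accuracy of a toy lookup-table classifier, and only the best-so-far subset deposits pheromone after evaporation. The function names `chi_square`, `subset_accuracy` and `aco_select`, and all parameter defaults, are hypothetical.

```python
import random
from collections import Counter, defaultdict


def chi_square(docs, labels, term, category):
    """2x2 chi-square statistic between presence of `term` and membership in `category`."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == category)      # term, in class
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != category)      # term, not in class
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == category)  # no term, in class
    dd = n - a - b - c                                                            # no term, not in class
    denom = (a + c) * (b + dd) * (a + b) * (c + dd)
    return n * (a * dd - c * b) ** 2 / denom if denom else 0.0


def subset_accuracy(docs, labels, subset):
    """Training accuracy of a toy (hypothetical) classifier that predicts the majority
    label among documents sharing the same presence/absence pattern over `subset`."""
    groups = defaultdict(Counter)
    for d, y in zip(docs, labels):
        groups[tuple(t in d for t in subset)][y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(docs)


def aco_select(docs, labels, vocab, k, n_ants=10, n_iter=20,
               alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Pick k terms by a generic ACO scheme; returns (best_subset, best_fitness)."""
    rng = random.Random(seed)
    categories = sorted(set(labels))
    # Heuristic desirability eta: max chi-square over categories (assumed stand-in for CHIR);
    # the small constant keeps every weight positive for rng.choices.
    eta = {t: max(chi_square(docs, labels, t, c) for c in categories) + 1e-6 for t in vocab}
    tau = {t: 1.0 for t in vocab}  # initial pheromone on every term
    best_subset, best_fit = None, -1.0
    for _ in range(n_iter):
        for _ in range(n_ants):
            subset, candidates = [], list(vocab)
            while len(subset) < k:
                # Transition rule: P(t) proportional to tau^alpha * eta^beta
                weights = [tau[t] ** alpha * eta[t] ** beta for t in candidates]
                t = rng.choices(candidates, weights=weights)[0]
                subset.append(t)
                candidates.remove(t)
            fit = subset_accuracy(docs, labels, subset)
            if fit > best_fit:
                best_subset, best_fit = sorted(subset), fit
        # Evaporate all trails, then reinforce the best subset found so far
        for t in vocab:
            tau[t] *= 1.0 - rho
        for t in best_subset:
            tau[t] += best_fit
    return best_subset, best_fit


if __name__ == "__main__":
    docs = [{"goal", "match"}, {"goal", "team"}, {"goal", "match"},
            {"vote", "match"}, {"vote", "team"}, {"vote", "match"}]
    labels = ["sport"] * 3 + ["politics"] * 3
    print(aco_select(docs, labels, ["goal", "vote", "match", "team"], k=1))
```

Training accuracy is used only to keep the example self-contained; a real evaluation would score each subset on held-out documents with the actual classifier. In the toy corpus, `goal` and `vote` each separate the two classes perfectly and receive the highest chi-square scores, so the search settles on one of them.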

Index Terms

Computer Science
Information Sciences

Keywords

Machine learning, feature selection, Ant colony optimization, chi-square method