Research Article

Text Categorization using Distributional Features and Semantic Equivalence

by Tirupathaiah Kommi, Srikanth Jatla
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 30 - Number 7
Year of Publication: 2011
Authors: Tirupathaiah Kommi, Srikanth Jatla
DOI: 10.5120/3653-5105

Tirupathaiah Kommi, Srikanth Jatla. Text Categorization using Distributional Features and Semantic Equivalence. International Journal of Computer Applications. 30, 7 (September 2011), 30-35. DOI=10.5120/3653-5105

@article{ 10.5120/3653-5105,
author = { Tirupathaiah Kommi and Srikanth Jatla },
title = { Text Categorization using Distributional Features and Semantic Equivalence },
journal = { International Journal of Computer Applications },
issue_date = { September 2011 },
volume = { 30 },
number = { 7 },
month = { September },
year = { 2011 },
issn = { 0975-8887 },
pages = { 30-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume30/number7/3653-5105/ },
doi = { 10.5120/3653-5105 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:16:56.943607+05:30
%A Tirupathaiah Kommi
%A Srikanth Jatla
%T Text Categorization using Distributional Features and Semantic Equivalence
%J International Journal of Computer Applications
%@ 0975-8887
%V 30
%N 7
%P 30-35
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text categorization, the task of assigning predefined categories to text, is widely used in text mining. Earlier work relied on the bag-of-words approach, which assigns each word a value based on how often it occurs in a document. This approach has a drawback: it considers only the count of a word and ignores its other features. This paper assigns additional values to a word, known as distributional features. The approach is novel, and the distributional features include the position of the word's first occurrence and the compactness of its appearances. Our experimental results show that distributional features and semantic equivalence improve text categorization. They also show that distributional features are most useful when the writing style is casual and the document is long. Semantic equivalence is used to extend the equivalence rough set approach.
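
To make the idea of distributional features concrete, the following Python sketch computes, for each word in a document, its plain count alongside two illustrative distributional values. The abstract does not give exact formulas, so the definitions here are assumptions: the first-occurrence position is taken as the index of the word's first token divided by the document length, and compactness is approximated by the normalized distance between the word's first and last occurrences. The function name `distributional_features` and the regex tokenizer are hypothetical choices for illustration, not the authors' implementation.

```python
import re
from collections import defaultdict


def distributional_features(text):
    """Return {word: (count, first_pos, spread)} for a single document.

    Assumed definitions (not taken from the paper):
      first_pos = index of first occurrence / number of tokens
      spread    = (last index - first index) / number of tokens
    A small spread means the word's appearances are compact.
    """
    # Simple lowercase alphanumeric tokenizer (an assumption for this sketch).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    n = len(tokens)

    # Record every token position at which each word appears.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)

    features = {}
    for word, pos in positions.items():
        first_pos = pos[0] / n
        spread = (pos[-1] - pos[0]) / n
        features[word] = (len(pos), first_pos, spread)
    return features


doc = "the cat sat on the mat while the dog slept"
feats = distributional_features(doc)
for word in ("the", "dog"):
    print(word, feats[word])
```

A bag-of-words representation would keep only the first element of each tuple. Under the assumptions above, the other two values separate a word that appears once near the end of the document from one that is scattered throughout it, even when their counts are similar.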

Index Terms

Computer Science
Information Sciences

Keywords

Text mining, machine learning, text categorization, distributional feature, tfidf