Research Article

Preprocessing Techniques in Text Categorization

Published on December 2013 by Pritam C. Gaigole, L. H. Patil, P. M Chaudhari
National Conference on Innovative Paradigms in Engineering & Technology 2013
Foundation of Computer Science USA
NCIPET2013 - Number 3
December 2013
Authors: Pritam C. Gaigole, L. H. Patil, P. M Chaudhari

Pritam C. Gaigole, L. H. Patil, P. M Chaudhari. Preprocessing Techniques in Text Categorization. National Conference on Innovative Paradigms in Engineering & Technology 2013. NCIPET2013, 3 (December 2013), 1-3.

@article{
author = { Pritam C. Gaigole, L. H. Patil, P. M Chaudhari },
title = { Preprocessing Techniques in Text Categorization },
journal = { National Conference on Innovative Paradigms in Engineering & Technology 2013 },
issue_date = { December 2013 },
volume = { NCIPET2013 },
number = { 3 },
month = { December },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-3 },
numpages = 3,
url = { /proceedings/ncipet2013/number3/14708-1334/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Innovative Paradigms in Engineering & Technology 2013
%A Pritam C. Gaigole
%A L. H. Patil
%A P. M Chaudhari
%T Preprocessing Techniques in Text Categorization
%J National Conference on Innovative Paradigms in Engineering & Technology 2013
%@ 0975-8887
%V NCIPET2013
%N 3
%P 1-3
%D 2013
%I International Journal of Computer Applications
Abstract

Bulk data is generated in the era of Information Technology. If it is not stored in a properly systematic manner, the generated data cannot be reused, because navigation becomes, if not impossible, certainly very difficult. The generated data must be analyzed to maximize its benefits for intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. The basic phases in text categorization include preprocessing features, extracting relevant features against the features in a database, and finally categorizing a set of documents into predefined categories. Most research in text categorization focuses more on the development of algorithms and computer techniques.
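
The three phases named in the abstract (preprocessing, feature extraction, categorization into predefined categories) can be illustrated with a minimal Python sketch. The abstract does not specify concrete techniques, so everything here is an assumption made for illustration: the tiny stop-word list, the term-frequency features, the keyword-overlap rule, and the names `preprocess`, `term_frequencies`, `categorize`, and `CATEGORIES` are all hypothetical and are not the authors' method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; not taken from the paper.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "are", "on"}


def preprocess(text):
    """Phase 1: lowercase, tokenize on runs of letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(tokens):
    """Phase 2: count each remaining term (a simple bag-of-words feature set)."""
    return Counter(tokens)


def categorize(features, category_keywords):
    """Phase 3: choose the predefined category whose keywords overlap most."""
    scores = {
        name: sum(features[word] for word in keywords)
        for name, keywords in category_keywords.items()
    }
    return max(scores, key=scores.get)


# Hypothetical predefined categories with illustrative keyword sets.
CATEGORIES = {
    "sports": {"match", "team", "score"},
    "finance": {"bank", "market", "stock"},
}

doc = "The team won the match, and the score is on the board."
tokens = preprocess(doc)
features = term_frequencies(tokens)
label = categorize(features, CATEGORIES)
print(tokens)
print(label)
```

A keyword-overlap rule is used only because it keeps the example self-contained; a practical system would more likely apply stemming and term weighting before handing the features to a trained classifier.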

Index Terms

Computer Science
Information Sciences

Keywords

Preprocessing, Text Categorization