Research Article

An Effective Supervised Streamed Text Classification Approach for Mining Positive and Negative Examples

by Safdar Sardar Khan, Divakar Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 75 - Number 1
Year of Publication: 2013
Authors: Safdar Sardar Khan, Divakar Singh
10.5120/13075-7334

Safdar Sardar Khan, Divakar Singh. An Effective Supervised Streamed Text Classification Approach for Mining Positive and Negative Examples. International Journal of Computer Applications. 75, 1 (August 2013), 24-29. DOI=10.5120/13075-7334

@article{ 10.5120/13075-7334,
author = { Safdar Sardar Khan and Divakar Singh },
title = { An Effective Supervised Streamed Text Classification Approach for Mining Positive and Negative Examples },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 75 },
number = { 1 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 24-29 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume75/number1/13075-7334/ },
doi = { 10.5120/13075-7334 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:43:07.472606+05:30
%A Safdar Sardar Khan
%A Divakar Singh
%T An Effective Supervised Streamed Text Classification Approach for Mining Positive and Negative Examples
%J International Journal of Computer Applications
%@ 0975-8887
%V 75
%N 1
%P 24-29
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to use and update discovered patterns effectively remains an open research issue, especially in text mining. This survey paper addresses the effective classification of streamed text data using PNLH and one-class SVM classification. We consider the problem of one-class classification of text streams under concept drift, where a large volume of documents arrives at high speed while user interests and the data distribution change. In this setting, only a small number of positively labelled documents is available for training. Revisiting text classification without negative examples, we propose a labelling heuristic called PNLH to tackle this problem. PNLH aims to extract high-quality positive and negative examples from the unlabelled document set U, and the approach surveyed here can be used on top of any existing classifier.
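
The idea of labelling part of U before training can be illustrated with a small sketch. This is not the PNLH algorithm itself, whose details the abstract does not give. It is a simplified stand-in that assumes bag-of-words vectors, a centroid of the positive documents, and cosine similarity. The function names and the thresholds `low` and `high` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of term counts; cosine similarity ignores the overall scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def label_unlabeled(positives, unlabeled, low=0.1, high=0.5):
    """Split U into extra positives, likely negatives, and undecided documents.

    `low` and `high` are hypothetical thresholds, not values from the paper.
    """
    proto = centroid([vectorize(d) for d in positives])
    extra_pos, negatives, undecided = [], [], []
    for doc in unlabeled:
        sim = cosine(vectorize(doc), proto)
        if sim >= high:
            extra_pos.append(doc)      # close to the positive prototype
        elif sim <= low:
            negatives.append(doc)      # far from it: treat as a likely negative
        else:
            undecided.append(doc)      # leave unlabelled
    return extra_pos, negatives, undecided


# Small positive set P and unlabelled set U (toy data for illustration).
P = ["stock market shares rise", "market shares fall on stock news"]
U = [
    "stock shares rally as market opens",
    "football team wins the cup final",
    "market news today",
]
extra_pos, negatives, undecided = label_unlabeled(P, U)
```

The extracted positive and negative sets could then be passed to any standard classifier, which matches the abstract's remark that the labelling step sits on top of existing classifiers. Under concept drift, the positive centroid would presumably need to be recomputed as new labelled documents arrive.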

Index Terms

Computer Science
Information Sciences

Keywords

Text mining, text categorization, partially supervised learning, labelling unlabelled data, pattern mining, information filtering