Research Article

Review on Pattern based Document Modelling Techniques

by Jimsy Johnson, Smitha C.S.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 132 - Number 15
Year of Publication: 2015
Authors: Jimsy Johnson, Smitha C.S.
10.5120/ijca2015907653

Jimsy Johnson, Smitha C.S. Review on Pattern based Document Modelling Techniques. International Journal of Computer Applications. 132, 15 (December 2015), 1-5. DOI=10.5120/ijca2015907653

@article{ 10.5120/ijca2015907653,
author = { Jimsy Johnson and Smitha C.S. },
title = { Review on Pattern based Document Modelling Techniques },
journal = { International Journal of Computer Applications },
issue_date = { December 2015 },
volume = { 132 },
number = { 15 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume132/number15/23667-2015907653/ },
doi = { 10.5120/ijca2015907653 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:29:27.857774+05:30
%A Jimsy Johnson
%A Smitha C.S.
%T Review on Pattern based Document Modelling Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 132
%N 15
%P 1-5
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Topic modelling is widely used in machine learning and text mining. It was proposed as a way to build statistical models that identify multiple topics in a collection of documents, with each topic represented by a distribution over words. Many variants of topic models have been proposed, but most of them rely on the bag-of-words assumption, which ignores associations between words when representing topics. More recently, patterns have been used to represent topics, since they have more discriminative power than single words for representing multiple topics in a document. This paper presents a detailed survey of some of the most important methods for topic modelling, followed by a brief comparison of the key techniques.
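
The contrast between word-based and pattern-based representations can be illustrated with a small Python sketch. It is not taken from the paper: the toy corpus, the function names `bag_of_words` and `frequent_patterns`, and the `min_support` threshold are all assumptions made for illustration, and plain co-occurrence counting stands in for the pattern mining methods the survey covers.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical, for illustration only).
docs = [
    "data mining finds frequent patterns",
    "frequent patterns improve text mining",
    "topic models describe text documents",
]


def bag_of_words(doc):
    """Count each word independently, ignoring which words occur together."""
    return Counter(doc.split())


def frequent_patterns(documents, size=2, min_support=2):
    """Return word sets of the given size that co-occur in at least
    min_support documents (a simple stand-in for frequent itemset mining)."""
    support = Counter()
    for doc in documents:
        words = sorted(set(doc.split()))
        # Each combination of distinct words in a document is one candidate pattern.
        support.update(combinations(words, size))
    return {p: c for p, c in support.items() if c >= min_support}


# Corpus-level word counts: "text" looks as important as "mining".
word_counts = sum((bag_of_words(d) for d in docs), Counter())

# Word-pair patterns: only pairs that recur together across documents survive.
patterns = frequent_patterns(docs)

print(word_counts.most_common(4))
print(patterns)
```

In this sketch the word counts treat "text" and "mining" alike, while the pair patterns keep only combinations that recur together, which is the kind of word association a bag-of-words model discards.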

Index Terms

Computer Science
Information Sciences

Keywords

Information Retrieval, Information Filtering, Topic models