International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 39
Year of Publication: 2019
Authors: S. Likhitha, B. S. Harish, H. M. Keerthi Kumar
DOI: 10.5120/ijca2019919265
S. Likhitha, B. S. Harish, H. M. Keerthi Kumar. A Detailed Survey on Topic Modeling for Document and Short Text Data. International Journal of Computer Applications. 178, 39 (Aug 2019), 1-9. DOI=10.5120/ijca2019919265
Text mining is one of the most significant fields in the digital era because of the rapid growth of textual information. Topic models have gained popularity over the last few years. A topic comprises a group of words that frequently occur together. Topic models are among the better-performing techniques for extracting the semantic knowledge present in the data. The main topic modeling methods are LSA (Latent Semantic Analysis), PLSA (Probabilistic Latent Semantic Analysis), and LDA (Latent Dirichlet Allocation). These methods have become popular for extracting hidden themes from a document collection (corpus). Various topic modeling algorithms have been developed to query, summarize, and extract the hidden semantic structures of large corpora. In this paper, we present a detailed survey covering the various topic modeling techniques proposed in the last decade. Additionally, we focus on different strategies for extracting topics from social media text, where the goal is to find and aggregate topics within short texts. Further, we summarize the various applications and the quantitative evaluation of these methods, drawing on statistical and mathematical knowledge to predict the convergence of results.