Research Article

Reducing Dimensionality of Text Documents using Latent Semantic Analysis

by Chelsea Boling, Kumer Das
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 112 - Number 5
Year of Publication: 2015
DOI: 10.5120/19660-1078

Chelsea Boling, Kumer Das. Reducing Dimensionality of Text Documents using Latent Semantic Analysis. International Journal of Computer Applications. 112, 5 (February 2015), 9-12. DOI=10.5120/19660-1078

@article{ 10.5120/19660-1078,
author = { Chelsea Boling and Kumer Das },
title = { Reducing Dimensionality of Text Documents using Latent Semantic Analysis },
journal = { International Journal of Computer Applications },
issue_date = { February 2015 },
volume = { 112 },
number = { 5 },
month = { February },
year = { 2015 },
issn = { 0975-8887 },
pages = { 9-12 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume112/number5/19660-1078/ },
doi = { 10.5120/19660-1078 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:48:37.397250+05:30
%A Chelsea Boling
%A Kumer Das
%T Reducing Dimensionality of Text Documents using Latent Semantic Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 112
%N 5
%P 9-12
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Latent semantic analysis (LSA) is a technique that analyzes relationships between documents and their terms and finds a representation of the data with fewer dimensions than the original semantic space. The reduced representation preserves the most important aspects of the data because LSA analyzes documents to find latent meaning in the corpus. The latent semantic space is determined by singular value decomposition (SVD), which factors any rectangular matrix into a product of three component matrices. SVD is used to retain a sufficient number of dimensions to reveal the relevant structure of the original term-document matrix. In this study, LSA was used to find associations between user queries and a sample of documents from Medline Industries, Inc. With an appropriately chosen number of dimensions, the reduced representation is a suitable stand-in for the original space. The reduced model of the term-document matrix shows that SVD is capable of dealing with semantic problems. Overall, the goal is to overcome unsatisfactory indexed results by revealing hidden relationships among terms and documents.
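
The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, a tiny invented term-document matrix (the terms and counts are hypothetical and are not taken from the Medline sample), a rank of k = 2 chosen arbitrarily, and the commonly used fold-in projection for queries, q_hat = q^T U_k S_k^{-1}. The helper names truncated_svd, fold_in, and cosine are likewise made up for this sketch.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# The terms and counts are invented for illustration only.
terms = ["glove", "latex", "surgical", "mask", "wheelchair"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 0, 2],
], dtype=float)


def truncated_svd(A, k):
    """Return U_k, s_k, Vt_k of the rank-k truncation, plus all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :], s


def fold_in(q, U_k, s_k):
    """Project a term-space vector into the k-dimensional latent space."""
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


k = 2  # assumed number of retained dimensions
U_k, s_k, Vt_k, s_all = truncated_svd(A, k)

# Rank-k approximation of the original term-document matrix.
A_k = U_k @ np.diag(s_k) @ Vt_k

# A one-term query, folded into the latent space and compared with
# each document's latent coordinates (the columns of Vt_k).
q = np.zeros(len(terms))
q[terms.index("glove")] = 1.0
q_hat = fold_in(q, U_k, s_k)
scores = [cosine(q_hat, Vt_k[:, j]) for j in range(A.shape[1])]
print(scores)
```

In this sketch the choice of k trades compression against fidelity: the Frobenius-norm error of A_k equals the square root of the sum of the squared discarded singular values, so inspecting s_all is one plausible way to pick the number of dimensions.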

Index Terms

Computer Science
Information Sciences

Keywords

Latent Semantic Analysis, Singular Value Decomposition, Text Mining, Dimensionality Reduction