CFP last date
20 January 2025
Reseach Article

Multistep Sparse Approximation Technology in Information Retrieval

by Chi Shen, Wen Li, Mike F. Unuakhalu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 99 - Number 10
Year of Publication: 2014
Authors: Chi Shen, Wen Li, Mike F. Unuakhalu
10.5120/17406-7991

Chi Shen, Wen Li, Mike F. Unuakhalu . Multistep Sparse Approximation Technology in Information Retrieval. International Journal of Computer Applications. 99, 10 ( August 2014), 1-8. DOI=10.5120/17406-7991

@article{ 10.5120/17406-7991,
author = { Chi Shen, Wen Li, Mike F. Unuakhalu },
title = { Multistep Sparse Approximation Technology in Information Retrieval },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 99 },
number = { 10 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume99/number10/17406-7991/ },
doi = { 10.5120/17406-7991 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:27:48.923693+05:30
%A Chi Shen
%A Wen Li
%A Mike F. Unuakhalu
%T Multistep Sparse Approximation Technology in Information Retrieval
%J International Journal of Computer Applications
%@ 0975-8887
%V 99
%N 10
%P 1-8
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With large sets of text documents increasing rapidly, being able to efficiently utilize this vast volume of new information and service resource presents challenges to computational scientists. Text documents are usually modeled as a term-document matrix which has high dimensional and space vectors. To reduce the high dimensions, one of the various dimensionality reduction methods, concept decomposition, has been developed by some researchers. This method is based on document clustering techniques and leastsquare matrix approximation to approximate the matrix of vectors. However the numerical computation is expensive, as an inverse of a dense matrix formed by the concept vector matrix is required. In this paper we presented a class of multistep spare matrix strategies for concept decomposition matrix approximation. In this approach, a series of simple sparse matrices are used to approximate the decompositions. Our numerical experiments on both small and large datasets show the advantage of such an approach in terms of storage costs and query time compared with the least-squares based approach while maintaining comparable retrieval quality.

References
  1. M. Berry and M. Browne. Understanding Search Engines: Mathematical modeling and text retrieval. SIAM,1999
  2. C. M. Chen, N. Stoffel, M. Post, C. Basu, D. Bassu, and C. Behrens. Telcordia LSI engine: Implementation and scalability issues. In Proceedings of the Eleventh International Workshop on Research Issues in Data Engineering (RIDE 2001), Heidelberg, Germany, April 2001.
  3. J. Chen and Y. Saad. Divide and Conquer Strategies for Effective Information Retrieval, Proceedings of the Ninth SIAM International Conference on Data Mining (SDM), 2009.
  4. E. Chow. A priori sparsity patterns for parallel sparse approximate inverse preconditioners. SIAM J. Sci. Comput. , 21(5):1804–1822, 2000.
  5. I. S. Dhillon and D. S. Modha. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1):143–175, 2001.
  6. J. Dobsa and B. J. Basic. Concept decomposition by fuzzy kmeans algorithm, In Proceedings of IEEE Web Intelligence Conference (WI2003), pp. 684-688, October 2003
  7. J. Gao and J. Zhang. Text retrieval using sparsified concept decomposition matrix. Book Chapter of Computational and Information Science, Springer Berlin/Heidelberg, vol. 3314, pp. 523 – 529, 2005
  8. J. Gao and J. Zhang. Clustered SVD strategies in latent semantic indexing. in Information Processing and Management, 41(5):1051-1063, 2005.
  9. P. Husbands, H. Simon, and C. Ding. On the use of singular value decomposition for text retrieval. In M. W. Berry, editor, Computational Information Retrieval, pages 145–156. SIAM, Philadelphia, PA, 2001.
  10. E. R. Jessup and J. H. Martin. Taking a new look at the latent semantic analysis approach to information retrieval. In Proceedings of the SIAM Workshop on Computational Information Retrieval, Raleigh, NC, 2000.
  11. A. Kontostathis, W. M. Pottenger, and B. D. Davison. Identifcation of critical values in Latent Semantic Indexing (LSI). Book chapter of Foundations of Data Mining and Knowledge Discovery, pp. 333-346, 2005.
  12. A. Kontostathis and W. M. Pottenger. A framework for understanding Latent Semantic Indexing(LSI) performance. In-formation Processing and Management , 42(1):56-73, 2006.
  13. A. Kontostathis, W. M. Pottenger, and B. D. Davison. Identification of critical values in Latent Semantic Indexing(LSI). In Foundations of Data Mining and Knowledge Discovery pp. 333-346, 2005.
  14. A. Kontostathis. Essential Dimensions of Latent Semantic Indexing (LSI). 40th Annual Hawaii International Conference on System Sciences, pp. 73, 2007.
  15. V. Raghavan, P. Bollmann, and G. S. Jung. A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems, Vol. 7, pp. 205-229, 1989.
  16. C. Tang, S. Dwarkadas, and Z. Xu. On scaling latent semantic indexing for large peer-to-peer systems. In Proceedings of the 27th Annual International ACM SIGIR Conference, Sheffield, UK, 2004
  17. K. Wang and J. Zhang. MSP:A class of parallel multistep successive sparse approximate inverse preconditioning strategies. SIAM Journal on Scientific Computing, Vol. 24, No. 4, pp. 1141-1151, 2003
  18. C. Shen and D. Williams. Approximate Matrix Decomposition Technniques in Information Retrieval, In Proceedings of World Congress on Engineering and Computer Science 2007, pp. 347 - 352, October 24-26, 2007.
Index Terms

Computer Science
Information Sciences

Keywords

Term-document matrix Concept decomposition matrix Sparsity pattern Multistep Sparse matrix approximation