International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 99 - Number 10
Year of Publication: 2014
Authors: Chi Shen, Wen Li, Mike F. Unuakhalu
DOI: 10.5120/17406-7991
Chi Shen, Wen Li, Mike F. Unuakhalu. Multistep Sparse Approximation Technology in Information Retrieval. International Journal of Computer Applications 99, 10 (August 2014), 1-8. DOI=10.5120/17406-7991
With large collections of text documents growing rapidly, efficiently utilizing this vast volume of new information and service resources presents challenges to computational scientists. Text documents are usually modeled as a term-document matrix whose columns are high-dimensional, sparse vectors. To reduce this dimensionality, researchers have developed concept decomposition, one of several dimensionality reduction methods. This method combines document clustering with least-squares matrix approximation to approximate the term-document matrix. However, the numerical computation is expensive, because it requires the inverse of a dense matrix formed from the concept vector matrix. In this paper we present a class of multistep sparse matrix strategies for concept decomposition matrix approximation. In this approach, a series of simple sparse matrices is used to approximate the decomposition. Our numerical experiments on both small and large datasets show the advantage of this approach in terms of storage costs and query time compared with the least-squares-based approach, while maintaining comparable retrieval quality.
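To make the least-squares baseline concrete, here is a minimal sketch. It assumes the concept vectors are the normalized centroids of a given document clustering, which is common concept-decomposition practice; the paper's clustering step may differ. Given a term-document matrix A and concept matrix C, each document column a_j is approximated by C z_j, where z_j solves the dense k × k normal equations (CᵀC) z_j = Cᵀa_j, that is, Z = (CᵀC)⁻¹CᵀA. This is not the authors' multistep sparse strategy, whose details are not given here. All function names are hypothetical, and the code is pure Python so that it needs no libraries.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def concept_vectors(docs, labels, k):
    """Hypothetical helper: unit-norm centroid of each of the k clusters.

    docs holds the columns of the term-document matrix (one list per document);
    labels[j] is the cluster index of document j.
    """
    m = len(docs[0])
    sums = [[0.0] * m for _ in range(k)]
    for d, c in zip(docs, labels):
        for i, x in enumerate(d):
            sums[c][i] += x
    return [normalize(s) for s in sums]


def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system M x = b."""
    n = len(M)
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def least_squares_concept_decomposition(docs, concepts):
    """Return z_j minimizing ||a_j - C z_j|| for every document a_j.

    The dense k x k Gram matrix C^T C is formed once, and the normal equations
    are solved per document instead of forming the inverse explicitly
    (mathematically equivalent to Z = (C^T C)^{-1} C^T A).
    """
    G = [[dot(ci, cj) for cj in concepts] for ci in concepts]
    return [solve(G, [dot(c, d) for c in concepts]) for d in docs]


def reconstruct(concepts, z):
    """Compute C z, the approximation of one document column."""
    m = len(concepts[0])
    return [sum(zc * c[i] for zc, c in zip(z, concepts)) for i in range(m)]


if __name__ == "__main__":
    # Toy 4-term x 4-document matrix, stored column by column, with two clusters.
    docs = [[2.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 3.0], [0.0, 1.0, 1.0, 1.0]]
    C = concept_vectors(docs, [0, 0, 1, 1], 2)
    Z = least_squares_concept_decomposition(docs, C)
    for d, z in zip(docs, Z):
        r = [a - b for a, b in zip(d, reconstruct(C, z))]
        print("z =", [round(x, 3) for x in z],
              "relative error =", round(math.sqrt(dot(r, r) / dot(d, d)), 3))
```

In this baseline, Z is a dense k × n matrix that must be stored, and CᵀC must be factored or inverted. These are the storage and computation costs that the abstract says the multistep sparse approximations are designed to reduce.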