International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 109 - Number 14
Year of Publication: 2015
Authors: Payal Joshi, S. V. Patel
DOI: 10.5120/19260-1017
Payal Joshi, S. V. Patel. AuTopicGen: Rule based Positional Pattern Approach for Topic Collection in IR. International Journal of Computer Applications. 109, 14 (January 2015), 44-47. DOI=10.5120/19260-1017
Information retrieval (IR) systems consist of phases such as document preprocessing, indexing, query expansion, query matching, and ranking. Document preprocessing, which parses the document and collects keywords, is the most important of these phases. The relevance of the overall IR system improves if the main topics of a document are correctly identified during this phase. Topics are mostly phrase based. Existing phrase search methods such as n-grams or positional indexes are quite complex and suffer from problems such as inaccuracy and large storage requirements. Moreover, an IR system such as a digital library may hold eBooks on one or more subjects, so phrase collection may require an appropriate ontology to retrieve phrases or topics. This paper presents a new approach called AuTopicGen (Automatic Topic Generator) that automatically collects the most relevant topics of an eBook from its contents and indexes using a rule based positional pattern approach. From the collected topics, we create a topic hierarchy that can work as a lightweight ontology to improve the overall performance of the information retrieval system, especially for phrase based queries, and to assist the user with query recommendation. The hierarchy can also serve as a topic map or mind map, improve the user interface by helping the user navigate through topics, and support categorization, query expansion, and ranking algorithms. We have implemented the topic collection approach on eBooks and present the implementation in this paper.
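The abstract does not state the rules AuTopicGen uses, so the following is only a minimal sketch of the general idea of collecting topics by position and pattern. It assumes that "contents" refers to a table of contents whose lines carry a section number, a title, and an optional page number. The pattern `TOC_LINE` and the functions `collect_topics` and `build_hierarchy` are hypothetical names introduced for illustration and are not taken from the paper.

```python
import re

# Hypothetical rule (an assumption, not the paper's): a contents line looks
# like "2.1 Inverted Index ..... 17". The numbering prefix gives the position
# in the hierarchy; trailing dot leaders and page numbers are discarded.
TOC_LINE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.+?)[\s.]*\d*\s*$")


def collect_topics(toc_lines):
    """Return (section_number, topic) pairs for lines that match the rule."""
    topics = []
    for line in toc_lines:
        match = TOC_LINE.match(line)
        if match:
            topics.append((match.group(1), match.group(2)))
    return topics


def build_hierarchy(topics):
    """Nest each topic under its parent section number, e.g. 2.1 under 2."""
    root = {"topic": None, "children": {}}
    for number, topic in topics:
        node = root
        for part in number.split("."):
            node = node["children"].setdefault(
                part, {"topic": None, "children": {}}
            )
        node["topic"] = topic
    return root


if __name__ == "__main__":
    # Illustrative input only; not data from the paper.
    toc = [
        "Contents",
        "1 Introduction ..... 1",
        "2 Indexing ..... 9",
        "2.1 Inverted Index ..... 17",
    ]
    hierarchy = build_hierarchy(collect_topics(toc))
    print(hierarchy["children"]["2"]["children"]["1"]["topic"])
```

A nested dictionary keyed by section number keeps the sketch short. A real system would also need rules for unnumbered headings and for back-of-book index entries, which this sketch does not attempt.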