International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 122 - Number 1
Year of Publication: 2015
Authors: Anagha Chaudhari, Amitabh Mudiraj, Swati Shinde
DOI: 10.5120/21662-4718
Anagha Chaudhari, Amitabh Mudiraj, Swati Shinde. Developing an Expert IR System from Multidimensional Dataset. International Journal of Computer Applications. 122, 1 (July 2015), 6-9. DOI=10.5120/21662-4718
With the growing availability of computing facilities, large amounts of data are being generated in electronic form. This data must be analyzed in order to maximize its benefit for intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. Its basic phases include preprocessing steps, such as removing stop words from documents and applying TF-IDF, which increase efficiency and remove irrelevant data from a huge dataset. Applying the TF-IDF algorithm to the dataset yields a weight for each word, and these weights are summarized in a weight matrix. Preprocessing reduces the size of the dataset, which ultimately improves the performance of the search engine. An index is then generated from the dataset; it contains each term together with its occurrences in a file and its locations in that file. This paper discusses the implication of an efficient Information Retrieval system for text-based data using clustering approaches.
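
The preprocessing and indexing steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the function names (`preprocess`, `tfidf_matrix`, `build_index`), the sample documents, and the particular TF-IDF variant (term count divided by document length, times log(N / df)) are all assumptions chosen for illustration. Positions in the index are counted after stop-word removal, which is likewise an assumption.

```python
import math
import re
from collections import defaultdict

# Hypothetical stop-word list for illustration; the paper does not specify one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def preprocess(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the weight of vocab[j] in docs[i].

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {term: sum(1 for toks in tokenized if term in toks) for term in vocab}
    matrix = []
    for toks in tokenized:
        row = []
        for term in vocab:
            tf = toks.count(term) / len(toks) if toks else 0.0
            idf = math.log(n_docs / df[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


def build_index(docs):
    """Positional inverted index: term -> {doc_id: [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(preprocess(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


# Made-up sample documents, used only to exercise the functions.
docs = [
    "The cat sat on the mat",
    "The dog chased the cat",
    "Clustering of text documents",
]
vocab, weights = tfidf_matrix(docs)
index = build_index(docs)
```

In this sketch, `weights` plays the role of the weight matrix (one row per document, one column per term), and `index["cat"]` lists, for each document containing "cat", the token positions at which it occurs. A term that appears in every document receives weight zero under this variant, which is one way TF-IDF down-weights uninformative words.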