Research Article

Developing an Expert IR System from Multidimensional Dataset

by Anagha Chaudhari, Amitabh Mudiraj, Swati Shinde
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 122 - Number 1
Year of Publication: 2015
Authors: Anagha Chaudhari, Amitabh Mudiraj, Swati Shinde
DOI: 10.5120/21662-4718

Anagha Chaudhari, Amitabh Mudiraj, Swati Shinde. Developing an Expert IR System from Multidimensional Dataset. International Journal of Computer Applications. 122, 1 (July 2015), 6-9. DOI=10.5120/21662-4718

@article{ 10.5120/21662-4718,
author = { Anagha Chaudhari and Amitabh Mudiraj and Swati Shinde },
title = { Developing an Expert IR System from Multidimensional Dataset },
journal = { International Journal of Computer Applications },
issue_date = { July 2015 },
volume = { 122 },
number = { 1 },
month = { July },
year = { 2015 },
issn = { 0975-8887 },
pages = { 6-9 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume122/number1/21662-4718/ },
doi = { 10.5120/21662-4718 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:11:24.060225+05:30
%A Anagha Chaudhari
%A Amitabh Mudiraj
%A Swati Shinde
%T Developing an Expert IR System from Multidimensional Dataset
%J International Journal of Computer Applications
%@ 0975-8887
%V 122
%N 1
%P 6-9
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Nowadays, owing to the increased availability of computing facilities, a large amount of data is being generated in electronic form. This data must be analyzed to maximize the benefit of intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. The basic phases of text categorization include preprocessing steps such as removing stop words from documents and applying TF-IDF, which increases efficiency and removes irrelevant data from a huge dataset. Applying the TF-IDF algorithm to the dataset gives a weight for each word, and these weights are summarized in a weight matrix. Preprocessing reduces the size of the dataset, which ultimately improves the performance of the search engine. After that, an index is generated from the dataset. The index contains each term together with its occurrences in a file and its locations in that file. This paper discusses the implications of an efficient Information Retrieval system for text-based data using clustering approaches.
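
The preprocessing and indexing pipeline outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny stop-word list, the tf * log(N / df) weighting variant, the function names (preprocess, tfidf_matrix, build_index), and the two sample documents are all assumptions chosen for illustration. It shows stop-word removal, a per-document weight matrix, and an index that records where each term occurs.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "and", "to"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df).

    This is one common TF-IDF variant (an assumption, not taken from the paper).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    weights = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


def build_index(docs):
    """Positional inverted index: term -> {doc_id: [positions]}.

    Positions refer to token offsets after stop-word removal.
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


# Hypothetical sample documents.
raw = {
    "d1": "The cat sat in the hat",
    "d2": "A cat and a dog",
}
docs = {doc_id: preprocess(text) for doc_id, text in raw.items()}
weights = tfidf_matrix(docs)
index = build_index(docs)
```

In this sketch a term that appears in every document (here "cat") receives a weight of zero, which reflects how TF-IDF discounts terms that do not discriminate between documents. The number of occurrences of a term in a file can be read off the index as the length of its position list.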

Index Terms

Computer Science
Information Sciences

Keywords

Information retrieval, stop words, TF-IDF, text-based clustering, fitness functions