Research Article

Efficient Information Retrieval through Comparison of Dimensionality Reduction Techniques with Clustering Approach

by Poonam P. Rajurkar, Aditya G. Bhor, Komal K. Rahane, Neha S. Pathak, Anagha N. Chaudhari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 129 - Number 4
Year of Publication: 2015
Authors: Poonam P. Rajurkar, Aditya G. Bhor, Komal K. Rahane, Neha S. Pathak, Anagha N. Chaudhari
DOI: 10.5120/ijca2015906829

Poonam P. Rajurkar, Aditya G. Bhor, Komal K. Rahane, Neha S. Pathak, Anagha N. Chaudhari. Efficient Information Retrieval through Comparison of Dimensionality Reduction Techniques with Clustering Approach. International Journal of Computer Applications. 129, 4 (November 2015), 36-40. DOI=10.5120/ijca2015906829

@article{ 10.5120/ijca2015906829,
author = { Poonam P. Rajurkar and Aditya G. Bhor and Komal K. Rahane and Neha S. Pathak and Anagha N. Chaudhari },
title = { Efficient Information Retrieval through Comparison of Dimensionality Reduction Techniques with Clustering Approach },
journal = { International Journal of Computer Applications },
issue_date = { November 2015 },
volume = { 129 },
number = { 4 },
month = { November },
year = { 2015 },
issn = { 0975-8887 },
pages = { 36-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume129/number4/23064-2015906829/ },
doi = { 10.5120/ijca2015906829 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:22:33.540983+05:30
%A Poonam P. Rajurkar
%A Aditya G. Bhor
%A Komal K. Rahane
%A Neha S. Pathak
%A Anagha N. Chaudhari
%T Efficient Information Retrieval through Comparison of Dimensionality Reduction Techniques with Clustering Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 129
%N 4
%P 36-40
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In day-to-day life, a huge amount of electronic data is generated from various sources. Such data is large and difficult to store and retrieve. It can be treated with various techniques for cleaning, compression, and sorting. Preprocessing removes basic English stop-words, making the data compact and easier to process further; dimensionality reduction techniques then make the data more efficient and specific. The reduced data can then be clustered for better information retrieval. This paper describes the dimensionality reduction and clustering techniques applied to the sample dataset C50test of 2500 documents, which gave promising results, and presents their comparison and the better approach for relevant information retrieval.
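The pipeline outlined in the abstract (stop-word removal, dimensionality reduction, then clustering) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the stop-word list, the four toy documents, and all function names are hypothetical, the C50test dataset is not loaded, and a plain power iteration stands in for a library SVD routine. Note that the reduction here is an uncentered truncated SVD; PCA would additionally center the term counts first.

```python
import math
import random
import re

# Tiny illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_matrix(docs):
    """Build a documents x terms count matrix; return it with the vocabulary."""
    tokens = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for doc in tokens:
        row = [0.0] * len(vocab)
        for t in doc:
            row[index[t]] += 1.0
        rows.append(row)
    return rows, vocab


def top_directions(rows, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors by power iteration with deflation."""
    rng = random.Random(seed)
    n_terms = len(rows[0])
    dirs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_terms)]
        for _ in range(iters):
            # Remove the components along directions already found.
            for d in dirs:
                dot = sum(a * b for a, b in zip(v, d))
                v = [a - dot * b for a, b in zip(v, d)]
            # One power step: w = A^T (A v), then normalize.
            av = [sum(a * b for a, b in zip(row, v)) for row in rows]
            w = [sum(rows[i][j] * av[i] for i in range(len(rows))) for j in range(n_terms)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        dirs.append(v)
    return dirs


def project(rows, dirs):
    """Map each document onto the reduced k-dimensional space."""
    return [[sum(a * b for a, b in zip(row, d)) for d in dirs] for row in rows]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two finance documents and two sports documents.
docs = [
    "The stock market and the bank shares are rising",
    "Bank shares fall in the stock market",
    "The football team won the match",
    "A football match with the winning team",
]
rows, vocab = term_matrix(docs)
reduced = project(rows, top_directions(rows, k=2))
labels = kmeans(reduced, k=2)
print(labels)
```

On this toy corpus the 11-term vectors are reduced to 2 dimensions before clustering, and the two finance documents should end up in one cluster and the two sports documents in the other. On a real collection a library routine for SVD or PCA would normally replace the power iteration, which is used here only to keep the sketch dependency-free.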

Index Terms

Computer Science
Information Sciences

Keywords

High Dimensional Datasets, Dimensionality Reduction, SVD, PCA, Clustering, K-means