Research Article

A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset

by Anagha N. Chaudhari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 128 - Number 2
Year of Publication: 2015
Authors: Anagha N. Chaudhari
10.5120/ijca2015906459

Anagha N. Chaudhari. A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset. International Journal of Computer Applications. 128, 2 (October 2015), 48-53. DOI=10.5120/ijca2015906459

@article{ 10.5120/ijca2015906459,
author = { Anagha N. Chaudhari },
title = { A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset },
journal = { International Journal of Computer Applications },
issue_date = { October 2015 },
volume = { 128 },
number = { 2 },
month = { October },
year = { 2015 },
issn = { 0975-8887 },
pages = { 48-53 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume128/number2/22849-2015906459/ },
doi = { 10.5120/ijca2015906459 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:20:14.823075+05:30
%A Anagha N. Chaudhari
%T A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset
%J International Journal of Computer Applications
%@ 0975-8887
%V 128
%N 2
%P 48-53
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Large volumes of electronic data are generated every day from a variety of sources. Data at this scale is difficult to store and retrieve. It can be made more manageable with techniques for cleaning, compressing and sorting. Preprocessing removes basic English stop-words, which makes the data more compact and easier to process further; dimensionality reduction techniques then make the data more efficient to work with and more specific. The reduced data can then be clustered for better information retrieval. This paper describes the dimensionality reduction and clustering techniques applied to a sample dataset, C50test, of 2500 documents, reports promising results, compares the techniques, and identifies the better approach for relevant information retrieval.
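The pipeline outlined in the abstract (stop-word removal, dimensionality reduction, clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the stop-word list, the four toy documents, and all function names are hypothetical, the reduction step approximates a truncated SVD by power iteration with deflation rather than calling a library routine, and the clustering step is plain k-means with a deterministic initialization (the fuzzy clustering method named in the keywords is not shown).

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption: the paper's actual list is not given).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in", "to", "on", "for"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_matrix(docs):
    """Return a documents-by-terms count matrix and its sorted vocabulary."""
    tokens = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    counts = [Counter(doc) for doc in tokens]
    return [[float(c[t]) for t in vocab] for c in counts], vocab


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_directions(matrix, dims, iters=300):
    """Approximate the leading right singular vectors by power iteration
    with deflation (a stand-in for a library SVD routine)."""
    work = [row[:] for row in matrix]  # copy, so the caller's matrix is untouched
    n_terms = len(work[0])
    directions = []
    for _ in range(dims):
        v = [1.0 + j for j in range(n_terms)]  # fixed, non-uniform start vector
        for _ in range(iters):
            av = [dot(row, v) for row in work]
            w = [sum(work[i][j] * av[i] for i in range(len(work)))
                 for j in range(n_terms)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        directions.append(v)
        for row in work:  # deflate: remove the component along v from each row
            p = dot(row, v)
            for j in range(n_terms):
                row[j] -= p * v[j]
    return directions


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_documents(docs, k, dims):
    """Preprocess, reduce to `dims` dimensions, then cluster into `k` groups."""
    matrix, _ = term_matrix(docs)
    directions = top_directions(matrix, dims)
    reduced = [[dot(row, d) for d in directions] for row in matrix]
    return kmeans(reduced, k)


# Toy corpus (hypothetical; the paper uses the C50test dataset of 2500 documents).
DOCS = [
    "The cat sat on the mat with another cat",
    "A cat and a kitten play in the garden",
    "Stock markets fell as the bank raised rates",
    "The bank reported profits and stock prices rose",
]
labels = cluster_documents(DOCS, k=2, dims=2)
print(labels)
```

On a real corpus, a numerical library's SVD or PCA routine would replace `top_directions`, and term weighting such as TF-IDF would typically replace raw counts. The pure-Python version is used here only to keep the sketch self-contained.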

Index Terms

Computer Science
Information Sciences

Keywords

High Dimensional Datasets, Dimensionality Reduction, SVD, PCA, Clustering, K-means, Fuzzy Clustering Method