A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset

Anagha N. Chaudhari

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 128 - Number 2

Year of Publication: 2015

Authors: Anagha N. Chaudhari

10.5120/ijca2015906459

Anagha N. Chaudhari. A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset. International Journal of Computer Applications. 128, 2 (October 2015), 48-53. DOI=10.5120/ijca2015906459

@article{10.5120/ijca2015906459,
  author = {Anagha N. Chaudhari},
  title = {A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset},
  journal = {International Journal of Computer Applications},
  issue_date = {October 2015},
  volume = {128},
  number = {2},
  month = {October},
  year = {2015},
  issn = {0975-8887},
  pages = {48-53},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume128/number2/22849-2015906459/},
  doi = {10.5120/ijca2015906459},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:20:14.823075+05:30
%A Anagha N. Chaudhari
%T A Novel Approach for Development of an Expert IR System using Dimensionality Reduction Techniques and Clustering Approaches for High Dimensionality Dataset
%J International Journal of Computer Applications
%@ 0975-8887
%V 128
%N 2
%P 48-53
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Large volumes of electronic data are generated every day from many sources. Data at this scale is difficult to store and retrieve. It can be made tractable with efficient techniques for cleaning, compressing, and sorting. Preprocessing removes common English stop-words, making the data compact and easier to process further; dimensionality reduction techniques then make the data more efficient and specific. The reduced data can then be clustered for better information retrieval. This paper describes the dimensionality reduction and clustering techniques applied to the sample dataset C50test of 2500 documents, reports promising results, compares the techniques, and identifies the better approach for relevant information retrieval.
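
The pipeline outlined in the abstract (stop-word removal, dimensionality reduction, clustering) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The stop-word list, the six toy documents, and the helper names (`preprocess`, `term_matrix`, `top_components`, `reduce_dims`, `kmeans`) are all assumptions made for illustration. The reduction step approximates a rank-2 truncated SVD by power iteration, and the clustering step is plain K-means with a farthest-point initialization; PCA and fuzzy clustering, which the keywords also mention, are not shown.

```python
import math
import random
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_matrix(docs):
    """Build a document-term count matrix and its sorted vocabulary."""
    tokens = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    matrix = []
    for doc in tokens:
        counts = Counter(doc)
        matrix.append([float(counts[t]) for t in vocab])
    return matrix, vocab

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def top_components(matrix, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors by power iteration on A^T A."""
    rng = random.Random(seed)
    n_terms = len(matrix[0])
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_terms)]
        for _ in range(iters):
            # Remove directions already found, then apply A^T (A v).
            for c in components:
                proj = dot(v, c)
                v = [x - proj * y for x, y in zip(v, c)]
            av = [dot(row, v) for row in matrix]
            v = [sum(row[j] * av[i] for i, row in enumerate(matrix))
                 for j in range(n_terms)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        components.append(v)
    return components

def reduce_dims(matrix, components):
    """Project each document onto the retained components."""
    return [[dot(row, c) for c in components] for row in matrix]

def sqdist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain K-means with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sqdist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy corpus (hypothetical): three documents on each of two topics.
docs = [
    "the cat chased the mouse",
    "a cat and a kitten",
    "the kitten chased a mouse",
    "stock market prices fell",
    "market traders sell stock",
    "stock prices and traders",
]

matrix, vocab = term_matrix(docs)           # 6 documents x 10 terms
components = top_components(matrix, k=2)    # keep 2 of the 10 dimensions
reduced = reduce_dims(matrix, components)
labels = kmeans(reduced, k=2)
print(labels)
```

On a corpus the size of C50test, a library routine for sparse truncated SVD would be the practical choice; the hand-written power iteration here only keeps the sketch dependency-free.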

Index Terms

Computer Science

Information Sciences

Keywords

High Dimensional Datasets, Dimensionality Reduction, SVD, PCA, Clustering, K-means, Fuzzy Clustering Method