Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA)

R. Ranga Raj; M. Punithavalli

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 59 - Number 20

Year of Publication: 2012

Authors: R. Ranga Raj, M. Punithavalli

10.5120/9816-4363

R. Ranga Raj, M. Punithavalli. Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA). International Journal of Computer Applications. 59, 20 (December 2012), 4-8. DOI=10.5120/9816-4363

@article{10.5120/9816-4363,
  author = {R. Ranga Raj and M. Punithavalli},
  title = {Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA)},
  journal = {International Journal of Computer Applications},
  issue_date = {December 2012},
  volume = {59},
  number = {20},
  month = {December},
  year = {2012},
  issn = {0975-8887},
  pages = {4-8},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume59/number20/9816-4363/},
  doi = {10.5120/9816-4363},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:04:45.874403+05:30
%A R. Ranga Raj
%A M. Punithavalli
%T Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA)
%J International Journal of Computer Applications
%@ 0975-8887
%V 59
%N 20
%P 4-8
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The rapid growth in the number of documents makes clustering them difficult. Document clustering is the process of grouping related documents from a large collection. Mining related documents from an unlabelled collection is particularly challenging. To address this, clustering is used to filter the unlabelled documents in a large collection. This paper introduces a new approach to document clustering that combines the enhanced k-means algorithm [1] with a Dictionary Defined Lexical Analyzer (DDLA). On its own, the k-means algorithm clusters numeric values efficiently; with the inclusion of the DDLA, characters, words and sentences can also be clustered. Documents are clustered according to their weights [7] using the bisecting k-means algorithm [1, 2] and a topic detection method. Meaningful labels for the documents are discovered on the basis of semantic similarity [8]. Combining the enhanced k-means algorithm with the DDLA is one technique that makes clustering unlabelled documents easier.
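The pipeline outlined in the abstract (a dictionary-driven lexical step that turns words into numeric weights, followed by bisecting k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary contents, the topic-count weighting, the deterministic seeding and the choice to always split the largest cluster are all assumptions made for illustration, and the names `DICTIONARY`, `ddla_vector`, `two_means` and `bisecting_kmeans` are hypothetical.

```python
import re

# Hypothetical dictionary mapping known terms to a topic index.
# The paper's actual dictionary is not given; this one is an assumption.
DICTIONARY = {
    "cluster": 0, "kmeans": 0, "centroid": 0,
    "gene": 1, "protein": 1, "cell": 1,
}
NUM_TOPICS = 2


def ddla_vector(text):
    """Tokenize text and return the share of dictionary hits per topic."""
    counts = [0.0] * NUM_TOPICS
    for token in re.findall(r"[a-z]+", text.lower()):
        topic = DICTIONARY.get(token)
        if topic is not None:
            counts[topic] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_means(vectors, iterations=10):
    """Plain k-means with k=2; returns two lists of positions in `vectors`."""
    # Assumed seeding: the first vector and the vector farthest from it.
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: sq_dist(v, c0))
    left, right = list(range(len(vectors))), []
    for _ in range(iterations):
        left, right = [], []
        for i, v in enumerate(vectors):
            (left if sq_dist(v, c0) <= sq_dist(v, c1) else right).append(i)
        if not left or not right:
            break
        c0 = mean([vectors[i] for i in left])
        c1 = mean([vectors[i] for i in right])
    return left, right


def bisecting_kmeans(vectors, k):
    """Bisect the largest cluster until k clusters exist or no split is possible."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        left, right = two_means([vectors[i] for i in largest])
        if not left or not right:
            clusters.append(largest)  # cluster cannot be split further
            break
        clusters.append([largest[i] for i in left])
        clusters.append([largest[i] for i in right])
    return clusters


docs = [
    "Cluster centroid kmeans",
    "gene protein cell",
    "kmeans cluster",
    "cell gene",
]
vectors = [ddla_vector(d) for d in docs]
clusters = bisecting_kmeans(vectors, 2)
```

In this sketch the lexical step is what makes text usable by k-means at all: each document becomes a short numeric vector, so the distance computations operate on numbers rather than on raw characters or words.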

References

[1] "Improving the accuracy and efficiency of K-Mean Clustering Algorithm", K. A. Abdul Nazeer, M. P. Sebastian. Proceedings of the World Congress on Engineering 2009, Vol. I, WCE 2009, July 1-3, 2009, London, U.K.
[2] "Korean Text Extraction by Local Color Quantization and K-means Clustering in Natural Scene", Anh-Nga Lai, KeonHee Park, Manoj Kumar, GueeSang Lee. Department of Computer Science, Chonnam National University, 500-757 Gwangju, Korea. ltanhnga@gmail.com, gslee@chonnam.ac.kr
[3] "Cluster Analysis for Gene Expression Data", Daxin Jiang, Chun Tang, Aidong Zhang. IEEE Transactions on Knowledge and Data Engineering, 16(11): 1370-1386, 2004.
[4] "Fast Document Clustering Based on Weighted Comparative Advantage", Jie Ji. Intelligent System Lab, The University of Aizu, Aizuwakamatsu, Fukushima, Japan. d8102102@u-aizu.ac.jp
[5] "A Comparison of Document Clustering Techniques", Michael Steinbach, George Karypis, Vipin Kumar. Department of Computer Science, University of Minnesota, Technical Report #00-034. {steinbac, karypis, kumar}@cs.umn.edu
[6] "Clustering Of Image Data Set Using K-Means and Fuzzy K-Means Algorithms", Vinod Kumar Dehariya. I.T. Dept., S.A.T.I., Vidisha (M.P.), India. vkdworld@yahoo.com
[7] "Document Clustering in Correlation Similarity Measure Space", Taiping Zhang; Yuan Yan Tang; Bin Fang; Yong Xiang. IEEE Transactions on Knowledge and Data Engineering, Volume 24, 2012.
[8] "A Web Search Engine-Based Approach to Measure Semantic Similarity between Words", Bollegala, D.; Matsuo, Y.; Ishizuka, M. IEEE Transactions on Knowledge and Data Engineering, Volume 23, 2011.
[9] "Spoken Document Retrieval With Unsupervised Query Modeling Techniques", Chen, B.; Kuan-Yu Chen; Pei-Ning Chen; Yi-Wen Chen. IEEE Transactions on Audio, Speech, and Language Processing, Volume 20, Issue 9, 2012.
[10] "Data Extraction for Deep Web Using WordNet", Jer Lang Hong. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Volume 41, Issue 6, 2011.
[11] "Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination", Muscariello, A.; Gravier, G.; Bimbot, F. IEEE Transactions on Audio, Speech, and Language Processing, Volume 20, Issue 7, 2012.
[12] "Automatic Discovery of Personal Name Aliases from the Web", Bollegala, D.; Matsuo, Y.; Ishizuka, M. IEEE Transactions on Knowledge and Data Engineering, Volume 23, Issue 6, 2011.

Index Terms

Computer Science

Information Sciences

Keywords

Clustering, K-Means Enhanced Approach Algorithm, Lexical Analyzer, Defined Dictionary, DDLA