Research Article

Efficient Clustering Approach using Statistical Method of Expectation-Maximization

by P. Srinivasa Rao, K. Sivarama Krishna, Nagesh Vadaparthi, S. Vani Kumari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 12
Year of Publication: 2012
Authors: P. Srinivasa Rao, K. Sivarama Krishna, Nagesh Vadaparthi, S. Vani Kumari
10.5120/6958-9305

P. Srinivasa Rao, K. Sivarama Krishna, Nagesh Vadaparthi, S. Vani Kumari. Efficient Clustering Approach using Statistical Method of Expectation-Maximization. International Journal of Computer Applications. 46, 12 (May 2012), 1-7. DOI=10.5120/6958-9305

@article{ 10.5120/6958-9305,
author = { P. Srinivasa Rao and K. Sivarama Krishna and Nagesh Vadaparthi and S. Vani Kumari },
title = { Efficient Clustering Approach using Statistical Method of Expectation-Maximization },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 12 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume46/number12/6958-9305/ },
doi = { 10.5120/6958-9305 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:39:31.849049+05:30
%A P. Srinivasa Rao
%A K. Sivarama Krishna
%A Nagesh Vadaparthi
%A S. Vani Kumari
%T Efficient Clustering Approach using Statistical Method of Expectation-Maximization
%J International Journal of Computer Applications
%@ 0975-8887
%V 46
%N 12
%P 1-7
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is the task of grouping the objects in a dataset according to some measure of similarity. The literature on clustering presents several algorithms for obtaining effective clusters. Among existing techniques, hierarchical clustering is one of the most widely preferred, and of the many available algorithms, K-Means for hierarchical clustering stands out. However, the K-Means algorithm has a number of limitations, such as the initialization of its parameters. To overcome this limitation, we propose using the Expectation-Maximization (E-M) algorithm. K-Means is implemented with the cosine similarity measure, and E-M with a Gaussian Mixture Model. The proposed method has two steps. In the first step, the K-Means and E-M methods are combined to partition the input dataset into several smaller sub-clusters. In the second step, the sub-clusters are merged repeatedly based on a maximized Gaussian measure.
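
As a rough illustration of the first step described in the abstract, the sketch below runs K-Means with cosine similarity and then refines the result with E-M on a Gaussian mixture. It is not the authors' implementation. The function names (`kmeans_cosine`, `em_gmm`), the isotropic (single-variance) Gaussian components, the starting variance of 1.0, and the toy data are all assumptions made for this example. The second step (merging sub-clusters) is not attempted, because the abstract does not define the Gaussian measure it maximizes.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def kmeans_cosine(points, k, iters=20, seed=0):
    """K-Means that assigns each point to the most cosine-similar centroid."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine_similarity(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def log_gaussian(p, mean, var):
    """Log density of an isotropic Gaussian (assumed: one variance per component)."""
    sq_dist = sum((x - m) ** 2 for x, m in zip(p, mean))
    return -0.5 * (len(p) * math.log(2 * math.pi * var) + sq_dist / var)

def em_gmm(points, labels, centroids, iters=30, min_var=1e-6):
    """Refine a K-Means result with E-M on an isotropic Gaussian mixture."""
    n, d, k = len(points), len(points[0]), len(centroids)
    counts = [max(labels.count(j), 1) for j in range(k)]
    weights = [c / sum(counts) for c in counts]  # initial mixing weights
    means = [list(c) for c in centroids]         # K-Means centroids as initial means
    variances = [1.0] * k                        # assumed starting variance
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point (log-sum-exp)
        resp = []
        for p in points:
            logs = [math.log(weights[j]) + log_gaussian(p, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weight, mean and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # skip a component that owns no probability mass
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[i] for r, p in zip(resp, points)) / nj
                        for i in range(d)]
            sq = sum(r[j] * sum((x - m) ** 2 for x, m in zip(p, means[j]))
                     for r, p in zip(resp, points))
            variances[j] = max(sq / (nj * d), min_var)
    return weights, means, variances, resp

# Toy data (assumed): two groups that differ in both direction and position.
points = [[5.0, 1.0], [5.2, 0.8], [4.8, 1.1], [5.1, 1.2],
          [1.0, 5.0], [0.9, 5.2], [1.1, 4.8], [1.2, 5.1]]
labels, centroids = kmeans_cosine(points, k=2)
weights, means, variances, resp = em_gmm(points, labels, centroids)
soft_labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

Seeding E-M with the K-Means centroids is one common way to address the initialization issue the abstract mentions. A full-covariance mixture would be a natural extension, but is left out here to keep the sketch short.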

Index Terms

Computer Science
Information Sciences

Keywords

K-means, Expectation-maximization, Gaussian Mixture Model, Clustering, Similarity Measure