International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 20
Year of Publication: 2020
Authors: K. SanthiSree, V. Vineela, Y. Ambica, Ch. Anitha
DOI: 10.5120/ijca2020920151
K. SanthiSree, V. Vineela, Y. Ambica, Ch. Anitha. Case Study: Enhanced Clustering Technique on Sequential Data Streams using Optics and Chameleon. International Journal of Computer Applications. 176, 20 (May 2020), 1-5. DOI=10.5120/ijca2020920151
Huge amounts of data accumulate every second in the real world. Clustering web usage data helps identify what users are looking for on the World Wide Web, such as user traversals, user behavior, and user characteristics, which supports Web personalization. Clustering web sessions means grouping them by similarity, maximizing the intra-cluster similarity and minimizing the inter-cluster similarity. Many similarity measures already exist for comparing web patterns, including Euclidean, Jaccard, Cosine, Manhattan, and Minkowski. In this paper, we present an enhanced Chameleon Clustering Algorithm (CCA) based on CHAMELEON. Experiments are performed on sequential data streams from the MSNBC.COM website (a free online news channel), in the context of clustering in the domain of Web usage mining. Clustering in data mining is a discovery process that groups a set of data such that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized. Existing clustering algorithms, such as K-means, PAM, CLARANS, DBSCAN, CURE, and ROCK, are designed to find clusters that fit some static model. Specifically, we present a detailed comparison of OPTICS and CHAMELEON, and the results illustrate that CHAMELEON is much more suitable for clustering dynamic datasets. The inter-cluster and intra-cluster distances are computed using Average Levenshtein Distance (ALD) to demonstrate the usefulness of the proposed approach in the context of web usage mining. The enhanced CHAMELEON algorithm gives good results compared with the existing OPTICS clustering technique and has good time requirements.
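The abstract does not define Average Levenshtein Distance precisely, so the following is only a minimal sketch under one plausible reading: ALD is taken as the mean pairwise Levenshtein (edit) distance between web sessions, computed within one cluster for the intra-cluster value and across two clusters for the inter-cluster value. The function names (`levenshtein`, `intra_cluster_ald`, `inter_cluster_ald`) and the sample sessions are hypothetical and not taken from the paper. Sessions are modeled as lists of integer page-category codes, which is an assumption about how the MSNBC.COM data would be encoded.

```python
from itertools import combinations, product


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def intra_cluster_ald(cluster):
    """Assumed definition: mean distance over all unordered session pairs in a cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(s, t) for s, t in pairs) / len(pairs)


def inter_cluster_ald(cluster_a, cluster_b):
    """Assumed definition: mean distance over all session pairs drawn across two clusters."""
    pairs = list(product(cluster_a, cluster_b))
    if not pairs:
        return 0.0
    return sum(levenshtein(s, t) for s, t in pairs) / len(pairs)


# Hypothetical sessions: each is a sequence of page-category codes.
cluster_a = [[1, 1, 2], [1, 2], [1, 1, 2, 2]]
cluster_b = [[6, 7, 7], [6, 7]]

intra_a = intra_cluster_ald(cluster_a)
inter_ab = inter_cluster_ald(cluster_a, cluster_b)
```

Under this reading, a good clustering would show a small intra-cluster ALD (sessions in the same cluster need few edits to match) and a large inter-cluster ALD. In the sample data, `intra_a` is smaller than `inter_ab`, because the two clusters share no page categories.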