Research Article

Comparison of Clustering Methods over a Hidden Web Data using Stratification

by G. Jaya Suma, R. Manjula
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 75 - Number 17
Year of Publication: 2013
Authors: G. Jaya Suma, R. Manjula
DOI: 10.5120/13206-0779

G. Jaya Suma, R. Manjula. Comparison of Clustering Methods over a Hidden Web Data using Stratification. International Journal of Computer Applications. 75, 17 (August 2013), 46-51. DOI=10.5120/13206-0779

@article{ 10.5120/13206-0779,
author = { G. Jaya Suma and R. Manjula },
title = { Comparison of Clustering Methods over a Hidden Web Data using Stratification },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 75 },
number = { 17 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 46-51 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume75/number17/13206-0779/ },
doi = { 10.5120/13206-0779 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:44:33.251074+05:30
%A G. Jaya Suma
%A R. Manjula
%T Comparison of Clustering Methods over a Hidden Web Data using Stratification
%J International Journal of Computer Applications
%@ 0975-8887
%V 75
%N 17
%P 46-51
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper focuses on data mining in general, and clustering in particular, over hidden web data. Data mining analyzes large amounts of data and extracts knowledge that is useful to users. Hidden (or deep) web data resides in databases on remote systems and can be accessed only through query interfaces such as HTML forms. Clustering such data is difficult because access is indirect, limited to the query interface, and slow. A novel methodology, stratified clustering, is introduced; it works on samples of the dataset. Because samples can be obtained only by submitting queries, an efficient sampling method is needed to reduce both the time consumed and the number of queries required to access deep web data. The paper proposes a series of steps to accomplish this task: 1) the space of input attributes is partitioned into strata that capture the association between input and output attributes; 2) an efficient sampling method is proposed to obtain high estimation accuracy; 3) the samples obtained are used by two clustering methods, stratified k-means clustering and hierarchical clustering. The estimation accuracy of the cluster centers of deep web data is compared for simple random sampling against stratified sampling, and for the k-means clustering method against the hierarchical clustering method.
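
The sampling comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the population, the attribute values, the stratum sizes, and the function names (`make_population`, `simple_random_estimate`, `stratified_estimate`, `mean_squared_error`) are all hypothetical, and the stratum sizes are assumed to be known in advance, which a real deep web source would not guarantee. Each sampled record stands in for one query against the hidden database. The sketch estimates the mean of an output attribute, a simple stand-in for a cluster center, using proportional allocation across strata.

```python
import random
import statistics


def make_population(rng):
    """Build a hypothetical table of (input attribute, output attribute) records.

    The output value depends strongly on the input category, which is the
    situation in which stratifying on the input attribute is expected to help.
    """
    centers = {"a": 10.0, "b": 50.0, "c": 90.0}  # assumed per-stratum means
    sizes = {"a": 600, "b": 300, "c": 100}       # assumed stratum sizes
    return [(x, rng.gauss(centers[x], 2.0))
            for x in centers for _ in range(sizes[x])]


def simple_random_estimate(population, n, rng):
    """Estimate the mean output value from n records drawn uniformly."""
    sample = rng.sample(population, n)
    return statistics.fmean(y for _, y in sample)


def stratified_estimate(population, n, rng):
    """Estimate the mean output value with proportional allocation.

    Records are grouped by input attribute value; each stratum receives a
    share of the n samples proportional to its size, and the per-stratum
    sample means are combined using the stratum weights.
    """
    strata = {}
    for x, y in population:
        strata.setdefault(x, []).append(y)
    total = len(population)
    estimate = 0.0
    for values in strata.values():
        weight = len(values) / total
        # At least one sample per stratum, never more than the stratum holds.
        n_h = min(len(values), max(1, round(n * weight)))
        estimate += weight * statistics.fmean(rng.sample(values, n_h))
    return estimate


def mean_squared_error(estimator, population, n, trials, rng):
    """Average squared error of an estimator over repeated sampling runs."""
    true_mean = statistics.fmean(y for _, y in population)
    return statistics.fmean(
        (estimator(population, n, rng) - true_mean) ** 2
        for _ in range(trials)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    population = make_population(rng)
    for name, estimator in [("simple random", simple_random_estimate),
                            ("stratified", stratified_estimate)]:
        mse = mean_squared_error(estimator, population, 30, 200, rng)
        print(f"{name:>13}: mean squared error = {mse:.3f}")
```

With the same budget of 30 sampled records per run, the stratified estimator should show a much smaller error on this synthetic data, because most of the variance lies between strata rather than within them. The paper's own allocation scheme and clustering steps may differ from this proportional-allocation sketch.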

References
  1. Hannu Toivonen. Sampling large databases for association rules. In The VLDB Journal, pages 134-145. Morgan Kaufmann, 1996.
  2. L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, New York, 1990.
  3. A. Kementsietsidis, F. Neven, D. Van de Craen, and S. Vansummeren. Scalable multi-query optimization for exploratory queries over federated scientific databases. VLDB Endowment, 1:16-27, 2008.
  4. M. K. Bergman. The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing, 7, 2001.
  5. D. Braga, S. Ceri, F. Daniel, and D. Martinenghi. Optimization of Multi-domain Queries on the Web. VLDB Endowment, 1:562-673, 2008.
  6. U. Srivastava, K. Munagala, J. Widom, and R. Motwani. Query Optimization over Web Services. In Proceedings of the 32nd VLDB Endowment, pages 255-366, 2006.
  7. Bharat Chaudhari and Manan Parikh. A Comparative Study of Clustering Algorithms using Weka Tools. In Proceedings of International Journal IJAIEM, 2012.
  8. Tantan Liu and Gagan Agrawal. Stratified k-means Clustering over a Deep Web Data Source. In Proceedings of the 18th ACM SIGKDD International Conference, pages 1113-1121, 2012.
  9. Tantan Liu and Gagan Agrawal. Stratification based Hierarchical Clustering on Deep Web Data Source. In Proceedings of 12th SIAM International Conference, pages 70-81, 2012.
  10. W. Cochran. Sampling Techniques. Wiley and Sons, 1977.
Index Terms

Computer Science
Information Sciences

Keywords

Stratification, Stratified Sampling, Stratified k-means Clustering, Stratified Hierarchical Clustering