Comparison of Clustering Methods over a Hidden Web Data using Stratification

G. Jaya Suma; R. Manjula


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 75 - Number 17

Year of Publication: 2013

Authors: G. Jaya Suma, R. Manjula

10.5120/13206-0779

G. Jaya Suma, R. Manjula. Comparison of Clustering Methods over a Hidden Web Data using Stratification. International Journal of Computer Applications. 75, 17 (August 2013), 46-51. DOI=10.5120/13206-0779

@article{10.5120/13206-0779,
  author = {G. Jaya Suma and R. Manjula},
  title = {Comparison of Clustering Methods over a Hidden Web Data using Stratification},
  journal = {International Journal of Computer Applications},
  issue_date = {August 2013},
  volume = {75},
  number = {17},
  month = {August},
  year = {2013},
  issn = {0975-8887},
  pages = {46--51},
  numpages = {6},
  url = {https://ijcaonline.org/archives/volume75/number17/13206-0779/},
  doi = {10.5120/13206-0779},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:44:33.251074+05:30
%A G. Jaya Suma
%A R. Manjula
%T Comparison of Clustering Methods over a Hidden Web Data using Stratification
%J International Journal of Computer Applications
%@ 0975-8887
%V 75
%N 17
%P 46-51
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper focuses on the problem of data mining in general, and clustering in particular, over hidden web data. Data mining is the process of analyzing large amounts of data and extracting knowledge that provides useful information to users. Hidden (or deep) web data resides in databases on remote systems, so it can be accessed only through a query interface such as an HTML form. Clustering such data is difficult because access is indirect, restricted to the query interface, and time-consuming. A novel methodology, stratified clustering, is introduced, based on sampling the dataset. Since samples can be obtained only by submitting queries, an efficient sampling method is needed to reduce both the time and the number of queries required to access deep web data. This paper proposes the following series of steps to accomplish this task:

1) The space of input attributes is partitioned into strata that capture the association between the input and output attributes.
2) An efficient sampling method is proposed to obtain high estimation accuracy.
3) The samples obtained are used by two clustering methods: stratified k-means clustering and hierarchical clustering.

The estimation accuracy of the cluster centers of the deep web data is compared for simple random sampling versus stratified sampling, and for the k-means clustering method versus the hierarchical clustering method.
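The workflow in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: the hidden table, its input attribute values "A", "B" and "C", and the helper names (`make_hidden_source`, `query`, `compare_estimators`, and so on) are hypothetical. The stratum sizes N_h are assumed known here, although in practice they would have to be estimated through the query interface. Proportional allocation, a standard textbook choice, stands in for the paper's "efficient sampling method", which the abstract does not specify. The sketch compares simple random sampling with a stratified estimator of the overall output mean. It then clusters the stratified sample with a weighted k-means and a weighted centroid-linkage agglomerative (hierarchical) method, where each sampled record counts N_h / n_h times.

```python
import random
import statistics

# --- Hypothetical deep web source (assumption, for illustration only) ---
# Each record pairs a categorical input attribute, which is all a user can
# query through the HTML form, with numeric output attributes (x, y).
def make_hidden_source(seed=0):
    rng = random.Random(seed)
    centres = {"A": (0.0, 0.0), "B": (5.0, 5.0), "C": (10.0, 0.0)}
    sizes = {"A": 600, "B": 300, "C": 100}
    return [(key, (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            for key, (cx, cy) in centres.items() for _ in range(sizes[key])]

def query(table, key, n, rng):
    """Simulated form query: n random records whose input attribute equals key."""
    return rng.sample([p for k, p in table if k == key], n)

def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def mean_point(points):
    xs, ys = zip(*points)
    return (statistics.fmean(xs), statistics.fmean(ys))

# --- Sampling and estimation ---
def srs_estimate(table, n, rng):
    """Simple random sampling: n records drawn uniformly from the whole source."""
    return mean_point([p for _, p in rng.sample(table, n)])

def stratified_sample(table, sizes, n, rng):
    """One stratum per input value; proportional allocation n_h = n * N_h / N
    (a stand-in: the paper's own allocation rule is not given in the abstract)."""
    total = sum(sizes.values())
    return {h: query(table, h, max(1, round(n * size / total)), rng)
            for h, size in sizes.items()}

def stratified_estimate(samples, sizes):
    """Stratified estimator of the mean: sum over h of (N_h / N) * stratum mean."""
    total = sum(sizes.values())
    means = {h: mean_point(pts) for h, pts in samples.items()}
    return tuple(sum(sizes[h] / total * means[h][d] for h in samples) for d in (0, 1))

def compare_estimators(trials=200, n=50, seed=1):
    """Mean squared error of each estimator of the overall output mean."""
    table = make_hidden_source()
    sizes = {h: sum(1 for k, _ in table if k == h) for h in "ABC"}  # assumed known
    truth = mean_point([p for _, p in table])
    rng = random.Random(seed)
    mse = {"srs": 0.0, "stratified": 0.0}
    for _ in range(trials):
        mse["srs"] += dist2(srs_estimate(table, n, rng), truth) / trials
        est = stratified_estimate(stratified_sample(table, sizes, n, rng), sizes)
        mse["stratified"] += dist2(est, truth) / trials
    return mse

# --- Clustering the weighted sample (each record counts N_h / n_h times) ---
def weighted_kmeans(points, weights, k, iters=20):
    """Lloyd's algorithm with weighted centroids; farthest-first seeding is a
    hypothetical choice that keeps the sketch deterministic."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p, w in zip(points, weights):
            groups[min(range(k), key=lambda i: dist2(p, centres[i]))].append((p, w))
        # Weighted centroid of each group; an empty group keeps its old centre.
        centres = [tuple(sum(w * p[d] for p, w in g) / sum(w for _, w in g) for d in (0, 1))
                   if g else centres[j] for j, g in enumerate(groups)]
    return centres

def weighted_agglomerative(points, weights, k):
    """Centroid-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [(w, p) for p, w in zip(points, weights)]  # (total weight, centre)
    while len(clusters) > k:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda ab: dist2(clusters[ab[0]][1], clusters[ab[1]][1]))
        (wi, ci), (wj, cj) = clusters[i], clusters[j]
        merged = (wi + wj, tuple((wi * ci[d] + wj * cj[d]) / (wi + wj) for d in (0, 1)))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [c for _, c in clusters]

if __name__ == "__main__":
    print(compare_estimators())  # stratified MSE should be well below SRS MSE
    table = make_hidden_source()
    sizes = {h: sum(1 for k, _ in table if k == h) for h in "ABC"}
    samples = stratified_sample(table, sizes, 50, random.Random(2))
    points = [p for pts in samples.values() for p in pts]
    weights = [sizes[h] / len(pts) for h, pts in samples.items() for _ in pts]
    print(weighted_kmeans(points, weights, k=3))
    print(weighted_agglomerative(points, weights, k=3))
```

In this setup the stratified estimator should show a much smaller squared error, because proportional allocation removes the between-stratum component of the variance. With proportional allocation every weight N_h / n_h is equal (20 here), so the weighting only matters under unequal allocations such as Neyman allocation, where n_h is proportional to N_h S_h.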


Index Terms

Computer Science

Information Sciences

Keywords

Stratification, Stratified Sampling, Stratified k-means Clustering, Stratified Hierarchical Clustering