Comparison of Keyword based Clustering of Web Documents by using OPENSTACK 4J and by Traditional Method

Shiza Anand; Pradeep Pant; Mukesh Rawat

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 156 - Number 9

Year of Publication: 2016

Authors: Shiza Anand, Pradeep Pant, Mukesh Rawat

10.5120/ijca2016912583

Shiza Anand, Pradeep Pant, Mukesh Rawat. Comparison of Keyword based Clustering of Web Documents by using OPENSTACK 4J and by Traditional Method. International Journal of Computer Applications. 156, 9 (Dec 2016), 39-45. DOI=10.5120/ijca2016912583

@article{10.5120/ijca2016912583,
  author = {Shiza Anand and Pradeep Pant and Mukesh Rawat},
  title = {Comparison of Keyword based Clustering of Web Documents by using OPENSTACK 4J and by Traditional Method},
  journal = {International Journal of Computer Applications},
  issue_date = {Dec 2016},
  volume = {156},
  number = {9},
  month = {Dec},
  year = {2016},
  issn = {0975-8887},
  pages = {39-45},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume156/number9/26741-2016912583/},
  doi = {10.5120/ijca2016912583},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

The number of hypertext documents on the World Wide Web is increasing continuously. Clustering methods are therefore required to group documents into clusters (repositories) according to the similarity between them. Various clustering methods exist, such as hierarchical, K-means, fuzzy-logic-based, and centroid-based methods. These keyword-based clustering methods take a considerable amount of time to create containers and place documents in their respective containers, because they rely on the file-handling facilities of different programming languages to create repositories and transfer web documents into them. In contrast, the openstack4j SDK is a newer technique that creates containers and moves web documents into them according to their similarity in much less time than the traditional methods. Another benefit of this technique is that the SDK can read files of all types, such as JPG, HTML, PDF, and DOC. This paper compares the time required to cluster documents using openstack4j with the time required by traditional methods, and suggests that search engines adopt this technique for clustering so that they can answer user queries in less time.
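To make the "traditional" file-handling approach concrete, the following is a minimal sketch and not the authors' implementation. It assumes each document is assigned to the single keyword that occurs in it most often, that a container is simply a directory named after that keyword, and that documents matching no keyword go to a directory called `unclassified`. The class name `KeywordFileClusterer` and all method names are hypothetical. The elapsed time measured in `main` corresponds to the quantity the paper compares against the openstack4j approach.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of keyword-based clustering with plain file handling.
class KeywordFileClusterer {

    // Counts non-overlapping, case-insensitive occurrences of keyword in text.
    static int countOccurrences(String text, String keyword) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Returns the most frequent keyword, or "unclassified" if none occurs.
    // Ties are resolved in favour of the keyword listed first.
    static String bestKeyword(String text, List<String> keywords) {
        String best = "unclassified";
        int bestCount = 0;
        for (String keyword : keywords) {
            int c = countOccurrences(text, keyword);
            if (c > bestCount) {
                bestCount = c;
                best = keyword;
            }
        }
        return best;
    }

    // Copies every regular file in sourceDir into targetRoot/<keyword>/.
    // Assumption: keywords are valid directory names.
    static Map<String, List<String>> cluster(Path sourceDir, Path targetRoot,
                                             List<String> keywords) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(sourceDir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String label = bestKeyword(text, keywords);
            Path container = Files.createDirectories(targetRoot.resolve(label));
            Files.copy(file, container.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            clusters.computeIfAbsent(label, k -> new ArrayList<>())
                    .add(file.getFileName().toString());
        }
        return clusters;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempDirectory("docs");
        Path target = Files.createTempDirectory("clusters");
        Files.write(source.resolve("a.html"),
                "<p>cloud storage and cloud compute</p>".getBytes(StandardCharsets.UTF_8));
        Files.write(source.resolve("b.html"),
                "<p>k-means clustering of documents</p>".getBytes(StandardCharsets.UTF_8));

        long start = System.nanoTime();
        Map<String, List<String>> result =
                cluster(source, target, Arrays.asList("cloud", "clustering"));
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        System.out.println(result);
        System.out.println("Clustering took " + elapsedMicros + " microseconds");
    }
}
```

In an openstack4j-based variant, the directory creation and file copy inside `cluster` would be replaced by calls to the SDK's object storage service, while the keyword matching would stay the same. This sketch decodes every file as text, so binary formats such as JPG or PDF would need a format-specific text extractor before keyword counting is meaningful.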

Index Terms

Computer Science

Information Sciences

Keywords

Clustering, openstack4j, K-Means, centroid based, document-matching