Research Article

Web Document Clustering using Proposed Similarity Measure

Published on December 2014 by P. H. Govardhan, K. P. Wagh, P. N. Chatur
National Conference on Emerging Trends in Computer Technology
Foundation of Computer Science USA
NCETCT - Number 2
December 2014
Authors: P. H. Govardhan, K. P. Wagh, P. N. Chatur

P. H. Govardhan, K. P. Wagh, P. N. Chatur. Web Document Clustering using Proposed Similarity Measure. National Conference on Emerging Trends in Computer Technology. NCETCT, 2 (December 2014), 15-18.

@article{
author = { P. H. Govardhan, K. P. Wagh, P. N. Chatur },
title = { Web Document Clustering using Proposed Similarity Measure },
journal = { National Conference on Emerging Trends in Computer Technology },
issue_date = { December 2014 },
volume = { NCETCT },
number = { 2 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 15-18 },
numpages = 4,
url = { /proceedings/ncetct/number2/19088-4022/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Emerging Trends in Computer Technology
%A P. H. Govardhan
%A K. P. Wagh
%A P. N. Chatur
%T Web Document Clustering using Proposed Similarity Measure
%J National Conference on Emerging Trends in Computer Technology
%@ 0975-8887
%V NCETCT
%N 2
%P 15-18
%D 2014
%I International Journal of Computer Applications
Abstract

Recent advances in data warehousing and data mining research have given rise to many types of information sources. Web documents are among the most useful information resources of this era, and using them efficiently is important for knowledge discovery. Documents that provide related information should be grouped into one cluster, but finding the similarity between documents is a tedious task. Various similarity measures have been introduced to address clustering problems. The motivation for this paper is to propose a new similarity measure that yields better clustering results. Previous research does not consider both the present and the absent features in documents. The proposed similarity measure takes both present and absent features into account. Focusing on the similarity measure in this way supports the mining technique.
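The abstract does not define the proposed measure, so the following is only an illustrative sketch of what it can mean for a similarity measure to take absent features into account. It compares cosine similarity on term-count vectors, to which a term absent from both documents contributes nothing, with the standard simple matching coefficient, which counts a term absent from both documents as agreement. The function names and the toy vectors are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors.

    A term whose weight is zero in both documents adds nothing to the
    dot product or to either norm, so joint absence is ignored.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def presence_absence_similarity(a, b):
    """Simple matching coefficient over term presence and absence.

    A term counts as agreement when it is present in both documents
    or absent from both. This is a textbook measure used here for
    illustration only; it is NOT the measure proposed in the paper.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        return 0.0
    agree = sum(1 for x, y in zip(a, b) if (x > 0) == (y > 0))
    return agree / len(a)


# Hypothetical term-count vectors over a five-term vocabulary.
doc1 = [2, 1, 0, 0, 0]
doc2 = [1, 0, 0, 0, 0]
doc3 = [1, 0, 3, 2, 1]

for name, other in (("doc2", doc2), ("doc3", doc3)):
    print(
        f"doc1 vs {name}: "
        f"cosine={cosine_similarity(doc1, other):.3f}, "
        f"presence/absence={presence_absence_similarity(doc1, other):.3f}"
    )
```

In this toy data, doc1 and doc2 agree on four of five terms (one shared term and three jointly absent terms), while doc1 and doc3 agree on only one. The paper's measure may weight presence and absence quite differently, for example by using TF-IDF weights rather than binary presence.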

Index Terms

Computer Science
Information Sciences

Keywords

Cluster, Document Vector, Inverse Document Frequency, Similarity Measure, Term Frequency, Web Document