Research Article

Web Page Clustering using Latent Semantic Analysis

Published in March 2012 by Lalit A. Patil, S M. Kamalapur, Dhananjay Kanade
International Conference in Computational Intelligence
Foundation of Computer Science USA
ICCIA - Number 6
March 2012
Authors: Lalit A. Patil, S M. Kamalapur, Dhananjay Kanade

Lalit A. Patil, S M. Kamalapur, Dhananjay Kanade. Web Page Clustering using Latent Semantic Analysis. International Conference in Computational Intelligence. ICCIA, 6 (March 2012), 21-25.

@article{
author = { Lalit A. Patil, S M. Kamalapur, Dhananjay Kanade },
title = { Web Page Clustering using Latent Semantic Analysis },
journal = { International Conference in Computational Intelligence },
issue_date = { March 2012 },
volume = { ICCIA },
number = { 6 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 21-25 },
numpages = 5,
url = { /proceedings/iccia/number6/5135-1047/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference in Computational Intelligence
%A Lalit A. Patil
%A S M. Kamalapur
%A Dhananjay Kanade
%T Web Page Clustering using Latent Semantic Analysis
%J International Conference in Computational Intelligence
%@ 0975-8887
%V ICCIA
%N 6
%P 21-25
%D 2012
%I International Journal of Computer Applications
Abstract

Web mining techniques such as clustering help organize web content into appropriate subject-based categories, so that efficient search and retrieval becomes manageable. Traditional web page clustering typically uses only the page content (usually the page text) in an appropriate feature vector representation, such as a bag of words or term frequency-inverse document frequency (TF-IDF) weights, and then applies standard clustering algorithms (e.g., K-means, suffix tree clustering, query-directed clustering). Users, however, also generate content about what they browse: for example, they can provide captions for images on the internet and tags for web pages and other media content they regularly visit. Such user-generated content can provide useful information in various forms, implicitly as metadata or more explicitly as tags. Nevertheless, web page clustering algorithms typically use only features extracted from the page text, even though the advent of social bookmarking websites such as StumbleUpon and Delicious has led to a huge amount of user-generated content, such as the tags associated with web pages. In multi-view learning, the features can be split into two subsets such that each subset alone is sufficient for learning. Even for unsupervised learning algorithms, multiple views of the data can often help in extracting better features. Canonical Correlation Analysis (CCA) is an unsupervised feature extraction technique that finds dependencies between two (or more) views of the data by maximizing the correlations between the views in a shared subspace. However, a drawback of CCA is that it gives […] The first approach is based on an annotation-based probabilistic latent semantic analysis (PLSA) over the document-word and tag-word co-occurrence matrices.
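The abstract names TF-IDF feature vectors and K-means, and the title names latent semantic analysis, but the exact pipeline is not spelled out here. The sketch below is a minimal illustration under stated assumptions: whitespace tokenization, plain TF-IDF weighting, LSA computed as a truncated SVD of the document-term matrix, and K-means with a deterministic farthest-first initialization. The function names `tfidf_matrix`, `lsa` and `kmeans` and the four toy "pages" are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def tfidf_matrix(docs):
    """Documents x terms TF-IDF matrix from whitespace-tokenized text (assumed tokenizer)."""
    vocab = sorted({w for d in docs for w in d.split()})
    col = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            tf[i, col[w]] += 1                # raw term counts
    df = np.count_nonzero(tf, axis=0)         # number of documents containing each term
    idf = np.log(len(docs) / df)              # terms present in every document get weight 0
    return tf * idf, vocab

def lsa(X, k):
    """Map each document to k latent dimensions with a truncated SVD (LSA)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def kmeans(Z, k, iters=100):
    """Plain K-means; farthest-first initialization keeps the toy example deterministic."""
    centers = [Z[0]]
    while len(centers) < k:
        # squared distance from every point to its nearest chosen center
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)          # assign each page to its nearest center
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):         # stop once the centers no longer move
            break
        centers = new
    return labels

# Hypothetical toy corpus: two "programming" pages and two "football" pages
docs = [
    "python code tutorial programming",
    "programming python code examples",
    "football match score goal",
    "goal football league match",
]
X, vocab = tfidf_matrix(docs)
labels = kmeans(lsa(X, k=2), k=2)
print(labels)  # prints [0 0 1 1]
```

Clustering in the reduced LSA space rather than on raw TF-IDF vectors is the design choice being illustrated: pages that share related terms end up close together even when their exact word overlap is small.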
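The abstract describes CCA as finding a shared subspace that maximizes the correlation between two views of the data, such as page words and user tags. Below is a minimal sketch of linear CCA in NumPy, assuming dense feature matrices and a small ridge term `reg` added to the within-view covariances for numerical stability (our assumption, not a parameter from the paper). The function names `cca` and `inv_sqrt`, the synthetic two-view data and its dimensions are hypothetical and used only for illustration.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y, k, reg=1e-3):
    """Linear CCA with ridge regularization (reg is an assumed stabilizer).

    X: n x p features of view 1 (e.g. page words); Y: n x q features of view 2 (e.g. tags).
    Returns projections A (p x k), B (q x k) and the top-k canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # singular values of the whitened cross-covariance are the canonical correlations
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return A, B, s[:k]

# Hypothetical synthetic data: both views are noisy functions of a shared 2-D signal z
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n, 6))   # "word" view
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))   # "tag" view

A, B, corr = cca(X, Y, k=2)
shared = (X - X.mean(axis=0)) @ A   # pages embedded in the shared subspace
print(corr)
```

Any clustering algorithm, for example K-means, can then be run on the rows of `shared`. Real bag-of-words and tag matrices are high-dimensional and sparse, so in practice a stronger regularization or a kernel or sparse variant of CCA would likely be needed; this sketch only shows the core computation.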

References
  1. Trivedi, A., Rai, P., and DuVall, S. L. Exploiting tag and word correlations for improved webpage clustering. In SMUC ’10 (Toronto, Ontario, Canada, October 30, 2010), ACM.
  2. Poomagal, S., and Hamsapriya, T. K-means for search results clustering using URL and tag contents. IEEE (2011), 978-1-61284-764-1/11/$26.00.
  3. Lu, C., Chen, X., and Park, E. K. Exploit the tripartite network of social tagging for web clustering. In CIKM ’09 (2009), pp. 1545–1548.
  4. Ramage, D., Heymann, P., Manning, C. D., and Garcia-Molina, H. Clustering the tagged web. In WSDM ’09 (2009).
  5. Kakade, S. M., and Foster, D. P. Multi-view regression via canonical correlation analysis. In COLT ’07 (2007).
  6. Ando, R. K., and Zhang, T. Two-view feature generation model for semi-supervised learning. In ICML ’07 (2007).
  7. Bach, F. R., and Jordan, M. I. Kernel independent component analysis. Journal of Machine Learning Research 3 (2003).
  8. Bao, S., Xue, G., Wu, X., Yu, Y., Fei, B., and Su, Z. Optimizing web search using social annotations. In WWW ’07 (2007).
  9. Bickel, S., and Scheffer, T. Multi-view clustering. In ICDM ’04 (Washington, DC, USA, 2004), IEEE Computer Society.
  10. Blaschko, M. B., and Lampert, C. H. Correlational spectral clustering. In CVPR (2008).
  11. StumbleUpon. http://www.stumbleupon.com
  12. Delicious. http://www.delicious.com
Index Terms

Computer Science
Information Sciences

Keywords

Canonical correlation analysis, probabilistic latent semantic analysis, term frequency, web page clustering