Research Article

An evolved classification and clustering approach for the detection of web spam

Published in March 2012 by Sumedha S. Parshurame
2nd National Conference on Innovative Paradigms in Engineering and Technology (NCIPET 2013)
Foundation of Computer Science USA
NCIPET - Number 13
March 2012
Authors: Sumedha S. Parshurame

Sumedha S. Parshurame. An evolved classification and clustering approach for the detection of web spam. 2nd National Conference on Innovative Paradigms in Engineering and Technology (NCIPET 2013). NCIPET, 13 (March 2012), 26-29.

@article{
author = { Sumedha S. Parshurame },
title = { An evolved classification and clustering approach for the detection of web spam },
journal = { 2nd National Conference on Innovative Paradigms in Engineering and Technology (NCIPET 2013) },
issue_date = { March 2012 },
volume = { NCIPET },
number = { 13 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 26-29 },
numpages = 4,
url = { /proceedings/ncipet/number13/5291-1103/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 2nd National Conference on Innovative Paradigms in Engineering and Technology (NCIPET 2013)
%A Sumedha S. Parshurame
%T An evolved classification and clustering approach for the detection of web spam
%J 2nd National Conference on Innovative Paradigms in Engineering and Technology (NCIPET 2013)
%@ 0975-8887
%V NCIPET
%N 13
%P 26-29
%D 2012
%I International Journal of Computer Applications
Abstract

Web spam denotes the manipulation of web pages with the sole intent of raising their position in search engine rankings. Since a better position in the rankings directly increases the number of visits to a site, attackers use various techniques to boost their pages to higher ranks. In the best case, web spam pages are a nuisance that provides undeserved advertising revenue to the page owners. In the worst case, these pages pose a threat to Internet users by hosting malicious content and launching drive-by attacks against unsuspecting victims. When successful, these drive-by attacks install malware on the victims' machines. In this paper we introduce a clustering and classification approach to detect spam web pages in the list of results returned by a search engine. We first apply a K-nearest neighbor approach for clustering, and then apply K-means classification to those links to categorize them as either spam or non-spam.
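
The abstract names two algorithms but gives no features, distance metric, or parameters, so the following is only a minimal sketch of how they could be combined, not the paper's implementation. It assumes two hypothetical per-link features (keyword density and outbound-link ratio), made-up sample values, and the conventional roles of the algorithms: K-means groups the links without labels, and a k-nearest neighbor vote then labels a new link. The rule that the cluster with the larger centroid is "spam" is also an assumption made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm. Returns (centroids, cluster index per point)."""
    centroids = [list(c) for c in centroids]
    assign = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, assign

def knn_label(query, points, labels, k=3):
    """Majority vote among the k labeled points closest to the query."""
    nearest = sorted(range(len(points)), key=lambda i: dist(query, points[i]))[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical feature vectors: (keyword_density, outlink_ratio) per result link.
links = [(0.05, 0.10), (0.08, 0.15), (0.06, 0.12),
         (0.70, 0.85), (0.65, 0.90), (0.75, 0.80)]

# Deterministic start: seed the two centroids with the first and last link.
centroids, assign = kmeans(links, [links[0], links[-1]])

# Assumption: the cluster whose centroid has the larger feature sum is spam.
spam_cluster = max(range(len(centroids)), key=lambda c: sum(centroids[c]))
labels = ["spam" if a == spam_cluster else "non-spam" for a in assign]

# Label previously unseen links by voting among their nearest neighbors.
print(knn_label((0.68, 0.82), links, labels))  # spam
print(knn_label((0.07, 0.11), links, labels))  # non-spam
```

With real data, the features would need scaling to comparable ranges before computing distances, and the cluster-to-label mapping would have to come from a labeled sample rather than from the centroid heuristic used here.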

Index Terms

Computer Science
Information Sciences

Keywords

Data mining, K-nearest neighbor, K-means algorithm, Spam and Non-spam links, Search Engine