International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 68 - Number 18
Year of Publication: 2013
Authors: Chirag Nathwani, Viralkumar Prajapati, Deven Agravat
DOI: 10.5120/11680-6493
Chirag Nathwani, Viralkumar Prajapati, Deven Agravat. Comparative Study of Web Spam Detection using Data Mining. International Journal of Computer Applications. 68, 18 (April 2013), 26-29. DOI=10.5120/11680-6493
Today the World Wide Web has become one of the best sources of information, largely because search engines work so quickly. Web spam attempts to manipulate search engine algorithms so that specific web pages rank higher in search results than they deserve. One way to detect web spam is classification, that is, learning a model that labels web pages as spam or non-spam. This paper presents a comparative and empirical analysis of web spam detection using the data mining techniques LAD Tree, JRIP, J48 and Random Forest. Experiments were carried out on three feature sets of the standard dataset WEBSPAM-UK2007. Overall, the results indicate that Random Forest works well with content-based features and transformed link-based features, whereas LAD Tree performed best of the four on link-based features. In terms of time efficiency, however, LAD Tree was found to be much more time-consuming than the other three classification techniques.
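The kind of comparison described above, scoring several classifiers on the same data by both accuracy and training time, can be sketched roughly as follows. This is a minimal illustration only and not the authors' experimental setup. The two classifiers (`MajorityClassifier`, `ThresholdStump`) are hypothetical stand-ins rather than LAD Tree, JRIP, J48 or Random Forest, and the feature vectors are invented toy values, not taken from WEBSPAM-UK2007.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdStump:
    """Hypothetical one-feature rule: picks the feature, threshold and
    polarity with the highest accuracy on the training data."""

    def fit(self, X, y):
        best_correct = -1
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for hi, lo in (("spam", "nonspam"), ("nonspam", "spam")):
                    correct = sum(
                        (hi if row[j] > t else lo) == label
                        for row, label in zip(X, y)
                    )
                    if correct > best_correct:
                        best_correct = correct
                        self.rule_ = (j, t, hi, lo)
        return self

    def predict(self, X):
        j, t, hi, lo = self.rule_
        return [hi if row[j] > t else lo for row in X]


def compare(classifiers, X_train, y_train, X_test, y_test):
    """Train each classifier and record test accuracy and training time."""
    results = {}
    for name, clf in classifiers.items():
        start = time.perf_counter()
        clf.fit(X_train, y_train)
        train_seconds = time.perf_counter() - start
        predictions = clf.predict(X_test)
        accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
        results[name] = {"accuracy": accuracy, "train_seconds": train_seconds}
    return results


# Invented toy data: two made-up numeric features per page.
X_train = [[0.1, 5], [0.2, 7], [0.8, 6], [0.9, 4], [0.15, 5], [0.3, 8]]
y_train = ["nonspam", "nonspam", "spam", "spam", "nonspam", "nonspam"]
X_test = [[0.7, 5], [0.2, 6], [0.95, 7], [0.1, 4]]
y_test = ["spam", "nonspam", "spam", "nonspam"]

results = compare(
    {"majority": MajorityClassifier(), "stump": ThresholdStump()},
    X_train, y_train, X_test, y_test,
)
for name, scores in results.items():
    print(name, scores)
```

Reporting training time alongside accuracy reflects the trade-off noted in the abstract, where the most accurate technique on one feature set was also the slowest.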