Research Article

Web Mining Techniques to Block Spam Web Sites

by Esraa M. EL-Mohdy, A. F. El-Gamal, Hanan E. Abdelkader
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 181 - Number 8
Year of Publication: 2018
Authors: Esraa M. EL-Mohdy, A. F. El-Gamal, Hanan E. Abdelkader
10.5120/ijca2018917622

Esraa M. EL-Mohdy, A. F. El-Gamal, Hanan E. Abdelkader. Web Mining Techniques to Block Spam Web Sites. International Journal of Computer Applications. 181, 8 (Aug 2018), 36-42. DOI=10.5120/ijca2018917622

@article{ 10.5120/ijca2018917622,
author = { Esraa M. EL-Mohdy and A. F. El-Gamal and Hanan E. Abdelkader },
title = { Web Mining Techniques to Block Spam Web Sites },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2018 },
volume = { 181 },
number = { 8 },
month = { Aug },
year = { 2018 },
issn = { 0975-8887 },
pages = { 36-42 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume181/number8/29794-2018917622/ },
doi = { 10.5120/ijca2018917622 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:05:25.383010+05:30
%A Esraa M. EL-Mohdy
%A A. F. El-Gamal
%A Hanan E. Abdelkader
%T Web Mining Techniques to Block Spam Web Sites
%J International Journal of Computer Applications
%@ 0975-8887
%V 181
%N 8
%P 36-42
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper introduces a system based on web mining techniques to block spam web pages. The system relies on content analysis; the features used are the Uniform Resource Locator (URL), the number of words in the page title, Globally Popular Keywords (GPK), and n-grams. The proposed system uses Decision Tree (DT) rules, which the paper identifies as the best classifier for detecting web spam content. It achieves an accuracy of 0.97 (97%) in detecting spam web sites.
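
As a rough illustration of the pipeline the abstract describes, the sketch below extracts URL, title-length, keyword, and n-gram features from a page and fits a one-level decision tree (a decision stump) by minimizing Gini impurity. It is not the authors' implementation: the keyword list `POPULAR_KEYWORDS`, the feature names, the toy pages, and the choice of a depth-1 tree are all assumptions made for this example, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical keyword list; the paper's actual GPK list is not given here.
POPULAR_KEYWORDS = {"free", "cheap", "win", "casino", "offer"}

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def extract_features(url, title, body):
    """Map a page to a small numeric feature dict (illustrative features only)."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    gpk_hits = sum(1 for w in words if w in POPULAR_KEYWORDS)
    return {
        "url_length": len(url),
        "url_distinct_trigrams": len(char_ngrams(url, 3)),
        "title_words": len(title.split()),
        "gpk_ratio": gpk_hits / len(words) if words else 0.0,
    }

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for feat in rows[0]:
        for t in sorted({r[feat] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[feat] <= t]
            right = [y for r, y in zip(rows, labels) if r[feat] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feat, t, majority(left), majority(right))
    return best

def predict(stump, feats):
    """Apply the learned rule: 'if feature <= threshold then left label else right label'."""
    _, feat, t, left_label, right_label = stump
    return left_label if feats[feat] <= t else right_label

# Toy, made-up pages: (url, title, body, label) with label 1 = spam.
pages = [
    ("http://example.com/news", "Local news today",
     "the council met to discuss the budget", 0),
    ("http://example.org/docs", "Library documentation",
     "this guide explains how to install the package", 0),
    ("http://free-casino-win.example.net/offer",
     "Free casino win cheap offer free casino win now",
     "free casino win cheap offer free win casino", 1),
    ("http://cheap-offer.example.net/free",
     "Cheap offer free win cheap offer casino",
     "cheap free offer win casino cheap offer", 1),
]

rows = [extract_features(u, t, b) for u, t, b, _ in pages]
labels = [y for *_, y in pages]
stump = fit_stump(rows, labels)
accuracy = sum(predict(stump, r) == y for r, y in zip(rows, labels)) / len(labels)
print("rule:", stump[1], "<=", stump[2], "| training accuracy:", accuracy)
```

A real system would train a full decision tree on a labeled corpus and report accuracy on held-out pages rather than on the training set; the stump here only shows how a single threshold rule on one content feature can separate spam from non-spam.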

Index Terms

Computer Science
Information Sciences

Keywords

Web Mining, Spam Web Sites, Decision Tree