Research Article

Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC

by Saima Haseeb, Mahak Motwani, Amit Saxena
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 74 - Number 17
Year of Publication: 2013
DOI: 10.5120/12975-9567

Saima Haseeb, Mahak Motwani, Amit Saxena. Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC. International Journal of Computer Applications. 74, 17 (July 2013), 9-14. DOI=10.5120/12975-9567

@article{ 10.5120/12975-9567,
author = { Saima Haseeb and Mahak Motwani and Amit Saxena },
title = { Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC },
journal = { International Journal of Computer Applications },
issue_date = { July 2013 },
volume = { 74 },
number = { 17 },
month = { July },
year = { 2013 },
issn = { 0975-8887 },
pages = { 9-14 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume74/number17/12975-9567/ },
doi = { 10.5120/12975-9567 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Saima Haseeb
%A Mahak Motwani
%A Amit Saxena
%T Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC
%J International Journal of Computer Applications
%@ 0975-8887
%V 74
%N 17
%P 9-14
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the rapid growth of the Internet, e-mail, thanks to its convenience and efficiency, has become an important means of communication in people's lives and has reduced the cost of communication. Along with it, however, comes spam. Spam e-mails, also known as 'junk e-mails', are unsolicited messages sent in bulk with a hidden or forged sender identity, address, and header information. More effective spam filtering approaches are vital to keep e-mail systems operating normally and to protect the interests of e-mail users. In this paper we develop a spam filter based on the Bayesian filtering method using the Aho-Corasick and PFAC string matching algorithms. The filter improves on traditional Bayesian spam filtering, aiming to increase filtering efficiency and to reduce the chance of misjudging malignant spam. To further speed up the spam filtering process, we transform the filter into a parallel spam filter on GPGPUs using the PFAC algorithm.
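To make the pipeline described in the abstract more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a keyword list with per-keyword spam probabilities (the `TOKEN_PROB` values are hypothetical) and a Graham-style naive Bayes combination of those probabilities, which may differ from the paper's exact scoring rule. All function names are illustrative. Both matchers share one keyword trie: the serial Aho-Corasick scan follows failure links on a mismatch, while the PFAC-style scan drops failure links and starts an independent walk at every text position, which is the loop a GPU would map to one thread per position.

```python
from collections import deque
import math


def build_goto(patterns):
    """Build the keyword trie (goto function) shared by Aho-Corasick and PFAC."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the root
    output = [set()]   # output[state] -> patterns that end exactly at this state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(p)
    return goto, output


def build_failure(goto, output):
    """Add Aho-Corasick failure links with a breadth-first pass over the trie.

    Returns the failure table and a copy of the outputs merged with each
    state's failure target; the raw trie outputs stay untouched because the
    PFAC scan needs them unmerged.
    """
    fail = [0] * len(goto)
    out = [set(o) for o in output]
    queue = deque(goto[0].values())      # depth-1 states already fail to the root
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]
    return fail, out


def ac_search(text, goto, fail, out):
    """Serial Aho-Corasick: one left-to-right pass, following failure links on mismatch."""
    counts = {}
    s = 0
    for ch in text:
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            counts[p] = counts.get(p, 0) + 1
    return counts


def pfac_search(text, goto, output):
    """PFAC-style scan with no failure links.

    Each outer iteration is independent and reports only patterns that start
    at `start`; on a GPU it would be one thread per start position. A walk
    stops as soon as there is no valid transition.
    """
    counts = {}
    for start in range(len(text)):
        s = 0
        for i in range(start, len(text)):
            s = goto[s].get(text[i])
            if s is None:                # no valid transition: this "thread" terminates
                break
            for p in output[s]:
                counts[p] = counts.get(p, 0) + 1
    return counts


def spam_probability(counts, token_prob):
    """Graham-style naive Bayes with equal priors over the distinct matched tokens."""
    log_odds = 0.0
    for token in counts:
        p = token_prob[token]
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical per-keyword spam probabilities, for illustration only.
    TOKEN_PROB = {"free": 0.9, "winner": 0.95, "meeting": 0.2}
    goto, raw_out = build_goto(list(TOKEN_PROB))
    fail, ac_out = build_failure(goto, raw_out)
    mail = "you are a winner claim your free prize free"
    ac_counts = ac_search(mail, goto, fail, ac_out)
    assert ac_counts == pfac_search(mail, goto, raw_out)  # both matchers agree
    print(ac_counts, round(spam_probability(ac_counts, TOKEN_PROB), 3))
```

This sketch matches raw substrings, so `free` would also match inside `freedom`. A production filter would more likely tokenize the message first or add word-boundary checks before scoring.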

Index Terms

Computer Science
Information Sciences

Keywords

Spam Filter, Bayesian Spam Filter, Aho-Corasick, PFAC