Research Article

Apache Spark based Big Data Analytics for Social Network Cybercrime Forensics

by Simon Mulwa Kiio, Elisha O. Abade
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 179 - Number 8
Year of Publication: 2017
Authors: Simon Mulwa Kiio, Elisha O. Abade
DOI: 10.5120/ijca2017916021

Simon Mulwa Kiio, Elisha O. Abade. Apache Spark based Big Data Analytics for Social Network Cybercrime Forensics. International Journal of Computer Applications. 179, 8 (Dec 2017), 24-33. DOI=10.5120/ijca2017916021

@article{ 10.5120/ijca2017916021,
author = { Simon Mulwa Kiio and Elisha O. Abade },
title = { Apache Spark based Big Data Analytics for Social Network Cybercrime Forensics },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2017 },
volume = { 179 },
number = { 8 },
month = { Dec },
year = { 2017 },
issn = { 0975-8887 },
pages = { 24-33 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume179/number8/28758-2017916021/ },
doi = { 10.5120/ijca2017916021 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:54:49.550180+05:30
%A Simon Mulwa Kiio
%A Elisha O. Abade
%T Apache Spark based Big Data Analytics for Social Network Cybercrime Forensics
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 8
%P 24-33
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The anonymity of social networks makes them attractive to cybercriminals seeking to mask their criminal activities online. This poses a challenge to law enforcers in tracking and uncovering the perpetrators, as most evidence is hidden within big data. With this ever-increasing volume of data, forensic analysts face challenges in investigations involving huge data volumes while being limited by the processor, memory and storage resources of a single computer node. As social media data grows in volume and is produced at a high rate, it has become difficult to collect, store and analyze such big data using traditional forensic tools. This study applied Apache Spark and big data analytics to the forensic analysis of social network cybercrimes such as hate speech and cyberbullying, and demonstrated how data analytics can address the limitations of traditional forensic tools in investigations involving big data. The study developed an Apache Spark based forensic tool to stream and analyze social media data for hate speech and cyberbullying, while investigating relevant artifacts found on the Twitter social network and ways to collect, preserve and ensure the authenticity of the evidence. The study employed the Naïve Bayes algorithm within the Spark ML API to automatically classify and categorize hate speech and cyberbullying found on Twitter. The study showed that generating a SHA-256 hash key for each tweet item within DStreams, and storing the tweet data together with its corresponding hash key in MongoDB, can be used for tweet evidence preservation and authentication. In addition, by streaming full tweet account metadata, the study revealed that such metadata can be used to authenticate the creator, source, date and time of a given hate speech tweet.
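
The hash-based preservation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library and leaves out Spark Streaming and MongoDB entirely. The function names (`tweet_digest`, `seal`, `is_authentic`), the tweet field names, and the choice of canonical JSON as the hashed representation are all assumptions made for illustration.

```python
import hashlib
import json


def tweet_digest(tweet: dict) -> str:
    """Return the SHA-256 hex digest of a tweet's canonical JSON form."""
    # Sorted keys and fixed separators make the serialization deterministic,
    # so identical tweet content always produces the same digest.
    canonical = json.dumps(
        tweet, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(tweet: dict) -> dict:
    """Build the record to be stored: the tweet together with its digest."""
    return {"tweet": tweet, "sha256": tweet_digest(tweet)}


def is_authentic(record: dict) -> bool:
    """Recompute the digest and compare it with the stored value."""
    return tweet_digest(record["tweet"]) == record["sha256"]


# Hypothetical tweet; the field names are illustrative, not Twitter's schema.
sample = {
    "id": 1,
    "user": "example_user",
    "created_at": "2017-12-01T10:00:00Z",
    "text": "example text",
}
record = seal(sample)
```

In a setup like the one the abstract describes, a record such as `record` would be the document written to the database, and `is_authentic` would be rerun when the evidence is later retrieved. Any change to the stored tweet fields would cause the check to fail.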

Index Terms

Computer Science
Information Sciences

Keywords

Big data forensics, Social network forensics, Hate speech, Big data analytics, MongoDB, Apache Spark, Social network cybercrimes, Spark Streaming