Research Article

Improved Feature Selection for Better Classification in Twitter

by Saumya Goyal, Shabnam Parveen
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 122 - Number 1
Year of Publication: 2015
Authors: Saumya Goyal, Shabnam Parveen
10.5120/21664-4737

Saumya Goyal, Shabnam Parveen. Improved Feature Selection for Better Classification in Twitter. International Journal of Computer Applications. 122, 1 (July 2015), 13-18. DOI=10.5120/21664-4737

@article{ 10.5120/21664-4737,
author = { Saumya Goyal, Shabnam Parveen },
title = { Improved Feature Selection for Better Classification in Twitter },
journal = { International Journal of Computer Applications },
issue_date = { July 2015 },
volume = { 122 },
number = { 1 },
month = { July },
year = { 2015 },
issn = { 0975-8887 },
pages = { 13-18 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume122/number1/21664-4737/ },
doi = { 10.5120/21664-4737 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:09:25.937813+05:30
%A Saumya Goyal
%A Shabnam Parveen
%T Improved Feature Selection for Better Classification in Twitter
%J International Journal of Computer Applications
%@ 0975-8887
%V 122
%N 1
%P 13-18
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Social networks are widely used as communication tools by millions of people. Today people are online and use social networks for interaction, learning, business, politics, and more. Alongside these legitimate uses, however, the same platforms are abused to spread malware and spam messages. Spam on Twitter has become one of the most active research topics in recent years. Many researchers have worked on it, but their approaches rely on complex detection structures and still fall short of the desired detection accuracy. To improve accuracy and reduce structural complexity, this work proposes a simplified model for detecting spam tweets spread by unauthorised users or spammers. The model combines feature extraction with classification. Text and content attribute features are extracted through pre-processing and assembled into a feature vector matrix. Two classifier algorithms, K-nearest neighbour (KNN) and decision tree, are then applied to provide comparative results. The results are evaluated using false positive rate (FPR), F-measure, true positive rate (TPR), and accuracy, and show improved detection.
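The pipeline outlined in the abstract (feature extraction, classification, evaluation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four features in `extract_features`, the plain Euclidean KNN in `knn_predict`, and the example tweets are all assumptions chosen for illustration, and the decision tree classifier is omitted for brevity. Only the four evaluation measures (TPR, FPR, F-measure, accuracy) are taken from the abstract, computed here with their standard confusion-matrix definitions.

```python
import re
from collections import Counter
from math import dist


def extract_features(tweet):
    """Map a tweet to a small numeric vector (hypothetical features, not the paper's)."""
    return [
        len(re.findall(r"https?://\S+", tweet)),  # number of URLs
        tweet.count("#"),                          # number of hashtag markers
        tweet.count("@"),                          # number of mention markers
        len(tweet.split()),                        # number of whitespace-separated tokens
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="spam"):
    """Compute TPR, FPR, F-measure and accuracy from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0          # true positive rate (recall)
    fpr = fp / (fp + tn) if fp + tn else 0.0          # false positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * tpr / (precision + tpr)) if precision + tpr else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"TPR": tpr, "FPR": fpr, "F-measure": f_measure, "accuracy": accuracy}


# Made-up training tweets, for illustration only.
TRAIN = [
    ("FREE followers http://a.example http://b.example #win #free", "spam"),
    ("Click http://c.example #deal #cash @you", "spam"),
    ("Having coffee with friends this morning", "ham"),
    ("Great talk at the conference today", "ham"),
]
TRAIN_X = [extract_features(text) for text, _ in TRAIN]  # feature vector matrix
TRAIN_Y = [label for _, label in TRAIN]

if __name__ == "__main__":
    query = "Cheap pills http://d.example #buy #now"
    print(knn_predict(TRAIN_X, TRAIN_Y, extract_features(query), k=3))
    print(evaluate(["spam", "spam", "ham", "ham"], ["spam", "ham", "spam", "ham"]))
```

In practice the same comparison would more likely be run with an established library and a labelled tweet corpus; the sketch only shows how the feature matrix, the classifier, and the four reported measures relate to one another.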

Index Terms

Computer Science
Information Sciences

Keywords

K-nearest neighbour
Decision tree classifier algorithm
Pre-processing
Social network
Spam detection