Improved Feature Selection for Better Classification in Twitter

Saumya Goyal; Shabnam Parveen

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 122 - Number 1

Year of Publication: 2015

Authors: Saumya Goyal, Shabnam Parveen

DOI: 10.5120/21664-4737

Saumya Goyal, Shabnam Parveen. Improved Feature Selection for Better Classification in Twitter. International Journal of Computer Applications. 122, 1 (July 2015), 13-18. DOI=10.5120/21664-4737

@article{ 10.5120/21664-4737,
  author = { Saumya Goyal and Shabnam Parveen },
  title = { Improved Feature Selection for Better Classification in Twitter },
  journal = { International Journal of Computer Applications },
  issue_date = { July 2015 },
  volume = { 122 },
  number = { 1 },
  month = { July },
  year = { 2015 },
  issn = { 0975-8887 },
  pages = { 13-18 },
  numpages = { 6 },
  url = { https://ijcaonline.org/archives/volume122/number1/21664-4737/ },
  doi = { 10.5120/21664-4737 },
  publisher = { Foundation of Computer Science (FCS), NY, USA },
  address = { New York, USA }
}

%0 Journal Article
%1 2024-02-06T23:09:25.937813+05:30
%A Saumya Goyal
%A Shabnam Parveen
%T Improved Feature Selection for Better Classification in Twitter
%J International Journal of Computer Applications
%@ 0975-8887
%V 122
%N 1
%P 13-18
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Social networks are widely used as communication tools by millions of people. Today nearly everyone is online and uses social networks for interaction, learning, business, politics and more. Alongside these legitimate uses, however, the same tools are exploited to distribute malware and spam messages. Spam on Twitter has become one of the most active research topics in recent years. Many researchers have addressed it, but their approaches rely on very complex structures and still do not reach the desired level of detection accuracy. To improve accuracy and reduce structural complexity, this work proposes a simplified model for detecting spam tweets spread by unauthorised users or spammers. The model is analysed through feature extraction followed by classification. Text and content attribute features are extracted by pre-processing and assembled into a feature vector matrix. Two classifier algorithms, K-nearest neighbour (KNN) and decision tree, are then applied to produce comparative results. The results are evaluated using false positive rate (FPR), F-measure, true positive rate (TPR) and accuracy, and show improved detection.
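The pipeline summarised in the abstract (feature vector, classifier, evaluation metrics) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four features in `extract_features`, the label names "spam" and "ham", and the function names are all assumptions chosen for illustration. Only a plain KNN vote is shown; the decision tree classifier is omitted.

```python
import math
import re
from collections import Counter


def extract_features(tweet):
    """Map a tweet to a numeric feature vector.

    The four features are hypothetical examples of content attributes,
    not the feature set used in the paper.
    """
    return [
        len(re.findall(r"https?://\S+", tweet)),  # number of URLs
        tweet.count("#"),                          # number of '#' characters
        tweet.count("@"),                          # number of '@' characters
        len(tweet.split()),                        # number of whitespace-separated tokens
    ]


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors
    (Euclidean distance)."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="spam"):
    """Compute TPR, FPR, F-measure and accuracy from true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_measure = (
        2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0
    )
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"TPR": tpr, "FPR": fpr, "F": f_measure, "accuracy": accuracy}
```

In use, each tweet in a labelled training set would be passed through `extract_features`, each test tweet classified with `knn_predict`, and the predicted labels compared with the true ones by `evaluate`. The F-measure here is the harmonic mean of precision and TPR.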

Index Terms

Computer Science

Information Sciences

Keywords

K-nearest neighbour, Decision tree classifier algorithm, Pre-processing, Social network, Spam detection