Twitter Texts’ Quality Classification using Data Mining and Neural Networks

Ftoon Kedwan; Chanderdhar Sharma

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 178 - Number 32

Year of Publication: 2019

Authors: Ftoon Kedwan, Chanderdhar Sharma

DOI: 10.5120/ijca2019919167

Ftoon Kedwan, Chanderdhar Sharma. Twitter Texts’ Quality Classification using Data Mining and Neural Networks. International Journal of Computer Applications 178, 32 (Jul 2019), 19-27. DOI=10.5120/ijca2019919167

@article{10.5120/ijca2019919167,
  author = {Ftoon Kedwan and Chanderdhar Sharma},
  title = {Twitter Texts’ Quality Classification using Data Mining and Neural Networks},
  journal = {International Journal of Computer Applications},
  issue_date = {Jul 2019},
  volume = {178},
  number = {32},
  month = {Jul},
  year = {2019},
  issn = {0975-8887},
  pages = {19-27},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume178/number32/30743-2019919167/},
  doi = {10.5120/ijca2019919167},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

Purpose: This work attempts to classify the level of noise in Twitter texts, a recurring problem in social media data analytics. Recent machine learning research often assumes that social media texts are of high quality, when in fact they frequently lack accuracy, completeness, and overall quality. This leads to the "Garbage In, Garbage Out" principle and produces misleading statistical findings. The aim of this project is to predict and classify Twitter data noise levels using a labelled dataset.

Methodology: After data cleaning, a clustering technique was used to identify the major dimensions of the imported data, and dimensionality reduction was performed using PCA weighting and the Weight Guided Feature Selection algorithm. This yielded the six most significant features, which were used in the implementation. Two classifiers, an artificial Neural Network (NN) and Naïve Bayes (NB), were trained in R and RStudio to predict tweet quality classes, and the two are compared in terms of accuracy and precision.

Findings: Three findings emerged from the Twitter data: (1) the neural network gives surprisingly good results compared to the Naïve Bayes algorithm; (2) a network with only three hidden layers can predict the good or bad quality class; and (3) preprocessing the data and running the predictive algorithms require large amounts of data, high computational complexity, and long running times. The results show that the neural network performs well even without dropout or convolutional layers, reaching an accuracy of 99%.
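The methodology above reduces the feature set with PCA weighting and then keeps the six highest-weighted features. The paper does not give the weighting formula, so the sketch below assumes one common variant: each feature is weighted by the absolute value of its loading on the first principal component, computed here by power iteration on the sample covariance matrix. The helper names (`pca_feature_weights`, `select_top_k`) are hypothetical, features are assumed to already be on comparable scales, and this is not the Weight Guided Feature Selection procedure itself.

```python
import math
from typing import List, Sequence


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance of the columns of `rows` (n samples x d features)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        centered = [r[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[value / (n - 1) for value in row] for row in cov]


def first_principal_component(cov: List[List[float]], iterations: int = 500) -> List[float]:
    """Leading eigenvector of a symmetric PSD matrix, found by power iteration."""
    d = len(cov)
    vec = [1.0] * d  # assumption: start vector is not orthogonal to the leading eigenvector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all-zero covariance: no variance to rank features by
            break
        vec = [x / norm for x in nxt]
    return vec


def pca_feature_weights(rows: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical weighting: |loading| of each feature on the first component."""
    return [abs(x) for x in first_principal_component(covariance_matrix(rows))]


def select_top_k(names: Sequence[str], weights: Sequence[float], k: int = 6) -> List[str]:
    """Keep the k feature names with the largest weights (the paper keeps six)."""
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

A call such as `select_top_k(feature_names, pca_feature_weights(rows))` would return the six names to keep. A constant feature receives weight zero, because it contributes no variance to the leading component.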
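The abstract compares the neural network against Naïve Bayes but does not say which Naïve Bayes variant was used. The sketch below assumes a Gaussian Naïve Bayes over numeric features such as the six selected ones. Each class gets a prior and a per-feature mean and variance, and prediction picks the class with the highest log posterior. The class name and the `var_smoothing` parameter are illustrative assumptions, not the authors' R code.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features assumed independent and normal per class."""

    def __init__(self, var_smoothing: float = 1e-9) -> None:
        self.var_smoothing = var_smoothing  # keeps every variance strictly positive
        self.priors: Dict[Hashable, float] = {}
        self.stats: Dict[Hashable, List[Tuple[float, float]]] = {}

    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> "GaussianNaiveBayes":
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        for label, members in grouped.items():
            self.priors[label] = len(members) / len(rows)
            self.stats[label] = []
            for column in zip(*members):  # one tuple of values per feature
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column) + self.var_smoothing
                self.stats[label].append((mean, var))
        return self

    def _log_posterior(self, row: Sequence[float], label: Hashable) -> float:
        # log P(class) plus the sum of per-feature log Gaussian densities
        # (the shared evidence term is dropped because it does not change the argmax)
        score = math.log(self.priors[label])
        for x, (mean, var) in zip(row, self.stats[label]):
            score -= 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)
        return score

    def predict(self, rows: Sequence[Sequence[float]]) -> List[Hashable]:
        return [max(self.priors, key=lambda label: self._log_posterior(row, label)) for row in rows]
```

Working in log space avoids underflow when many small per-feature densities are multiplied together.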
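The two classifiers are compared in terms of accuracy and precision. To make those two numbers concrete, the sketch below computes both from paired true and predicted labels. Accuracy is the fraction of all predictions that are correct. Precision for a chosen positive class (for example a hypothetical "good" label) is the fraction of predictions of that class that are correct, TP / (TP + FP). The function names are illustrative, and returning 0.0 when a class is never predicted is an assumed convention.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """TP / (TP + FP) for the given positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    # true labels of every sample that was predicted as the positive class
    predicted_positive = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_positive:
        return 0.0  # assumed convention when the class is never predicted
    return sum(t == positive for t in predicted_positive) / len(predicted_positive)
```

For `y_true = ["good", "good", "bad", "bad", "good"]` and `y_pred = ["good", "bad", "bad", "good", "good"]`, accuracy is 3/5 = 0.6, and precision for "good" is 2/3, since two of the three "good" predictions are correct.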
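The findings state that a plain feed-forward network with three hidden layers, without dropout or convolutional layers, was enough to separate good from bad tweets. The paper does not report layer widths or activation functions, so the sketch below only illustrates that shape. It runs a forward pass through three fully connected ReLU hidden layers (hypothetical widths 16, 16 and 8) on six input features, ending in a two-class softmax. Weights are random and training is omitted, so this is not the authors' model.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[n_out][n_in], biases[n_out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small uniform random weights and zero biases for one dense layer."""
    limit = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """sizes = [inputs, hidden_1, ..., outputs]; consecutive pairs become dense layers."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers: List[Layer], x: Sequence[float]) -> List[float]:
    """ReLU on hidden layers, softmax on the output layer (assumed activations)."""
    activation = list(x)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = softmax(z) if is_output else [max(0.0, v) for v in z]
    return activation
```

With `net = build_mlp([6, 16, 16, 8, 2])`, the call `forward(net, features)` returns two class probabilities (good, bad) that sum to 1; fitting the weights would require a separate training loop such as backpropagation.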

Index Terms

Computer Science

Information Sciences

Keywords

Data Mining, Twitter Text Quality, Twitter Data Classification, Classification Algorithms, Neural Network Algorithm, Text Analysis.