Multi-Modal Machine Learning for Political Video Advertisement Analysis: Integrating Audio, Textual, and Visual Features

Moulik Kumar; Satish Gopalani; Pranav Gupta

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 186 - Number 46

Year of Publication: 2024

Authors: Moulik Kumar, Satish Gopalani, Pranav Gupta

DOI: 10.5120/ijca2024924115

Moulik Kumar, Satish Gopalani, Pranav Gupta. Multi-Modal Machine Learning for Political Video Advertisement Analysis: Integrating Audio, Textual, and Visual Features. International Journal of Computer Applications. 186, 46 (Nov 2024), 49-55. DOI=10.5120/ijca2024924115

@article{ 10.5120/ijca2024924115,
  author = { Moulik Kumar and Satish Gopalani and Pranav Gupta },
  title = { Multi-Modal Machine Learning for Political Video Advertisement Analysis: Integrating Audio, Textual, and Visual Features },
  journal = { International Journal of Computer Applications },
  issue_date = { Nov 2024 },
  volume = { 186 },
  number = { 46 },
  month = { Nov },
  year = { 2024 },
  issn = { 0975-8887 },
  pages = { 49-55 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume186/number46/multi-modal-machine-learning-for-political-video-advertisement-analysis-integrating-audio-textual-and-visual-features/ },
  doi = { 10.5120/ijca2024924115 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-11-08T23:09:21.282277+05:30
%A Moulik Kumar
%A Satish Gopalani
%A Pranav Gupta
%T Multi-Modal Machine Learning for Political Video Advertisement Analysis: Integrating Audio, Textual, and Visual Features
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 46
%P 49-55
%D 2024
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a novel framework for the automated classification, tagging, and issue-level sentiment analysis of video advertisements using advanced machine-learning techniques. The proposed multi-pass approach leverages audio transcription, Optical Character Recognition (OCR), and video feature extraction to achieve high accuracy in distinguishing between political and non-political content. The research introduces robust methods for identifying candidates in political videos using phrase matching and fuzzy logic, as well as issue tagging and sentiment analysis using natural language processing algorithms. The system demonstrates significant improvements over existing methods, achieving 99.2% accuracy in political ad classification when combining audio and OCR data. Furthermore, the issue-level sentiment analysis provides granular insights into the emotional tone of political messaging. This research contributes to the growing field of content moderation in digital advertising, offering valuable insights for publishers, researchers, and policymakers in the realm of political communication.
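The abstract names phrase matching and fuzzy logic for candidate identification but does not specify an algorithm, so the following is only a minimal sketch of how such a step might look. It assumes a hypothetical candidate roster (`CANDIDATES`), a hypothetical similarity threshold of 0.8, and Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure; none of these are confirmed to be the authors' choices.

```python
from difflib import SequenceMatcher

# Hypothetical roster for illustration; the abstract does not list the
# candidate names or the data source the authors matched against.
CANDIDATES = ["Joe Biden", "Donald Trump", "Kamala Harris"]


def normalize(text):
    """Lowercase the text and replace every non-alphanumeric character with a space."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def find_candidates(text, candidates=CANDIDATES, threshold=0.8):
    """Return {candidate: best_score} for candidates mentioned in text.

    Each window of words with the same word count as the candidate name
    is compared to the name. An exact phrase match scores 1.0; otherwise
    the window is scored with SequenceMatcher.ratio(). A candidate is
    reported when its best score reaches the (assumed) threshold.
    """
    words = normalize(text).split()
    found = {}
    for name in candidates:
        target = normalize(name)
        n = len(target.split())
        best = 0.0
        for i in range(max(len(words) - n + 1, 0)):
            window = " ".join(words[i:i + n])
            if window == target:
                best = 1.0  # exact phrase match
                break
            best = max(best, SequenceMatcher(None, window, target).ratio())
        if best >= threshold:
            found[name] = round(best, 3)
    return found


if __name__ == "__main__":
    # A transcript or OCR string with a misspelled name still resolves.
    print(find_candidates("I'm Jo Bidan and I approve this message"))
```

Fuzzy scoring matters here because transcription and OCR output often misspell names; the threshold trades missed mentions against false matches and would need tuning on real data.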

Index Terms

Computer Science

Information Sciences

Political advertising

Video classification

Machine learning

Natural language processing

Content moderation

Keywords

Political advertising, Video classification, Machine learning, Natural language processing, Content moderation, Candidate verification, Sentiment analysis