Research Article

Bias Detection and Mitigation in Multimodal Large Language Models: A Comprehensive Study

by Shalini Agarwal, Kaushik Kumar, Vineet Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 121
Year of Publication: 2026
Authors: Shalini Agarwal, Kaushik Kumar, Vineet Singh
DOI: 10.5120/ijca6f2595ad7d8e

Shalini Agarwal, Kaushik Kumar, Vineet Singh. Bias Detection and Mitigation in Multimodal Large Language Models: A Comprehensive Study. International Journal of Computer Applications. 187, 121 (Jul 2026), 40-46. DOI=10.5120/ijca6f2595ad7d8e

@article{ 10.5120/ijca6f2595ad7d8e,
author = { Shalini Agarwal and Kaushik Kumar and Vineet Singh },
title = { Bias Detection and Mitigation in Multimodal Large Language Models: A Comprehensive Study },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2026 },
volume = { 187 },
number = { 121 },
month = { Jul },
year = { 2026 },
issn = { 0975-8887 },
pages = { 40-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number121/bias-detection-and-mitigation-in-multimodal-large-language-models-a-comprehensive-study/ },
doi = { 10.5120/ijca6f2595ad7d8e },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2026-07-01T15:02:23.068962+05:30
%A Shalini Agarwal
%A Kaushik Kumar
%A Vineet Singh
%T Bias Detection and Mitigation in Multimodal Large Language Models: A Comprehensive Study
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 121
%P 40-46
%D 2026
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid advancement of multimodal large language models (LLMs) has revolutionized the field of artificial intelligence by enabling systems to process and generate content across various modalities, including text, images, and audio. However, these models inherit and potentially amplify biases present in their training data, leading to biased outputs that can perpetuate societal inequalities. This paper explores the nature and extent of biases in multimodal LLMs, focusing on how these biases manifest across different modalities and demographic groups. Through a comprehensive analysis of outputs generated by state-of-the-art multimodal LLMs, we identify specific biases related to gender, ethnicity, and social stereotypes. We introduce a novel framework for detecting these biases, combining quantitative metrics with qualitative assessments to provide a holistic understanding of the issue. Additionally, we propose and evaluate several mitigation strategies, including data augmentation, model fine-tuning, and the incorporation of ethical guidelines during the model development process. Our findings reveal that while certain biases can be mitigated through these approaches, others persist, highlighting the complexity of bias in multimodal systems. The paper concludes with recommendations for future research and the development of more equitable AI systems, emphasizing the importance of ongoing vigilance and ethical considerations in the deployment of multimodal LLMs.
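The abstract does not say which quantitative metrics the detection framework uses. As one hedged illustration of what such a metric can look like, the sketch below computes a demographic parity gap: the difference between the highest and lowest rate at which an audited attribute appears in model outputs across demographic groups. The function name `parity_gap`, the record layout, and the sample data are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def parity_gap(records):
    """Return per-group rates and the max-min gap between them.

    `records` is an iterable of (group, flagged) pairs, where `flagged`
    is True when a model output shows the attribute being audited
    (for example, a stereotyped occupation). Hypothetical layout.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for group, flagged in records:
        counts[group][0] += int(flagged)
        counts[group][1] += 1
    if not counts:
        raise ValueError("records must contain at least one entry")
    rates = {g: f / n for g, (f, n) in counts.items()}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap


# Made-up audit results, for illustration only.
sample = [
    ("female", True), ("female", True), ("female", False), ("female", False),
    ("male", True), ("male", False), ("male", False), ("male", False),
]
rates, gap = parity_gap(sample)
print(rates, gap)
```

A gap of zero means the attribute appears at the same rate for every group. A single number like this would typically be read alongside qualitative review, which matches the abstract's stated pairing of quantitative metrics with qualitative assessments.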

Index Terms

Computer Science
Information Sciences

Keywords

Bias
Fine Tuning
LLM
Multimodal