Research Article

Survey on Image Description Generation using Deep Learning

by Jay Pachupate, Sonal Fatangare, Sneha Tambare, Shraddha Debadwar, Pratik Sonar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 17
Year of Publication: 2024
DOI: 10.5120/ijca2024923552

Jay Pachupate, Sonal Fatangare, Sneha Tambare, Shraddha Debadwar, Pratik Sonar. Survey on Image Description Generation using Deep Learning. International Journal of Computer Applications 186, 17 (Apr 2024), 23-31. DOI=10.5120/ijca2024923552

@article{ 10.5120/ijca2024923552,
author = { Jay Pachupate, Sonal Fatangare, Sneha Tambare, Shraddha Debadwar, Pratik Sonar },
title = { Survey on Image Description Generation using Deep Learning },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2024 },
volume = { 186 },
number = { 17 },
month = { Apr },
year = { 2024 },
issn = { 0975-8887 },
pages = { 23-31 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number17/survey-on-image-description-generation-using-deep-learning/ },
doi = { 10.5120/ijca2024923552 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Jay Pachupate
%A Sonal Fatangare
%A Sneha Tambare
%A Shraddha Debadwar
%A Pratik Sonar
%T Survey on Image Description Generation using Deep Learning
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 17
%P 23-31
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In the future, an image description system could help people who are blind or visually impaired to "perceive" the world. In natural language processing and computer vision, producing coherent and contextually appropriate written descriptions for images is a crucial task referred to as "image description generation". A Bi-LSTM processes input sequences sequentially and captures contextual information in both directions, but it may not capture long-range dependencies effectively. To tackle this issue, the survey focuses on leveraging BERT, whose self-attention mechanism captures context over much longer ranges, making it more effective at handling global context and dependencies. For the model to comprehend the visual elements, BERT must integrate the contextual relationships between different visual elements in order to generate more coherent and contextually relevant image descriptions.
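To make the contrast concrete, the sketch below shows one common way a CNN encoder can be fused with a BERT-style self-attention stack for captioning: projected image-region features and caption token embeddings are concatenated into one sequence, so every token can attend to every region and every other token in a single step, rather than passing information stepwise as a Bi-LSTM does. This is a minimal illustration, not the architecture of any surveyed paper; the dimensions, vocabulary size, and the use of 36 pre-extracted ResNet-style region features are all assumptions.

```python
# Minimal sketch (illustrative assumptions throughout): fusing CNN region
# features with a BERT-style self-attention encoder for caption generation.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Stand-in for the CNN side: projects pre-extracted region features
        # (e.g., 2048-d ResNet vectors) into the transformer's hidden size.
        self.visual_proj = nn.Linear(2048, d_model)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(512, d_model)
        # BERT-style stack: multi-head self-attention lets every caption token
        # attend to every image region and every other token at once, which is
        # how long-range, global context is captured.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, region_feats, token_ids):
        # region_feats: (B, R, 2048) CNN region features
        # token_ids:    (B, T) caption tokens generated so far
        vis = self.visual_proj(region_feats)                 # (B, R, d)
        pos = torch.arange(token_ids.size(1), device=token_ids.device)
        txt = self.token_embed(token_ids) + self.pos_embed(pos)
        fused = torch.cat([vis, txt], dim=1)                 # (B, R+T, d)
        ctx = self.encoder(fused)                            # joint self-attention
        # Predict the next word from the contextualised text positions only.
        return self.lm_head(ctx[:, vis.size(1):, :])         # (B, T, vocab)

model = CaptionModel()
feats = torch.randn(2, 36, 2048)           # 36 regions per image (assumed)
tokens = torch.randint(0, 10000, (2, 12))  # partial captions
print(model(feats, tokens).shape)          # torch.Size([2, 12, 10000])
```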

References
  1. Huawei Zhang, Chengbo Ma, Zhanjun Jiang, and Jing Lian, "Image Caption Generation Using Contextual Information Fusion With Bi-LSTM-s".
  2. Arisa Ueda, Wei Yang, and Komei Sugiura, "Switching Text-Based Image Encoders for Captioning Images With Text".
  3. Qimin Cheng, Haiyan Huang, Yuan Xu, Yuzhuo Zhou, Huanying Li, and Zhongyuan Wang, "NWPU-Captions Dataset and MLCA-Net for Remote Sensing Image Captioning".
  4. Thanh-Son Nguyen and Basura Fernando, "Effective Multimodal Encoding for Image Paragraph Captioning".
  5. Kyungbok Min, Minh Dang, and Hyun Joon Moon, "Deep Learning-Based Short Story Generation for an Image Using the Encoder-Decoder Structure".
  6. Marc A. Kastner, Kazuki Umemura, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, Hiroshi Murase, and Shin'ichi Satoh, "Imageability- and Length-Controllable Image Captioning".
  7. Lin Huo, Lin Bai, and Shang-Ming Zhou, "Automatically Generating Natural Language Descriptions of Images by a Deep Hierarchical Framework".
  8. Gencer Sumbul, Sonali Nayak, and Begüm Demir, "SD-RSIC: Summarization-Driven Deep Remote Sensing Image Captioning".
  9. Maofu Liu, Huijun Hu, Lingjun Li, Yan Yu, and Weili Guan, "Chinese Image Caption Generation via Visual Attention and Topic Modeling".
  10. Ankit Rathi, "Deep learning approach for image captioning in Hindi language".
  11. Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, and Liang Lin, "Fine-Grained Image Captioning with Global-Local Discriminative Objective".
  12. Haoran Wang, Yue Zhang, and Xiaosheng Yu, "An Overview of Image Caption Generation Methods".
  13. Fengyu Guo, Ruifang He, and Jianwu Dang, "Implicit Discourse Relation Recognition via a BiLSTM-CNN Architecture With Dynamic Chunk-Based Max Pooling".
  14. Guixian Xu, Yueting Meng, Xiaokai Zhou, Ziheng Yu, Xu Wu, and Lijun Zhang, "Chinese Event Detection Based on Multi-Feature Fusion and BiLSTM".
  15. Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang, "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering".
  16. Junwei Han, Dingwen Zhang, Gong Cheng, Nian Liu, and Dong Xu, "Advanced Deep-Learning Techniques for Salient and Category-Specific Object Detection".
  17. Sidra Shabir and Syed Yasser Arafat, "An image conveys a message: A brief survey on image description generation".
  18. Linghui Li, Sheng Tang, and Yongdong Zhang, "GLA: Global-Local Attention for Image Description".
  19. Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, and Tat-Seng Chua, "SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning".
  20. Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, and Tao Mei, "Boosting Image Captioning with Attributes".
  21. Pranay Mathur, Aman Gill, Aayush Yadav, Anurag Mishra, and Nand Kumar Bansode, "Camera2Caption: A Real-Time Image Caption Generator".
  22. Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher, "Adaptive Attention via A Visual Sentinel for Image Captioning".
  23. Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel, "Self-critical Sequence Training for Image Captioning".
  24. Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo, "Image Captioning with Semantic Attention".
  25. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition".
  26. Sezer Karaoglu, Ran Tao, Theo Gevers, and Arnold W. M. Smeulders, "Words Matter: Scene Text for Image Classification and Retrieval".
  27. Xu Jia, Efstratios Gavves, Basura Fernando, and Tinne Tuytelaars, "Guiding the Long-Short Term Memory model for Image Caption Generation".
  28. Priyanka Jain, Priyanka Pawar, Gaurav Koriya, Anuradha Lele, Ajai Kumar, and Hemant Darbari, "Knowledge acquisition for Language description from Scene understanding".
  29. Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh, "CIDEr: Consensus-based Image Description Evaluation".
  30. Andrej Karpathy and Li Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions".
  31. Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan, "Show and Tell: A Neural Image Caption Generator".
  32. Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg, "BabyTalk: Understanding and Generating Simple Image Descriptions".
  33. Yan Zhu, Hui Xiang, and Wenjuan Feng, "Generating Text Description from Content-based Annotated Image".
Index Terms

Computer Science
Information Sciences

Keywords

Convolutional Neural Network (CNN), Bidirectional Encoder Representations from Transformers (BERT), Fine-tuning, Evaluation Metrics, Benchmark Datasets