International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 186 - Number 17 |
Year of Publication: 2024 |
Authors: Jay Pachupate, Sonal Fatangare, Sneha Tambare, Shraddha Debadwar, Pratik Sonar |
10.5120/ijca2024923552 |
Jay Pachupate, Sonal Fatangare, Sneha Tambare, Shraddha Debadwar, Pratik Sonar. Survey on Image Description Generation using Deep Learning. International Journal of Computer Applications. 186, 17 (Apr 2024), 23-31. DOI=10.5120/ijca2024923552
In the future, an image description system could aid people who are blind or visually impaired in "perceiving" the world. In natural language processing and computer vision, producing logical and contextually appropriate written descriptions for images is a crucial task referred to as "image description generation". A Bi-LSTM processes input sequences sequentially and captures contextual information in both directions, but it may not capture long-range dependencies effectively. To tackle this issue, the survey focuses on leveraging BERT, whose self-attention mechanism allows it to capture context over much longer ranges, making it more effective at handling global context and dependencies. For the model to comprehend the visual elements, BERT must be integrated so that it captures the contextual relationships between different visual elements and generates more coherent and contextually relevant image descriptions.
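As a rough illustration of the contrast the abstract draws, the sketch below implements scaled dot-product self-attention, the core mechanism behind BERT, in pure Python. Unlike a Bi-LSTM, where information from a distant token must survive many recurrent steps, each output position here is a direct weighted sum over every position in the sequence. This is a minimal toy sketch (values serve as both keys and queries, with no learned projection matrices or multiple heads), not the survey's own implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of embedding vectors.

    Every output position attends directly to all positions, so the first
    token can use information from the last one in a single step.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this query against every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Output is a convex combination of all token vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

# Toy 4-token sequence of 2-d embeddings (illustrative values only).
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attended = self_attention(seq)
```

Because the attention weights form a convex combination, each output coordinate stays within the range of the input coordinates; a full transformer layer would add learned query/key/value projections, multiple heads, and a feed-forward sublayer on top of this core step.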