Research Article

A Memory-Enhanced RAG Framework for Multimodal Document Processing and Context-Aware Conversational AI

by Vibhu Awasthi, Syed Wajahat Abbas Rizvi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 101
Year of Publication: 2026
Authors: Vibhu Awasthi, Syed Wajahat Abbas Rizvi
10.5120/ijca57c7de8fa5b7

Vibhu Awasthi, Syed Wajahat Abbas Rizvi. A Memory-Enhanced RAG Framework for Multimodal Document Processing and Context-Aware Conversational AI. International Journal of Computer Applications. 187, 101 (May 2026), 6-10. DOI=10.5120/ijca57c7de8fa5b7

@article{ 10.5120/ijca57c7de8fa5b7,
author = { Vibhu Awasthi and Syed Wajahat Abbas Rizvi },
title = { A Memory-Enhanced RAG Framework for Multimodal Document Processing and Context-Aware Conversational AI },
journal = { International Journal of Computer Applications },
issue_date = { May 2026 },
volume = { 187 },
number = { 101 },
month = { May },
year = { 2026 },
issn = { 0975-8887 },
pages = { 6-10 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number101/a-memory-enhanced-rag-framework-for-multimodal-document-processing-and-context-aware-conversational-ai/ },
doi = { 10.5120/ijca57c7de8fa5b7 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2026-05-17T02:28:57.180362+05:30
%A Vibhu Awasthi
%A Syed Wajahat Abbas Rizvi
%T A Memory-Enhanced RAG Framework for Multimodal Document Processing and Context-Aware Conversational AI
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 101
%P 6-10
%D 2026
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This work introduces a Retrieval-Augmented Generation (RAG) chatbot that answers questions using multiple data sources, including uploaded documents and web links, while preserving conversational continuity through chat memory. The system accepts input in several formats (PDF, DOCX, TXT, images, and URLs), extracts the text, splits it into manageable chunks, and converts the chunks into semantic embeddings with an embedding model. The embeddings are stored in a Qdrant vector database, which allows relevant information to be retrieved by similarity. To make the interaction more conversational, the chatbot also stores previous question-answer exchanges as memory, so it can draw on earlier turns for better contextual understanding. When a user submits a query, the system retrieves the most relevant document content and memory context and passes both to the Groq Large Language Model (LLM), which generates an accurate and coherent answer. This architecture supports continuity across the conversation, semantic search, and context-based interaction. By combining a vector database with a modern language model, the proposed system offers an efficient and scalable approach to knowledge-based conversational AI that gives meaningful and reliable answers to user queries in real time.
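
The pipeline described in the abstract (chunking, embedding, similarity retrieval, and chat memory feeding a prompt) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Every name in it (`chunk_text`, `embed`, `InMemoryStore`, `ChatSession`) is hypothetical, the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for Qdrant, and the assembled prompt is only printed where a real system would send it to the LLM. Chunk sizes and the memory length are illustrative values.

```python
import math
import re
from collections import Counter, deque


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector database such as Qdrant."""

    def __init__(self):
        self.items = []  # list of (vector, chunk text) pairs

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]


class ChatSession:
    """Keeps the last few question-answer turns and builds the LLM prompt."""

    def __init__(self, store, memory_turns=3):
        self.store = store
        self.memory = deque(maxlen=memory_turns)  # oldest turns drop off

    def build_prompt(self, question):
        context = "\n".join(self.store.search(question))
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.memory)
        return f"Context:\n{context}\n\nHistory:\n{history}\n\nQuestion: {question}"

    def record(self, question, answer):
        self.memory.append((question, answer))


store = InMemoryStore()
document = "Qdrant stores embeddings for similarity search. " * 3
for chunk in chunk_text(document, size=8, overlap=2):
    store.add(chunk)
store.add("Groq serves the language model that writes the final answer.")

session = ChatSession(store)
session.record("What stores vectors?", "Qdrant.")
print(session.build_prompt("Which service serves the language model?"))
```

In this sketch the bounded `deque` is one simple way to model chat memory: only the most recent turns are kept, which keeps the prompt short. Retrieving past turns by similarity, as is done for document chunks, would be an alternative design.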

Index Terms

Computer Science
Information Sciences