Research Article

Kafka-based Architecture in Building Data Lakes for Real-time Data Streams

by Kiran Peddireddy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 9
Year of Publication: 2023
Authors: Kiran Peddireddy
10.5120/ijca2023922740

Kiran Peddireddy. Kafka-based Architecture in Building Data Lakes for Real-time Data Streams. International Journal of Computer Applications. 185, 9 (May 2023), 1-3. DOI=10.5120/ijca2023922740

@article{ 10.5120/ijca2023922740,
author = { Kiran Peddireddy },
title = { Kafka-based Architecture in Building Data Lakes for Real-time Data Streams },
journal = { International Journal of Computer Applications },
issue_date = { May 2023 },
volume = { 185 },
number = { 9 },
month = { May },
year = { 2023 },
issn = { 0975-8887 },
pages = { 1-3 },
numpages = {3},
url = { https://ijcaonline.org/archives/volume185/number9/32726-2023922740/ },
doi = { 10.5120/ijca2023922740 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Kiran Peddireddy
%T Kafka-based Architecture in Building Data Lakes for Real-time Data Streams
%J International Journal of Computer Applications
%@ 0975-8887
%V 185
%N 9
%P 1-3
%D 2023
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper investigates how Apache Kafka can be used to build data lakes for real-time data processing. Kafka has become a widely adopted tool for data ingestion and stream processing because of its scalability, fault tolerance, and flexibility. The paper analyzes the benefits of using Kafka in a data lake architecture and describes the steps involved in doing so. It also presents a case study of a major financial institution that used Kafka to build a data lake. Finally, it emphasizes Kafka's role in modern data processing and its value in developing data lakes for real-time workloads.
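The abstract does not spell out the ingestion procedure itself. As a rough illustration only, and not the paper's implementation, the following sketch models a common pattern for landing a Kafka topic in a data lake: keyed records are appended to partitions, a consumer reads each partition from its last committed offset, writes each batch as JSON-lines objects under a date-partitioned path, and commits the offset only after the write. Everything here is hypothetical: `ToyTopic`, `LakeSink`, the `date=` path layout, and the dictionary standing in for object storage are assumptions, and no real Kafka client is used.

```python
import json
import zlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Record:
    key: str
    value: dict
    timestamp_ms: int


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic: append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[Record]] = [[] for _ in range(num_partitions)]

    def produce(self, record: Record) -> tuple[int, int]:
        # Same key -> same partition, so per-key order is preserved.
        # (Kafka's default partitioner hashes keys with murmur2; CRC32 is used
        # here only to keep the toy deterministic.)
        p = zlib.crc32(record.key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(record)
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class LakeSink:
    """Hypothetical sink: reads each partition from its committed offset and
    writes JSON-lines objects into a date-partitioned layout. `lake` is a dict
    standing in for object storage (path -> file contents)."""

    def __init__(self, topic: ToyTopic, lake: dict[str, str]) -> None:
        self.topic = topic
        self.lake = lake
        self.committed = [0] * len(topic.partitions)  # next offset to read

    def run_once(self, max_batch: int = 100) -> int:
        """Process at most `max_batch` records per partition; return records written."""
        written = 0
        for p, log in enumerate(self.topic.partitions):
            start = self.committed[p]
            batch = log[start:start + max_batch]
            if not batch:
                continue
            # Group by event date (UTC) so each object lands in one date= folder.
            by_date: dict[str, list[str]] = {}
            for rec in batch:
                day = datetime.fromtimestamp(rec.timestamp_ms / 1000, tz=timezone.utc)
                line = json.dumps({"key": rec.key, **rec.value})
                by_date.setdefault(day.strftime("%Y-%m-%d"), []).append(line)
            for day, lines in by_date.items():
                path = f"{self.topic.name}/date={day}/part-{p}-{start:020d}.jsonl"
                self.lake[path] = "\n".join(lines) + "\n"
            # Commit only after the writes: if the process dies before this
            # line, the batch is re-read and, with the same batch size, rewritten
            # to the same paths (at-least-once delivery without duplicate objects).
            self.committed[p] = start + len(batch)
            written += len(batch)
        return written


if __name__ == "__main__":
    topic = ToyTopic("transactions", num_partitions=3)
    for i in range(10):
        topic.produce(Record(f"acct-{i % 4}", {"amount": i * 10},
                             1_684_000_000_000 + i * 3_600_000))
    lake: dict[str, str] = {}
    sink = LakeSink(topic, lake)
    print(sink.run_once())  # 10: every record lands in the lake
    print(sink.run_once())  # 0: offsets are committed, nothing is re-read
    for path in sorted(lake):
        print(path)
```

Committing after the write, rather than before, trades possible reprocessing for never losing a record; deriving each object path from the partition and starting offset is what keeps that reprocessing from producing duplicate files. In a real deployment the toy topic would be a Kafka topic, the consumer would use a Kafka client library or a tool such as Kafka Connect, and the dictionary would be replaced by storage such as HDFS or S3.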

References
  1. K. Peddireddy, Enterprise Data Integration and Streaming Using Kafka, ActiveMQ, and AWS Kinesis, 2023, ISBN-13: 979-8372725218.
  2. Apache Kafka Documentation, 2021. [Online]. Available: https://kafka.apache.org/documentation/
  3. T. Yu, Y. Li, X. Li, and J. Zhang, "A Real-Time Customer Complaint Management System Based on Big Data Analytics," Journal of Computational Science, vol. 31, pp. 15-24, 2019.
  4. H. Wu, Z. Shang, G. Peng, and K. Wolter, "A Reactive Batching Strategy of Apache Kafka for Reliable Stream Processing in Real-time," 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), pp. 207-217, 2020.
  5. K. Peddireddy and D. Banga, "Enhancing Customer Experience through Kafka Data Steams for Driven Machine Learning for Complaint Management," International Journal of Computer Trends and Technology, vol. 71, no. 3, pp. 7-13, 2023, doi: 10.14445/22312803/IJCTT-V71I3P102.
  6. G. van Dongen and D. V. D. Poel, "A Performance Analysis of Fault Recovery in Stream Processing Frameworks," IEEE Access, vol. 9, pp. 93745-93763, 2021.
  7. J. Kreps, N. Narkhede, J. Rao et al., "Kafka: A distributed messaging system for log processing," Proceedings of the NetDB, pp. 1-7, 2011.
  8. H. Mehmood et al., "Implementing Big Data Lake for Heterogeneous Data Sources," 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW), Macao, China, 2019, pp. 37-44, doi: 10.1109/ICDEW.2019.00-37.
  9. J. C. Couto and D. D. Ruiz, "An overview about data integration in data lakes," 2022 17th Iberian Conference on Information Systems and Technologies (CISTI), Madrid, Spain, 2022, pp. 1-7, doi: 10.23919/CISTI54924.2022.9820576.

Index Terms

Computer Science
Information Sciences

Keywords

Kafka, KSQL, Data Lake