International Conference on Artificial Intelligence and Data Science Applications - 2023
Control System labs
ICAIDSC2023 - Number 3
January 2025
Authors: Akmal Anorbaev, Prerna Agarwal, Pranav Shrivastava
10.5120/icaidsc202419
Akmal Anorbaev, Prerna Agarwal, Pranav Shrivastava. You Only Look Once (YOLO): Object Detection Algorithm. International Conference on Artificial Intelligence and Data Science Applications - 2023. ICAIDSC2023, 3 (January 2025), 9-14. DOI=10.5120/icaidsc202419
In their research paper "You Only Look Once: Unified, Real-Time Object Detection" [1], Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi introduced the YOLO (You Only Look Once) algorithm, a unified and efficient approach to real-time object detection. Rapid advances in computer vision have made real-time object detection a pivotal challenge, with applications ranging from surveillance to autonomous vehicles. This study examines YOLO in depth. YOLO performs object localization and classification in a single pass of a neural network, which yields high efficiency while maintaining strong accuracy. The central aim of this work is to explain YOLO's architecture, methodology, and performance, with emphasis on its grid-based approach and end-to-end detection process. Through theoretical experiments on benchmark datasets and custom scenarios, YOLO's accuracy and processing speed are evaluated. The mean Average Precision (mAP) metric is used to assess accuracy across various Intersection over Union (IoU) thresholds, showing YOLO's robustness in object identification, and high frames per second (FPS) figures underscore its real-time processing capabilities [2,3]. The paper discusses YOLO's strengths in efficiency, accuracy, and adaptability across different versions and variations. It also addresses potential limitations in detecting small objects, closely packed objects, and complex scenes.
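As a minimal illustration of two ideas named above, the following Python sketch computes the IoU of two boxes and finds the grid cell that contains an object's center, which is the cell responsible for predicting that object in the original YOLO formulation [1]. The function names `iou` and `responsible_cell`, the corner-coordinate box format, and the default grid size S = 7 (the value used in [1]) are assumptions chosen for illustration; this is not the authors' implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Assumed box format: (x_min, y_min, x_max, y_max).
    """
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the S x S grid cell containing the point (cx, cy).

    The grid size s=7 is an assumed default taken from the original YOLO paper.
    """
    # Clamp so a center lying exactly on the right or bottom edge stays in range.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col
```

A predicted box is usually counted as a correct detection when its IoU with a ground-truth box meets a chosen threshold, and mAP aggregates precision over such matches; the threshold value itself depends on the benchmark.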
The implications of YOLO's performance are discussed, with emphasis on its significance in applications such as robotics, autonomous vehicles, and industrial automation. Looking ahead, future developments are anticipated in fine-grained detection, 3D object detection, multi-modal fusion, and domain-specific customization. Its performance, efficiency, and adaptability make YOLO an influential approach to real-time object detection across a range of industries. This paper gives researchers, practitioners, and enthusiasts a comprehensive understanding of YOLO, so that they can apply it effectively and explore its potential for complex problems in computer vision and object detection.