Research Article

Efficient Dynamic Multiple GPGPU Layer for OpenCV

by Afshan Jafri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 164 - Number 3
Year of Publication: 2017
Authors: Afshan Jafri
10.5120/ijca2017913604

Afshan Jafri. Efficient Dynamic Multiple GPGPU Layer for OpenCV. International Journal of Computer Applications. 164, 3 (Apr 2017), 42-48. DOI=10.5120/ijca2017913604

@article{ 10.5120/ijca2017913604,
author = { Afshan Jafri },
title = { Efficient Dynamic Multiple GPGPU Layer for OpenCV },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2017 },
volume = { 164 },
number = { 3 },
month = { Apr },
year = { 2017 },
issn = { 0975-8887 },
pages = { 42-48 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume164/number3/27467-2017913604/ },
doi = { 10.5120/ijca2017913604 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Afshan Jafri
%T Efficient Dynamic Multiple GPGPU Layer for OpenCV
%J International Journal of Computer Applications
%@ 0975-8887
%V 164
%N 3
%P 42-48
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

General-purpose graphics processing units (GPGPUs) provide high-performance computing resources. CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) make it possible to write parallel programs that use multiple central processing units (CPUs) and GPGPUs. The image processing library OpenCV (Open Source Computer Vision library) could benefit greatly from using multiple GPGPUs in parallel; however, its CUDA implementation can exploit only a single GPGPU. This research develops an abstraction layer above OpenCV's single-GPU module that enables multiple GPUs to be used in a single instruction, multiple data (SIMD) fashion. In this approach, a controller (parent) thread spawns worker threads that operate on several GPU devices and balances the workload across them; because task allocation is dynamic, the layer works with any number of GPUs. Experiments running bilateral filtering, color-to-gray conversion, the fast Fourier transform, and convolution on homogeneous- and heterogeneous-sized images of scenery, objects, and faces indicate that: (1) on the GPU, threading reduces computation time to half that of sequential operation; (2) tuned, statically load-balanced GPU threading reduces computation time by up to a fourth compared with CPU threading; (3) the performance of dynamic load balancing approaches that of static balancing tuned manually and iteratively.
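The abstract contrasts a controller thread that spawns one worker per GPU and hands out tasks dynamically with hand-tuned static load balancing. The Python sketch below illustrates both ideas under loudly stated assumptions: it uses only the standard library, no real GPU is involved, and each "GPU" is a hypothetical `simulated_gpu` function whose speed is set by made-up per-device costs in `DEVICE_COST`. The color-to-gray step uses the BT.601 weights (0.299, 0.587, 0.114) that OpenCV documents for RGB-to-gray conversion. The names `static_split` and `run_dynamic` are illustrative and are not the paper's API; the paper's layer sits on top of OpenCV's CUDA module, which this sketch does not call.

```python
import queue
import threading
import time

# Hypothetical per-device cost (seconds per task) standing in for GPUs of different speed.
DEVICE_COST = {"gpu0": 0.001, "gpu1": 0.003}


def to_gray(image):
    """Convert a list of (R, G, B) pixels to gray using BT.601 weights."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in image]


def simulated_gpu(device, image):
    """Hypothetical stand-in for a GPU call bound to `device`; no GPU is used."""
    time.sleep(DEVICE_COST[device])  # emulate a faster or slower device
    return to_gray(image)


def static_split(n_tasks, weights):
    """Static balancing: contiguous chunks sized in proportion to
    hand-tuned integer per-device weights."""
    total = sum(weights)
    bounds, acc = [0], 0
    for w in weights:
        acc += w
        bounds.append(n_tasks * acc // total)  # last bound is exactly n_tasks
    return [list(range(bounds[i], bounds[i + 1])) for i in range(len(weights))]


def run_dynamic(images, devices, process=simulated_gpu):
    """Dynamic balancing: the controller fills a shared queue and starts one
    worker thread per device; each worker pulls the next image as soon as it
    is free, so a faster device ends up processing more images."""
    work = queue.Queue()
    for idx, img in enumerate(images):
        work.put((idx, img))
    results = [None] * len(images)
    taken = {d: [] for d in devices}

    def worker(device):
        while True:
            try:
                idx, img = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this device is finished
            results[idx] = process(device, img)  # each slot written by one worker
            taken[device].append(idx)  # each worker appends only to its own list

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:  # the controller waits for every worker to finish
        t.join()
    return results, taken


if __name__ == "__main__":
    images = [[(i, i, i), (255, 0, 0)] for i in range(40)]
    results, taken = run_dynamic(images, list(DEVICE_COST))
    print({d: len(v) for d, v in taken.items()})  # varies from run to run
    print(static_split(10, [3, 1]))
```

The design difference is where the tuning lives. `static_split` needs weights found by trial, and they must be re-tuned when the devices or image sizes change. `run_dynamic` needs no weights, because a worker bound to a faster device simply returns to the queue sooner; the price is the small overhead of synchronizing on the shared queue.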

Index Terms

Computer Science
Information Sciences

Keywords

GPGPU, OpenCV, SIMD, CUDA, OpenCL, Multiple GPU, Load Balancing, Threading