CFP last date
20 December 2024
Reseach Article

Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP

by Alexandru Voicu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 100 - Number 6
Year of Publication: 2014
Authors: Alexandru Voicu
10.5120/17529-8100

Alexandru Voicu . Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP. International Journal of Computer Applications. 100, 6 ( August 2014), 21-30. DOI=10.5120/17529-8100

@article{ 10.5120/17529-8100,
author = { Alexandru Voicu },
title = { Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 100 },
number = { 6 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 21-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume100/number6/17529-8100/ },
doi = { 10.5120/17529-8100 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:29:15.333531+05:30
%A Alexandru Voicu
%T Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP
%J International Journal of Computer Applications
%@ 0975-8887
%V 100
%N 6
%P 21-30
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In the course of less than a decade, Graphics Processing Units (GPUs) have evolved from narrowly scoped application specific accelerators to general-purpose parallel machines capable of accommodating an ever-growing set of algorithms. At the same time, programming GPUs appears to have become trapped around an attractor characterised by ad-hoc practices, non-portable implementations and inexact, uninformative performance reporting. The purpose of this paper is two-fold, on one hand pursuing an in-depth look at GPU hardware and its characteristics, and on the other demonstrating that portable, generic, mathematically grounded programming of these machines is possible and desirable. An agent-based meta-heuristic, the Max-Min Ant System (MMAS), provides the context. The major contributions brought about by this article are the following: (1) an optimal, portable, generic-algorithm based MMAS implementation is derived; (2) an in-depth analysis of AMD's Graphics Core Next (GCN) GPU and the C++ AMP programming model is supplied; (3) a more robust approach to performance reporting is presented; (4) novel techniques for raising the abstraction level without sacrificing performance are employed. This represents the first implementation of an algorithm from the Ant Colony Optimisation (ACO) family using C++ AMP, whilst at the same time being one of the first uses of the latter programming environment.

References
  1. B. Simeone, Ed. , Combinatorial optimization: lectures given at the 3rd session of the Centro internazionale matematico estivo (C. I. M. E. ) held at Como, Italy, August 25-September 2, 1986. Berlin?; New York: Springer-Verlag, 1989.
  2. A. Schrijver, "On the history of combinatorial optimization (till 1960)," in Handbooks in Operations Research and Management Science Discrete Optimization. , K. Aardal, G. L. Nemhauser, and R. Weismantel, Eds. Burlington: Elsevier, 2005, pp. 1–68.
  3. M. R. Garey and D. S. Johnson, Computers and intractability: a guide to the theory of NP-completeness. San Francisco: W. H. Freeman, 1979.
  4. T. Stützle and H. Hoos, "MAX-MIN Ant System and local search for the traveling salesman problem," 1997, pp. 309–314.
  5. T. Stützle and H. Hoos, "Improving the Ant System: A detailed report on the MAX-MIN Ant System," 1996.
  6. T. Stützle and H. H. Hoos, "MAX-MIN Ant System," Future Gener Comput Syst, vol. 16, no. 9, pp. 889–914, Jun. 2000.
  7. T. Stützle and H. H. Hoos, "Improvements on the Ant-System: Introducing the MAX-MIN Ant System," in Artificial Neural Nets and Genetic Algorithms, Vienna: Springer Vienna, 1998, pp. 245–249.
  8. M. Dorigo, V. Maniezzo, and A. Colorni, "Ant system: optimization by a colony of cooperating agents," IEEE Trans. Syst. Man Cybern. Part B Cybern. , vol. 26, no. 1, pp. 29–41, Feb. 1996.
  9. T. Stützle, M. López-Ibáñez, and M. Dorigo, "A Concise Overview of Applications of Ant Colony Optimization," Wiley Encyclopedia of Operations Research and Management Science. John Wiley & Sons, Inc. , Hoboken, NJ, USA, 15-Jun-2010.
  10. R. Vuduc, A. Chandramowlishwaran, J. Choi, M. Guney, and A. Shringarpure, "On the Limits of GPU Acceleration," in Proceedings of the 2Nd USENIX Conference on Hot Topics in Parallelism, Berkeley, CA, USA, 2010, pp. 13–13.
  11. V. W. Lee, P. Hammarlund, R. Singhal, P. Dubey, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, and S. Chennupaty, "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU," 2010, p. 451.
  12. M. Mantor and M. Houston, "AMD Graphic Core Next: Low Power High Performance Graphics & Parallel Compute," presented at the High-Performance Graphics 2011, Vancouver, Canada, 07-Aug-2011.
  13. K. Gregory and A. Miller, C++ AMP: accelerated massive parallelism with Microsoft Visual C++. Sebastopol, California: O'Reilly Media, 2012.
  14. D. B. Shmoys, J. K. Lenstra, A. H. G. R. Kan, and E. L. Lawler, Eds. , The Traveling salesman problem: a guided tour of combinatorial optimization. Chichester [West Sussex]?; New York: Wiley, 1985.
  15. P. H. Douglas, "The Cobb-Douglas Production Function Once Again: Its History, Its Testing, and Some New Empirical Values," J. Polit. Econ. , vol. 84, no. 5, pp. 903–15, 1976.
  16. D. S. Johnson and L. A. McGeoch, "Experimental Analysis of Heuristics for the STSP," in The Traveling Salesman Problem and Its Variations, vol. 12, G. Gutin and A. P. Punnen, Eds. Boston, MA: Springer US, 2007.
  17. J. . Dongarra and D. . Sorensen, "A portable environment for developing parallel FORTRAN programs," Parallel Comput. , vol. 5, no. 1–2, pp. 175–186, Jul. 1987.
  18. W. D. Hillis and G. L. Steele, "Data parallel algorithms," Commun. ACM, vol. 29, no. 12, pp. 1170–1183, Dec. 1986.
  19. B. R. Gaster and L. Howes, "Can GPGPU Programming Be Liberated from the Data-Parallel Bottleneck?," Computer, vol. 45, no. 8, pp. 42–52, Aug. 2012.
  20. T. Stützle and M. Dorigo, "A short convergence proof for a class of ant colony optimization algorithms," IEEE Trans. Evol. Comput. , vol. 6, no. 4, pp. 358–365, Aug. 2002.
  21. W. J. Gutjahr, "Mathematical runtime analysis of ACO algorithms: survey on an emerging issue," Swarm Intell. , vol. 1, no. 1, pp. 59–79, Oct. 2007.
  22. B. Bullnheimer, G. Kotsis, and C. Strauß, "Parallelization Strategies for the Ant System," in High Performance Algorithms and Software in Nonlinear Optimization, vol. 24, R. Leone, A. Murli, P. M. Pardalos, and G. Toraldo, Eds. Boston, MA: Springer US, 1999.
  23. P. Delisle, M. Gravel, M. Krajecki, C. Gagné, and W. L. Price, "Comparing parallelization of an ACO: message passing vs. shared memory," in Hybrid Metaheuristics, vol. 3636, M. J. Blesa, C. Blum, A. Roli, and M. Sampels, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 1–11.
  24. I. Ellabib, P. Calamai, and O. Basir, "Exchange strategies for multiple Ant Colony System," Inf. Sci. , vol. 177, no. 5, pp. 1248–1264, Mar. 2007.
  25. M. Manfrin, M. Birattari, T. Stützle, and M. Dorigo, "Parallel Ant Colony Optimization for the Traveling Salesman Problem," in Proceedings of the 5th International Conference on Ant Colony Optimization and Swarm Intelligence, Berlin, Heidelberg, 2006, pp. 224–234.
  26. J. Jun Ouyang and G. -R. Gui-Rong Yan, "A multi-group ant colony system algorithm for TSP," in Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on, 2004, vol. 1, pp. 117–121.
  27. M. Pedemonte, S. Nesmachnow, and H. Cancela, "A survey on parallel ant colony optimization," Appl. Soft Comput. , vol. 11, no. 8, pp. 5181–5197, Dec. 2011.
  28. T. Stützle, "Parallelization strategies for Ant Colony Optimization," in Parallel Problem Solving from Nature — PPSN V, vol. 1498, A. E. Eiben, T. Bäck, M. Schoenauer, and H. -P. Schwefel, Eds. Berlin/Heidelberg: Springer-Verlag, 1998, pp. 722–731.
  29. H. Bai, D. Ouyang, X. Li, L. He, and H. Yu, "MAX-MIN Ant System on GPU with CUDA," in 2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC), 2009, pp. 801–804.
  30. J. M. Cecilia, J. M. García, A. Nisbet, M. Amos, and M. Ujaldón, "Enhancing data parallelism for Ant Colony Optimization on GPUs," J. Parallel Distrib. Comput. , vol. 73, no. 1, pp. 42–51, Jan. 2013.
  31. A. Delévacq, P. Delisle, M. Gravel, and M. Krajecki, "Parallel Ant Colony Optimization on Graphics Processing Units," J. Parallel Distrib. Comput. , vol. 73, no. 1, pp. 52–61, Jan. 2013.
  32. K. Kobashi, A. Fujii, T. Tanaka, and K. Miyoshi, "Acceleration of Ant Colony Optimization for the Traveling Salesman Problem on a GPU," 2011.
  33. O. Nitica, "A Parallel Ant Colony Optimization Algorithm for the Travelling Salesman Problem: Improving Performance Using CUDA," Bachelor thesis, University of Delaware, 2011.
  34. K. Tantawy, "Ant Colony Optimization Parallel Algorithm for GPU," Carleton University, Honours Project COMP 4905, Apr. 2011.
  35. A. Uchida, Y. Ito, and K. Nakano, "An Efficient GPU Implementation of Ant Colony Optimization for the Traveling Salesman Problem," in 2012 Third International Conference on Networking and Computing (ICNC), 2012, pp. 94–102.
  36. M. Dorigo, Ant colony optimization. Cambridge, Mass: MIT Press, 2004.
  37. J. S. Angelo, D. A. Augusto, and H. J. C. Barbosa, "Strategies for Parallel Ant Colony Optimization on Graphics Processing Units," in Ant Colony Optimization - Techniques and Applications, H. J. C. Barbosa, Ed. InTech, 2013.
  38. A. Munshi, Ed. , OpenCL programming guide. Upper Saddle River, NJ: Addison-Wesley, 2012.
  39. J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable parallel programming with CUDA," Queue, vol. 6, no. 2, p. 40, Mar. 2008.
  40. F. Darema, D. A. George, V. A. Norton, and G. F. Pfister, "A single-program-multiple-data computational model for EPEX/FORTRAN," Parallel Comput. , vol. 7, no. 1, pp. 11–24, Apr. 1988.
  41. "NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110. " NVIDIA, 2012.
  42. A. Williams, C++ concurrency in action. Shelter Island, N. Y: Manning, 2012.
  43. "multicoreware / cppamp-driver-ng / wiki / Home — Bitbucket. " [Online]. Available: https://bitbucket. org/multicoreware/cppamp-driver-ng/wiki/Home. [Accessed: 29-Jun-2014].
  44. N. M. Josuttis, The C++ standard library: a tutorial and reference, 2nd ed. Upper Saddle River, NJ: Addison-Wesley, 2012.
  45. D. Vandevoorde, C++ templates: the complete guide. Boston, MA: Addison-Wesley, 2003.
  46. A. A. Stepanov, Elements of programming. Upper Saddle River, NJ: Addison-Wesley, 2009.
  47. B. R. Gaster and L. Howes, "OpenCL C++," in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, New York, NY, USA, 2013, pp. 86–95.
  48. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical recipes: the art of scientific computing, 3rd ed. Cambridge, UK?; New York: Cambridge University Press, 2007.
  49. J. J. Bentley, "Fast Algorithms for Geometric Traveling Salesman Problems," ORSA J. Comput. , vol. 4, no. 4, pp. 387–411, Nov. 1992.
  50. A. Blazinskas and A. Misevicius, "Combining 2-opt, 3-opt and 4-opt with K-swap-kick Perturbations for the Traveling Salesman Problem," in Proceedings of the 17th International Conference on Information and Software Technologies, IT 2011, Kaunas, Lithuania, 2011.
  51. M. L. Fredman, D. S. Johnson, L. A. Mcgeoch, and G. Ostheimer, "Data Structures for Traveling Salesmen," J. Algorithms, vol. 18, no. 3, pp. 432–479, May 1995.
  52. G. Reinelt, The traveling salesman: computational solutions for TSP applications. Berlin?; New York: Springer-Verlag, 1994.
  53. "TSPLIB. " [Online]. Available: http://comopt. ifi. uni-heidelberg. de/software/TSPLIB95/. [Accessed: 07-Mar-2013].
  54. D. S. Johnson and L. A. McGeoch, "The traveling salesman problem: a case study," in Local search in combinatorial optimization, E. H. L. Aarts and J. K. Lenstra, Eds. Princeton: Princeton University Press, 2003, pp. 215–310.
  55. M. G. A. Verhoeven, E. H. L. Aarts, and P. C. J. Swinkels, "A parallel 2-opt algorithm for the Traveling Salesman Problem," Future Gener. Comput. Syst. , vol. 11, no. 2, pp. 175–182, 1995.
Index Terms

Computer Science
Information Sciences

Keywords

C++ AMP Generic Programming GPU Programming MAX-MIN Ant System Parallel Meta-Heuristics