Research Article

Use of Reinforcement Learning as a Challenge: A Review

by Rashmi Sharma, Manish Prateek, Ashok K. Sinha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 69 - Number 22
Year of Publication: 2013
DOI: 10.5120/12105-8332

Rashmi Sharma, Manish Prateek, Ashok K. Sinha. Use of Reinforcement Learning as a Challenge: A Review. International Journal of Computer Applications 69(22):28-34, May 2013. DOI=10.5120/12105-8332

@article{10.5120/12105-8332,
  author     = {Rashmi Sharma and Manish Prateek and Ashok K. Sinha},
  title      = {Use of Reinforcement Learning as a Challenge: A Review},
  journal    = {International Journal of Computer Applications},
  issue_date = {May 2013},
  volume     = {69},
  number     = {22},
  month      = {May},
  year       = {2013},
  issn       = {0975-8887},
  pages      = {28-34},
  numpages   = {7},
  url        = {https://ijcaonline.org/archives/volume69/number22/12105-8332/},
  doi        = {10.5120/12105-8332},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}

%0 Journal Article
%A Rashmi Sharma
%A Manish Prateek
%A Ashok K. Sinha
%T Use of Reinforcement Learning as a Challenge: A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 69
%N 22
%P 28-34
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Reinforcement learning (RL) has its origins in animal learning theory. RL does not require prior knowledge; it can autonomously obtain an optimal policy through knowledge gained by trial and error while continuously interacting with a dynamic environment. Owing to its capacity for self-improvement and online learning, reinforcement learning has become one of the core technologies of intelligent agents. This paper gives an introduction to reinforcement learning, discusses its basic model and the optimal policies used in RL, and surveys the main model-free and model-based methods used to reward the agent: the temporal difference method, Q-learning, average reward, certainty equivalent methods, Dyna, prioritized sweeping, and queue-Dyna. Finally, the paper briefly describes applications of reinforcement learning and some directions for future research.
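
To make the model-free methods named above concrete, the following is a minimal sketch of tabular Q-learning (Watkins [9, 10]), whose update is the temporal-difference rule the abstract refers to. The corridor environment, parameter values, and helper names below are illustrative assumptions chosen for this sketch; the paper itself presents no code.

import random
from collections import defaultdict

# Toy corridor: states 0..N-1; the episode ends with reward 1 on
# reaching state N-1. All names and parameters here are illustrative.
N = 6
ACTIONS = (-1, +1)                    # step left / step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2 # learning rate, discount, exploration

Q = defaultdict(float)                # Q[(state, action)] -> estimated return

def step(state, action):
    """One environment transition: move, clip at the walls, reward at the goal."""
    nxt = max(0, min(N - 1, state + action))
    return nxt, (1.0 if nxt == N - 1 else 0.0), nxt == N - 1

def greedy(state):
    """Greedy action with random tie-breaking."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(500):
    state, done, steps = 0, False, 0
    while not done and steps < 200:
        # Epsilon-greedy exploration: the trial-and-error interaction
        # with the environment that the abstract describes.
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # Temporal-difference (Q-learning) update, refs. [8, 10]:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state, steps = nxt, steps + 1

# The learned greedy policy should step right (+1) in every non-terminal state.
print([greedy(s) for s in range(N - 1)])

Being model-free, this sketch never builds a transition model; a model-based method such as Dyna [14] would additionally learn the dynamics and replay simulated transitions through the same update to speed convergence.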

References
  1. S. Singh, Agents and Reinforcement Learning. San Mateo, CA, USA: Miller Freeman Publishing Inc., 1997.
  2. R. R. Bush and F. Mosteller, Stochastic Models for Learning. New York: Wiley, 1955.
  3. C. Ribeiro, Reinforcement learning agents, Artificial Intelligence Review 17 (2002) 223-250.
  4. A. Ayesh, Emotionally motivated reinforcement learning based controller, in: IEEE SMC, The Hague, The Netherlands, 2004.
  5. S. Gadanho, Reinforcement learning in autonomous robots: an empirical investigation of the role of emotions, PhD thesis, University of Edinburgh, Edinburgh, 1999.
  6. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
  7. L. P. Kaelbling, M. L. Littman, and A. W. Moore, Reinforcement learning: a survey, Journal of Artificial Intelligence Research 4 (1996) 237-285.
  8. R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning 3(1) (1988) 9-44.
  9. C. J. C. H. Watkins, Learning from Delayed Rewards, PhD thesis, King's College, Cambridge, UK, 1989.
  10. C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning 8(3) (1992) 279-292.
  11. A. Schwartz, A reinforcement learning method for maximizing undiscounted rewards, in Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, 1993, pp. 298-305. Morgan Kaufmann.
  12. S. Mahadevan, Average reward reinforcement learning: foundations, algorithms, and empirical results, Machine Learning 22(1) (1996).
  13. T. Jaakkola, S. P. Singh, and M. I. Jordan, Monte-Carlo reinforcement learning in non-Markovian decision problems, in G. Tesauro, D. S. Touretzky, and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, 1995.
  14. R. S. Sutton, Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, in Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, 1990. Morgan Kaufmann.
  15. R. S. Sutton, Planning by incremental dynamic programming, in Proceedings of the Eighth International Workshop on Machine Learning, 1991, pp. 353-357. Morgan Kaufmann.
  16. G. J. Tesauro, Temporal difference learning and TD-Gammon, Communications of the ACM 38 (1995) 58-68.
  17. J. Nie and S. Haykin, A dynamic channel assignment policy through Q-learning, IEEE Transactions on Neural Networks 10 (1999) 1443-1455.
  18. H. R. Beom and H. S. Cho, A sensor-based navigation for a mobile robot using fuzzy logic and reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics 25 (1995) 464-477.
  19. J. A. Coelho, E. G. Araujo, M. Huber, and R. A. Grupen, Dynamical categories and control policy selection, in Proceedings of the IEEE International Symposium on Intelligent Control, 1998, pp. 459-464.
  20. I. H. Witten, The apparent conflict between estimation and control: a survey of the two-armed problem, Journal of the Franklin Institute 301 (1976) 161-189.
  21. R. J. Malak and P. K. Khosla, A framework for the adaptive transfer of robot skill knowledge using reinforcement learning agents, in Proceedings of the IEEE International Conference on Robotics and Automation, 2001, pp. 1994-2001.
  22. S. Schaal and C. Atkeson, Robot juggling: an implementation of memory-based learning, Control Systems Magazine 14 (1994).
  23. S. Mahadevan and J. Connell, Automatic programming of behavior-based robots using reinforcement learning, in Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 1991.
  24. M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, in Proceedings of the Eleventh International Conference on Machine Learning, San Francisco, CA, 1994, pp. 157-163. Morgan Kaufmann.
  25. R. H. Crites and A. G. Barto, Improving elevator performance using reinforcement learning, in D. Touretzky, M. Mozer, and M. Hasselmo, eds., Advances in Neural Information Processing Systems 8, 1996.
  26. Y. Duan, Q. Liu, and X. Xu, Application of reinforcement learning in robot soccer, Engineering Applications of Artificial Intelligence 20 (2007) 936-950.
  27. W. Qiang and Z. Zhongli, Reinforcement learning model, algorithms and its application, in International Conference on Mechatronic Science, Electric Engineering and Computer, Jilin, China, August 19-22, 2011.
  28. G. G. Yen and T. W. Hickey, Reinforcement learning algorithms for robotic navigation in dynamic environments, ISA Transactions 43 (2004) 217-230.
  29. M. Shokri, Knowledge of opposite actions for reinforcement learning, Applied Soft Computing 11 (2011) 4097-4109.
  30. P. Tadepalli and D. Ok, Model-based average reward reinforcement learning, Artificial Intelligence 100 (1998) 177-224.
  31. A. Gosavi, A Tutorial for Reinforcement Learning, March 8, 2013.
  32. S. Ray and P. Tadepalli, Model-Based Reinforcement Learning, July 10, 2009.
Index Terms

Computer Science
Information Sciences

Keywords

Reinforcement Learning, Q-Learning, temporal difference, robot control