Research Article

Comparing Action as Input and Action as Output in a Reinforcement Learning Task

by Evans Miriti, Peter Waiganjo, Andrew Mwaura
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 76 - Number 1
Year of Publication: 2013
Authors: Evans Miriti, Peter Waiganjo, Andrew Mwaura
DOI: 10.5120/13212-0593

Evans Miriti, Peter Waiganjo, Andrew Mwaura. Comparing Action as Input and Action as Output in a Reinforcement Learning Task. International Journal of Computer Applications. 76, 1 (August 2013), 24-28. DOI=10.5120/13212-0593

@article{ 10.5120/13212-0593,
author = { Evans Miriti, Peter Waiganjo, Andrew Mwaura },
title = { Comparing Action as Input and Action as Output in a Reinforcement Learning Task },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 76 },
number = { 1 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 24-28 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume76/number1/13212-0593/ },
doi = { 10.5120/13212-0593 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Evans Miriti
%A Peter Waiganjo
%A Andrew Mwaura
%T Comparing Action as Input and Action as Output in a Reinforcement Learning Task
%J International Journal of Computer Applications
%@ 0975-8887
%V 76
%N 1
%P 24-28
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Generalization techniques enable a reinforcement learning agent to approximate the value of states it has not yet encountered. They also serve as memory-minimization mechanisms when the state space is so large that representing every state in computer memory is infeasible. Artificial Neural Networks are one commonly employed generalization technique, and various network structures have been proposed in the literature. In this study, two of the proposed structures were implemented in a robot navigation task and their performance compared. The results indicate that a network structure with one output node per possible action is superior to a structure in which the selected action is fed to the network as an input and its value is produced by a single output node.
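
A minimal sketch of the two network structures being compared, assuming a small discrete action set and a one-hot action encoding; the state size, hidden width, and use of PyTorch are illustrative and not taken from the paper:

# Contrast of "action as input" vs. "action as output" value networks.
# Sizes below are hypothetical, for illustration only.
import torch
import torch.nn as nn

N_STATE, N_ACTIONS, N_HIDDEN = 8, 4, 16

# Structure 1: action as input -- the chosen action is appended to the
# state vector and a single output node estimates Q(s, a).
action_as_input = nn.Sequential(
    nn.Linear(N_STATE + N_ACTIONS, N_HIDDEN),  # state + one-hot action
    nn.Sigmoid(),
    nn.Linear(N_HIDDEN, 1),                    # one value for this (s, a)
)

# Structure 2: action as output -- the state alone is the input and the
# network emits one Q-value per action in a single forward pass.
action_as_output = nn.Sequential(
    nn.Linear(N_STATE, N_HIDDEN),
    nn.Sigmoid(),
    nn.Linear(N_HIDDEN, N_ACTIONS),            # one output node per action
)

state = torch.rand(N_STATE)

# Structure 1 needs one forward pass per candidate action...
one_hot = torch.eye(N_ACTIONS)
q1 = torch.stack([action_as_input(torch.cat([state, a])) for a in one_hot])

# ...while structure 2 scores all actions at once.
q2 = action_as_output(state)
greedy_action = int(q2.argmax())  # e.g. for epsilon-greedy selection

One practical consequence of the second structure is visible in the sketch: action selection requires a single forward pass rather than one pass per candidate action.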

References
  1. Asada, M., Noda, S., Tawaratsumida, S. & Hosoda, K., 1994. Vision-Based Behavior Acquisition for a Shooting Robot by Using a Reinforcement Learning. In IAPR/IEEE Workshop on Visual Behaviours, 1994.
  2. Kaelbling, L. P., Littman, M. L. & Moore, A. W., 1996. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, pp. 237-285. Available through: CiteSeer [Accessed 7 August 2013].
  3. McClelland, J. L., 2013. Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises.
  4. Microsoft, 2011. Robotics Developer Studio: Getting Started.
  5. Mitchell, T. M., 1997. Machine Learning. Singapore: McGraw-Hill.
  6. Sherstov, A. A. & Stone, P., 2005. Improving Action Selection in MDPs via Knowledge Transfer. In 20th National Conference on Artificial Intelligence. Pittsburgh, USA, 2005. Available at: http://www.cs.utexas.edu/~pstone/Papers/bib2html/b2hd-AAAI05-actions.html.
  7. Sutton, R. S., 1998. Implementation Details of the TD(λ) Procedure for the Case of Vector Predictions and Backpropagation.
  8. Sutton, R. S. & Barto, A. G., 1998. Reinforcement Learning: An Introduction. London: MIT Press.
  9. Taylor, M. E. & Stone, P., 2005. Behavior Transfer for Value-Function-Based Reinforcement Learning. In Fourth International Joint Conference on Autonomous Agents and Multiagent Systems. Utrecht, The Netherlands, July 2005.
  10. Tesauro, G., 1995. Temporal Difference Learning and TD-Gammon. Communications of the ACM, 38(3).
  11. Usher, K., 2006. Obstacle avoidance for a non-holonomic vehicle using occupancy grids. In MacDonald, B., ed. Conference on Robotics and Automation. Auckland, New Zealand, 2006.
Index Terms

Computer Science
Information Sciences

Keywords

Reinforcement Learning, Artificial Neural Networks, obstacle avoidance