Learning Epistemic Actions in Model-Free Memory-Free Reinforcement Learning: Experiments with a Neuro-robotic Model

  • Conference paper
Biomimetic and Biohybrid Systems (Living Machines 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8064)

Abstract

Passive sensory processing is often insufficient to guide biological organisms in complex environments. Rather, behaviourally relevant information can be accessed by performing so-called epistemic actions, which explicitly aim at unveiling hidden information. However, it is still unclear how an autonomous agent can learn epistemic actions and use them adaptively. In this work, we propose a definition of epistemic actions for POMDPs that derives from their characterizations in the cognitive science and classical planning literature. We give theoretical insights into how partial observability and epistemic actions affect the learning process and performance in the extreme conditions of model-free and memory-free reinforcement learning, where hidden information cannot be represented. Finally, we investigate these concepts using an integrated eye-arm neural architecture for robot control, which can use its effectors to execute epistemic actions and exploit the actively gathered information to efficiently accomplish a seek-and-reach task.
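The abstract's core idea can be illustrated with a toy sketch: a memory-free (reactive) tabular Q-learning agent in a two-door seek-and-reach task, where a "look" action reorients the sensor so that subsequent observations reveal the hidden goal side. This is a minimal illustrative assumption on our part, not the paper's neuro-robotic model; the environment, action names, and parameters below are all hypothetical.

```python
import random

random.seed(0)

# Toy seek-and-reach task with a hidden goal side, illustrating an
# epistemic action in a memory-free (reactive) setting.
ACTIONS = ["look", "go_left", "go_right"]  # "look" is the epistemic action

def run_episode(Q, epsilon, alpha, gamma):
    goal = random.choice(["L", "R"])  # hidden state: which door is rewarded
    obs = "unknown"                   # the goal is invisible until a "look"
    total = 0.0
    while True:
        # epsilon-greedy selection on the CURRENT observation only (no memory)
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(obs, x)])
        if a == "look":
            # the epistemic action reorients the sensor: the next observation
            # now reveals the goal side, at a small sensing cost
            r, next_obs, done = -0.1, goal, False
        else:
            correct = (a == "go_left") == (goal == "L")
            r, next_obs, done = (1.0 if correct else -1.0), obs, True
        target = r if done else r + gamma * max(Q[(next_obs, x)] for x in ACTIONS)
        Q[(obs, a)] += alpha * (target - Q[(obs, a)])
        total += r
        obs = next_obs
        if done:
            return total

Q = {(o, a): 0.0 for o in ["unknown", "L", "R"] for a in ACTIONS}
for _ in range(5000):
    run_episode(Q, epsilon=0.1, alpha=0.1, gamma=0.95)

greedy = {o: max(ACTIONS, key=lambda a: Q[(o, a)]) for o in ["unknown", "L", "R"]}
print(greedy)  # the learned reactive policy looks first, then reaches
```

Even without memory, the agent can learn to value the epistemic action: looking first costs 0.1 but makes the rewarded door observable (discounted value about 0.85), whereas guessing blindly has expected return 0, so the greedy policy maps "unknown" to "look" and the revealed observations to the correct reach.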



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ognibene, D., Volpi, N.C., Pezzulo, G., Baldassarre, G. (2013). Learning Epistemic Actions in Model-Free Memory-Free Reinforcement Learning: Experiments with a Neuro-robotic Model. In: Lepora, N.F., Mura, A., Krapp, H.G., Verschure, P.F.M.J., Prescott, T.J. (eds) Biomimetic and Biohybrid Systems. Living Machines 2013. Lecture Notes in Computer Science, vol 8064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39802-5_17

  • DOI: https://doi.org/10.1007/978-3-642-39802-5_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39801-8

  • Online ISBN: 978-3-642-39802-5

  • eBook Packages: Computer Science (R0)
