Abstract
Passive sensory processing is often insufficient to guide biological organisms in complex environments. Rather, behaviourally relevant information can be accessed by performing so-called epistemic actions, which explicitly aim at unveiling hidden information. However, it is still unclear how an autonomous agent can learn epistemic actions and use them adaptively. In this work, we propose a definition of epistemic actions for POMDPs that derives from their characterization in the cognitive science and classical planning literature. We give theoretical insights into how partial observability and epistemic actions affect learning and performance in the extreme condition of model-free, memory-free reinforcement learning, where hidden information cannot be represented. Finally, we investigate these concepts using an integrated eye-arm neural architecture for robot control, which can use its effectors to execute epistemic actions and can exploit the actively gathered information to efficiently accomplish a seek-and-reach task.
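The abstract's central idea can be illustrated with a minimal sketch: a memory-free agent whose policy depends only on the current observation can still solve a task with hidden state if it learns an epistemic action that surfaces that state through the observation itself. The toy task, names, and parameters below are illustrative assumptions, not the paper's actual neuro-robotic model: tabular Q-learning in a two-location seek-and-reach toy POMDP, where a "look" action reveals the hidden target at a small sensing cost.

```python
import random
from collections import defaultdict

# Toy seek-and-reach POMDP (illustrative, not the paper's robot task):
# a target hides at position 0 or 1. The agent observes only "unknown"
# until it executes the epistemic action LOOK, which reveals the target.
LOOK, REACH0, REACH1 = 0, 1, 2
ACTIONS = (LOOK, REACH0, REACH1)

def run_episode(Q, eps=0.1, alpha=0.1, gamma=0.95):
    """One episode of memory-free Q-learning: the policy depends only on
    the current observation, never on any internal memory of the past."""
    target = random.choice((0, 1))
    obs = "unknown"
    for _ in range(10):                      # cap episode length
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda x: Q[(obs, x)]))
        if a == LOOK:                        # epistemic: changes only what is observed
            nxt, r, done = f"seen{target}", -0.1, False   # small sensing cost
        else:                                # pragmatic: ends the episode
            hit = (a == REACH0) == (target == 0)
            nxt, r, done = obs, (1.0 if hit else -1.0), True
        best = 0.0 if done else max(Q[(nxt, x)] for x in ACTIONS)
        Q[(obs, a)] += alpha * (r + gamma * best - Q[(obs, a)])
        obs = nxt
        if done:
            break

random.seed(0)
Q = defaultdict(float)
for _ in range(5000):
    run_episode(Q)

greedy = lambda o: max(ACTIONS, key=lambda x: Q[(o, x)])
```

Blind reaching from "unknown" earns zero expected reward, while looking first is worth roughly -0.1 + gamma * 1 ≈ 0.85, so the learned greedy policy looks before reaching: in this sketch the epistemic action acquires value precisely because the memory-free agent has no other way to represent the hidden information.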
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Ognibene, D., Volpi, N.C., Pezzulo, G., Baldassarre, G. (2013). Learning Epistemic Actions in Model-Free Memory-Free Reinforcement Learning: Experiments with a Neuro-robotic Model. In: Lepora, N.F., Mura, A., Krapp, H.G., Verschure, P.F.M.J., Prescott, T.J. (eds) Biomimetic and Biohybrid Systems. Living Machines 2013. Lecture Notes in Computer Science, vol. 8064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39802-5_17
Print ISBN: 978-3-642-39801-8
Online ISBN: 978-3-642-39802-5