Abstract
Passive sensory processing is often insufficient to guide biological organisms in complex environments. Rather, behaviourally relevant information can be accessed by performing so-called epistemic actions, which explicitly aim at unveiling hidden information. However, it is still unclear how an autonomous agent can learn epistemic actions and use them adaptively. In this work, we propose a definition of epistemic actions for POMDPs that derives from their characterization in the cognitive science and classical planning literature. We give theoretical insights into how partial observability and epistemic actions affect learning and performance in the extreme condition of model-free, memory-free reinforcement learning, where hidden information cannot be represented. Finally, we investigate these concepts using an integrated eye-arm neural architecture for robot control, which can use its effectors to execute epistemic actions and can exploit the actively gathered information to efficiently accomplish a seek-and-reach task.
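The abstract's central idea can be illustrated with a minimal sketch: a memory-free agent whose policy depends only on the current observation can still solve a task with hidden state if it learns an epistemic action that surfaces that state through the observation itself. The toy task, names, and parameters below are illustrative assumptions, not the paper's actual neuro-robotic model: tabular Q-learning in a two-location seek-and-reach toy POMDP, where a "look" action reveals the hidden target at a small sensing cost.

```python
import random
from collections import defaultdict

# Toy seek-and-reach POMDP (illustrative, not the paper's robot task):
# a target hides at position 0 or 1. The agent observes only "unknown"
# until it executes the epistemic action LOOK, which reveals the target.
LOOK, REACH0, REACH1 = 0, 1, 2
ACTIONS = (LOOK, REACH0, REACH1)

def run_episode(Q, eps=0.1, alpha=0.1, gamma=0.95):
    """One episode of memory-free Q-learning: the policy depends only on
    the current observation, never on any internal memory of the past."""
    target = random.choice((0, 1))
    obs = "unknown"
    for _ in range(10):                      # cap episode length
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda x: Q[(obs, x)]))
        if a == LOOK:                        # epistemic: changes only what is observed
            nxt, r, done = f"seen{target}", -0.1, False   # small sensing cost
        else:                                # pragmatic: ends the episode
            hit = (a == REACH0) == (target == 0)
            nxt, r, done = obs, (1.0 if hit else -1.0), True
        best = 0.0 if done else max(Q[(nxt, x)] for x in ACTIONS)
        Q[(obs, a)] += alpha * (r + gamma * best - Q[(obs, a)])
        obs = nxt
        if done:
            break

random.seed(0)
Q = defaultdict(float)
for _ in range(5000):
    run_episode(Q)

greedy = lambda o: max(ACTIONS, key=lambda x: Q[(o, x)])
```

Blind reaching from "unknown" earns zero expected reward, while looking first is worth roughly -0.1 + gamma * 1 ≈ 0.85, so the learned greedy policy looks before reaching: in this sketch the epistemic action acquires value precisely because the memory-free agent has no other way to represent the hidden information.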
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Ognibene, D., Volpi, N.C., Pezzulo, G., Baldassarre, G. (2013). Learning Epistemic Actions in Model-Free Memory-Free Reinforcement Learning: Experiments with a Neuro-robotic Model. In: Lepora, N.F., Mura, A., Krapp, H.G., Verschure, P.F.M.J., Prescott, T.J. (eds) Biomimetic and Biohybrid Systems. Living Machines 2013. Lecture Notes in Computer Science, vol. 8064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39802-5_17
Print ISBN: 978-3-642-39801-8
Online ISBN: 978-3-642-39802-5