Abstract
During social interactions, humans can initiate and respond to rich and complex social actions despite incomplete world knowledge and physical, perceptual and computational constraints. This capability relies on action perception mechanisms that exploit regularities in observed goal-oriented behaviours to generate robust predictions and reduce the workload of sensing systems. We argue that three factors are fundamental to achieving this capability. First, human knowledge is frequently hierarchically structured, in both the perceptual and the execution domains. Second, human perception is an active process driven by current task requirements and context; this is particularly important when the perceptual input is complex (e.g. human motion) and the agent must operate under embodiment constraints. Third, learning is at the heart of action perception mechanisms, underlying the agent's ability to add new behaviours to its repertoire. Based on these factors, we review multiple instantiations of a hierarchically organised, biologically inspired framework for embodied action perception, demonstrating its flexibility in addressing the rich computational contexts of action perception and learning on robotic platforms.
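To make the core idea concrete, the following is a minimal, illustrative sketch (not the chapter's implementation) of multiple-model predictive action perception: several candidate behaviours, each a paired inverse and forward model, are run in simulation against an observed movement, and a prediction-error-driven confidence selects the best-matching behaviour. All names here (BehaviourModel, perceive_action, the decay parameter) are hypothetical and introduced only for illustration.

```python
# Illustrative sketch only: a flat, single-level multiple-model
# action-perception loop. Not the chapter's architecture.

import numpy as np


class BehaviourModel:
    """One candidate behaviour: an inverse model proposing a command for the
    observed state, and a forward model predicting the sensory consequence."""

    def __init__(self, name, inverse, forward):
        self.name = name
        self.inverse = inverse      # callable: state -> command
        self.forward = forward      # callable: (state, command) -> predicted next state
        self.confidence = 0.0


def perceive_action(models, observations, decay=0.9):
    """Simulate each candidate behaviour against the observed trajectory and
    keep a running confidence driven by prediction error; the most confident
    model is taken as the recognised action."""
    for t in range(len(observations) - 1):
        state, next_state = observations[t], observations[t + 1]
        for m in models:
            command = m.inverse(state)
            predicted = m.forward(state, command)
            error = np.linalg.norm(predicted - next_state)
            # Low prediction error -> higher confidence for this behaviour.
            m.confidence = decay * m.confidence + (1.0 - decay) * np.exp(-error)
    return max(models, key=lambda m: m.confidence)
```

In the hierarchical instantiations the chapter reviews, such confidences would additionally propagate across levels of abstraction and bias which parts of the scene are attended to; this sketch captures only the flat, single-level case.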
Acknowledgments
This research has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270490 (EFAA).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ognibene, D., Wu, Y., Lee, K., Demiris, Y. (2013). Hierarchies for Embodied Action Perception. In: Baldassarre, G., Mirolli, M. (eds) Computational and Robotic Models of the Hierarchical Organization of Behavior. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39875-9_5
DOI: https://doi.org/10.1007/978-3-642-39875-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39874-2
Online ISBN: 978-3-642-39875-9
eBook Packages: Computer Science, Computer Science (R0)