Abstract
During social interactions, humans can initiate and respond to rich and complex social actions despite incomplete world knowledge and physical, perceptual and computational constraints. This capability relies on action perception mechanisms that exploit regularities in observed goal-oriented behaviours to generate robust predictions and reduce the workload of sensing systems. We argue that three factors are fundamental to achieving this capability. First, human knowledge is frequently hierarchically structured, in both the perceptual and the execution domains. Second, human perception is an active process driven by current task requirements and context; this is particularly important when the perceptual input is complex (e.g. human motion) and the agent must operate under embodiment constraints. Third, learning is at the heart of action perception mechanisms, underlying the agent's ability to add new behaviours to its repertoire. Based on these factors, we review multiple instantiations of a hierarchically organised, biologically inspired framework for embodied action perception, demonstrating its flexibility in addressing the rich computational contexts of action perception and learning on robotic platforms.
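To make the core idea concrete, the following is a minimal, illustrative sketch (not the chapter's implementation) of multiple-model predictive action perception: several candidate behaviours, each a paired inverse and forward model, are run in simulation against an observed movement, and a prediction-error-driven confidence selects the best-matching behaviour. All names here (BehaviourModel, perceive_action, the decay parameter) are hypothetical and introduced only for illustration.

```python
# Illustrative sketch only: a flat, single-level multiple-model
# action-perception loop. Not the chapter's architecture.

import numpy as np


class BehaviourModel:
    """One candidate behaviour: an inverse model proposing a command for the
    observed state, and a forward model predicting the sensory consequence."""

    def __init__(self, name, inverse, forward):
        self.name = name
        self.inverse = inverse      # callable: state -> command
        self.forward = forward      # callable: (state, command) -> predicted next state
        self.confidence = 0.0


def perceive_action(models, observations, decay=0.9):
    """Simulate each candidate behaviour against the observed trajectory and
    keep a running confidence driven by prediction error; the most confident
    model is taken as the recognised action."""
    for t in range(len(observations) - 1):
        state, next_state = observations[t], observations[t + 1]
        for m in models:
            command = m.inverse(state)
            predicted = m.forward(state, command)
            error = np.linalg.norm(predicted - next_state)
            # Low prediction error -> higher confidence for this behaviour.
            m.confidence = decay * m.confidence + (1.0 - decay) * np.exp(-error)
    return max(models, key=lambda m: m.confidence)
```

In the hierarchical instantiations the chapter reviews, such confidences would additionally propagate across levels of abstraction and bias which parts of the scene are attended to; this sketch captures only the flat, single-level case.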
Acknowledgments
This research has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270490 (EFAA).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ognibene, D., Wu, Y., Lee, K., Demiris, Y. (2013). Hierarchies for Embodied Action Perception. In: Baldassarre, G., Mirolli, M. (eds) Computational and Robotic Models of the Hierarchical Organization of Behavior. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39875-9_5
DOI: https://doi.org/10.1007/978-3-642-39875-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39874-2
Online ISBN: 978-3-642-39875-9
eBook Packages: Computer Science, Computer Science (R0)