Abstract
Humans adapt learned movements to new situations very quickly by generalizing behaviors acquired in similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the meta-parameters required to deal with the current situation, which is described by states. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of the reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks, i.e., the generalization of throwing movements in darts, of hitting movements in table tennis, and of throwing balls, where the tasks are learned on several different real, physical robots: a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi, and a Kuka KR 6.
Notes
The equality \((\boldsymbol{\Phi}^{T}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})^{-1}\boldsymbol{\Phi}^{T}\mathbf{R}=\boldsymbol{\Phi}^{T}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}+\lambda\mathbf{R}^{-1})^{-1}\) is straightforward to verify by multiplying both sides with the non-inverted terms from the left and right, respectively: \(\boldsymbol{\Phi}^{T}\mathbf{R}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}+\lambda\mathbf{R}^{-1})=(\boldsymbol{\Phi}^{T}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})\boldsymbol{\Phi}^{T}\).
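This identity is what allows the reward-weighted regression to be expressed purely through inner products between feature vectors, and hence to be kernelized by replacing \(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}\) with a Gram matrix. The following is a minimal numerical sketch (NumPy, with arbitrary dimensions and random data, not code from the paper) that checks the equality:

```python
import numpy as np

# Numerical check of the footnote identity used to kernelize the
# reward-weighted regression (dimensions and data are arbitrary):
#   (Phi^T R Phi + lam I)^{-1} Phi^T R  ==  Phi^T (Phi Phi^T + lam R^{-1})^{-1}
rng = np.random.default_rng(0)
n, d, lam = 8, 3, 0.1                         # n samples, d features

Phi = rng.standard_normal((n, d))             # feature matrix
R = np.diag(rng.uniform(0.1, 1.0, size=n))    # diagonal reward-weight matrix

lhs = np.linalg.inv(Phi.T @ R @ Phi + lam * np.eye(d)) @ Phi.T @ R
rhs = Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.linalg.inv(R))

assert np.allclose(lhs, rhs)                  # both forms agree numerically

# The right-hand form depends on the features only through the inner
# products Phi Phi^T, so it can be evaluated with a kernel Gram matrix
# K(s_i, s_j) instead of explicit features.
```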
References
Barto, A., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.
Bays, P., & Wolpert, D. (2007). Computational principles of sensorimotor control that minimise uncertainty and variability. Journal of Physiology, 578, 387–396.
Bentivegna, D. C., Ude, A., Atkeson, C. G., & Cheng, G. (2004). Learning to act from observation and practice. International Journal of Humanoid Robotics, 1(4), 585–611.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.
Cheng, G., Hyon, S., Morimoto, J., Ude, A., Hale, J. G., Colvin, G., Scroggin, W., & Jacobsen, S. C. (2007). CB: A humanoid research platform for exploring neuroscience. Advanced Robotics, 21(10), 1097–1114.
Dayan, P., & Hinton, G. E. (1997). Using expectation-maximization for reinforcement learning. Neural Computation, 9(2), 271–278.
Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4–6), 495–506.
Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proc. int. conf. machine learning (pp. 201–208).
Grimes, D. B., & Rao, R. P. N. (2008). Learning nonparametric policies by imitation. In Proc. int. conf. intelligent robots and system (pp. 2022–2028).
Huber, M., & Grupen, R. (1998). Learning robot control—using control policies as abstract actions. In NIPS’98 workshop: abstraction and hierarchy in reinforcement learning.
Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Learning attractor landscapes for learning motor primitives. In Advances in neural information processing systems (Vol. 15, pp. 1523–1530).
Jaakkola, T., Jordan, M. I., & Singh, S. P. (1993). Convergence of stochastic iterative dynamic programming algorithms. In Advances in neural information processing systems (Vol. 6, pp. 703–710).
Jetchev, N., & Toussaint, M. (2009). Trajectory prediction: learning to map situations to robot trajectories. In Proc. int. conf. machine learning (p. 57).
Kober, J., & Peters, J. (2011a). Learning elementary movements jointly with a higher level task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 338–343).
Kober, J., & Peters, J. (2011b). Policy search for motor primitives in robotics. Machine Learning, 84(1–2), 171–203.
Kober, J., Mülling, K., Krömer, O., Lampert, C. H., Schölkopf, B., & Peters, J. (2010a). Movement templates for learning of hitting and batting. In Proc. IEEE int. conf. robotics and automation (pp. 853–858).
Kober, J., Oztop, E., & Peters, J. (2010b). Reinforcement learning to adjust robot movements to new situations. In Proc. robotics: science and systems conf. (pp. 33–40).
Kronander, K., Khansari-Zadeh, M. S., & Billard, A. (2011). Learning to control planar hitting motions in a minigolf-like task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 710–717).
Lampariello, R., Nguyen-Tuong, D., Castellini, C., Hirzinger, G., & Peters, J. (2011). Trajectory planning for optimal robot catching in real-time. In Proc. IEEE int. conf. robotics and automation (pp. 3719–3726).
Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proc. int. conf. uncertainty in artificial intelligence (pp. 354–361).
Lens, T., Kunz, J., Trommer, C., Karguth, A., & von Stryk, O. (2010). Biorob-arm: A quickly deployable and intrinsically safe, light-weight robot arm for service robotics applications. In 41st international symposium on robotics/6th German conference on robotics (pp. 905–910).
Masters Games Ltd (2010). The rules of darts. http://www.mastersgames.com/rules/darts-rules.htm.
McGovern, A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proc. int. conf. machine learning (pp. 361–368).
McGovern, A., Sutton, R. S., & Fagg, A. H. (1997). Roles of macro-actions in accelerating reinforcement learning. In Grace Hopper celebration of women in computing.
Mülling, K., Kober, J., & Peters, J. (2010). Learning table tennis with a mixture of motor primitives. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 411–416).
Mülling, K., Kober, J., & Peters, J. (2011). A biomimetic approach to robot table tennis. Adaptive Behavior, 9(5), 359–376.
Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., & Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems, 47(2–3), 79–91.
Park, D. H., Hoffmann, H., Pastor, P., & Schaal, S. (2008). Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 91–98).
Pastor, P., Hoffmann, H., Asfour, T., & Schaal, S. (2009). Learning and generalization of motor skills by learning from demonstration. In Proc. IEEE int. conf. robotics and automation (pp. 1293–1298).
Peters, J., & Schaal, S. (2008a). Learning to control in operational space. The International Journal of Robotics Research, 27(2), 197–212.
Peters, J., & Schaal, S. (2008b). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.
Pongas, D., Billard, A., & Schaal, S. (2005). Rapid synchronization and accurate phase-locking of rhythmic motor primitives. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 2911–2916).
Rasmussen, C. E., & Williams, C. K. (2006). Gaussian processes for machine learning. Cambridge: MIT Press.
Russell, S. (1998). Learning agents for uncertain environments (extended abstract). In Proc. eleventh annual conference on computational learning theory (pp. 101–103). New York: ACM.
Schaal, S., Mohajerian, P., & Ijspeert, A. J. (2007). Dynamics systems vs. optimal control—a unifying view. Progress in Brain Research, 165(1), 425–445.
Schmidt, R., & Wrisberg, C. (2000). Motor learning and performance (2nd edn.). Champaign: Human Kinetics.
Sutton, R., & Barto, A. (1998). Reinforcement learning. Cambridge: MIT Press.
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (Vol. 12, pp. 1057–1063).
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.
Urbanek, H., Albu-Schäffer, A., & van der Smagt, P. (2004). Learning from demonstration repetitive movements for autonomous service robotics. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 3495–3500).
Welling, M. (2010). The Kalman filter. Lecture notes.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wulf, G. (2007). Attention and motor skill learning. Champaign: Human Kinetics.
Acknowledgements
The project receives funding from the European Community’s Seventh Framework Programme under grant agreements no. ICT-248273 GeRT and no. ICT-270327 CompLACS. The authors thank Prof. K. Wöllhaf from the University of Applied Sciences Ravensburg-Weingarten for supporting the Kuka KR 6 experiment.
Appendix: Motor primitive meta-parameters
The motor primitives based on dynamical systems (Ijspeert et al. 2002; Schaal et al. 2007; Kober et al. 2010a) have six natural meta-parameters: the initial position \(\mathbf {\mathrm {x}}_{1}^{0}\), the initial velocity \(\mathbf {\mathrm {x}}_{2}^{0}\), the goal \(\mathbf {\mathrm {g}}\), the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\), the amplitude A, and the duration T. The meta-parameters modify the global movement by rescaling it spatially or temporally, or by reshaping it with respect to the desired boundary conditions. In the table tennis task, the initial position and velocity are determined by the phase preceding the hitting phase. In Fig. 24 we illustrate the influence of the goal, goal velocity, and duration meta-parameters on the movement generation.
Fig. 24 In this figure, we demonstrate the influence of the goal, goal velocity, and duration meta-parameters. The movement represents the hitting phase of the table tennis experiment (Sect. 3.3), and we show the variation of the meta-parameters employed in this task. The ball is hit at the end of the movement. In these plots, only a single meta-parameter is varied at a time while the others are kept fixed. In (a) the goal \(\mathbf {\mathrm {g}}\) is varied, which allows the ball to be hit at different locations and with different orientations. In (b) the duration T is varied, which allows the hit to be timed. In (c) the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\) is varied, which allows aiming at different locations on the opponent’s side of the table.
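For readers who want to experiment with these meta-parameters, the sketch below implements a single-DoF discrete motor primitive in the spirit of Ijspeert et al. (2002). It is not the exact hitting-primitive formulation of Kober et al. (2010a): the gains, the zero forcing term, and the default values are illustrative placeholders, the initial velocity is fixed to zero, and the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\) (which requires the modified hitting primitive) is omitted. It merely shows where the meta-parameters \(\mathbf {\mathrm {x}}_{1}^{0}\), \(\mathbf {\mathrm {g}}\), A, and T enter the movement generation.

```python
import numpy as np

def dmp_rollout(x0, g, T, A=1.0, alpha=25.0, beta=6.25, alpha_s=8.0,
                forcing=lambda s: 0.0, dt=0.002):
    """Integrate a single-DoF discrete motor primitive (Ijspeert-style sketch).

    Meta-parameters: x0 (initial position), g (goal), T (duration),
    A (amplitude scaling of the forcing term). The gains alpha, beta,
    alpha_s and the zero forcing function are illustrative placeholders,
    not values from the paper.
    """
    tau = T                       # temporal scaling: larger T -> slower movement
    x, v, s = x0, 0.0, 1.0        # position, scaled velocity, phase variable
    traj = []
    for _ in range(int(T / dt)):
        f = A * forcing(s)                        # shapes the movement
        v_dot = (alpha * (beta * (g - x) - v) + f) / tau
        x_dot = v / tau
        s_dot = -alpha_s * s / tau                # canonical system
        v, x, s = v + dt * v_dot, x + dt * x_dot, s + dt * s_dot
        traj.append(x)
    return np.asarray(traj)

# Changing the meta-parameters rescales or reshapes the same movement:
slow = dmp_rollout(x0=0.0, g=1.0, T=2.0)   # longer duration, same goal
far  = dmp_rollout(x0=0.0, g=2.0, T=1.0)   # different goal, original duration
```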