Abstract
Humans adapt learned movements to new situations very quickly by generalizing behaviors acquired in similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the meta-parameters required to deal with the current situation, which is described by states. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of the reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks, i.e., the generalization of throwing movements in darts, of hitting movements in table tennis, and of throwing balls, where the tasks are learned on several different real, physical robots: a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi, and a Kuka KR 6.
Notes
The equality \((\boldsymbol{\Phi}^{T}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})^{-1}\boldsymbol{\Phi}^{T}\mathbf{R}=\boldsymbol{\Phi}^{T}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}+\lambda\mathbf{R}^{-1})^{-1}\) is straightforward to verify by multiplying both sides with the non-inverted terms from the left and right, respectively: \(\boldsymbol{\Phi}^{T}\mathbf{R}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}+\lambda\mathbf{R}^{-1})=(\boldsymbol{\Phi}^{T}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})\boldsymbol{\Phi}^{T}\).
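This identity is what allows the reward-weighted regression to be expressed purely through inner products between feature vectors, and hence to be kernelized by replacing \(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}\) with a Gram matrix. The following is a minimal numerical sketch (NumPy, with arbitrary dimensions and random data, not code from the paper) that checks the equality:

```python
import numpy as np

# Numerical check of the footnote identity used to kernelize the
# reward-weighted regression (dimensions and data are arbitrary):
#   (Phi^T R Phi + lam I)^{-1} Phi^T R  ==  Phi^T (Phi Phi^T + lam R^{-1})^{-1}
rng = np.random.default_rng(0)
n, d, lam = 8, 3, 0.1                         # n samples, d features

Phi = rng.standard_normal((n, d))             # feature matrix
R = np.diag(rng.uniform(0.1, 1.0, size=n))    # diagonal reward-weight matrix

lhs = np.linalg.inv(Phi.T @ R @ Phi + lam * np.eye(d)) @ Phi.T @ R
rhs = Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.linalg.inv(R))

assert np.allclose(lhs, rhs)                  # both forms agree numerically

# The right-hand form depends on the features only through the inner
# products Phi Phi^T, so it can be evaluated with a kernel Gram matrix
# K(s_i, s_j) instead of explicit features.
```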
References
Barto, A., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.
Bays, P., & Wolpert, D. (2007). Computational principles of sensorimotor control that minimise uncertainty and variability. Journal of Physiology, 578, 387–396.
Bentivegna, D. C., Ude, A., Atkeson, C. G., & Cheng, G. (2004). Learning to act from observation and practice. International Journal of Humanoid Robotics, 1(4), 585–611.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.
Cheng, G., Hyon, S., Morimoto, J., Ude, A., Hale, J. G., Colvin, G., Scroggin, W., & Jacobsen, S. C. (2007). CB: A humanoid research platform for exploring neuroscience. Advanced Robotics, 21(10), 1097–1114.
Dayan, P., & Hinton, G. E. (1997). Using expectation-maximization for reinforcement learning. Neural Computation, 9(2), 271–278.
Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4–6), 495–506.
Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proc. int. conf. machine learning (pp. 201–208).
Grimes, D. B., & Rao, R. P. N. (2008). Learning nonparametric policies by imitation. In Proc. int. conf. intelligent robots and system (pp. 2022–2028).
Huber, M., & Grupen, R. (1998). Learning robot control—using control policies as abstract actions. In NIPS’98 workshop: abstraction and hierarchy in reinforcement learning.
Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Learning attractor landscapes for learning motor primitives. In Advances in neural information processing systems (Vol. 15, pp. 1523–1530).
Jaakkola, T., Jordan, M. I., & Singh, S. P. (1993). Convergence of stochastic iterative dynamic programming algorithms. In Advances in neural information processing systems (Vol. 6, pp. 703–710).
Jetchev, N., & Toussaint, M. (2009). Trajectory prediction: learning to map situations to robot trajectories. In Proc. int. conf. machine learning (p. 57).
Kober, J., & Peters, J. (2011a). Learning elementary movements jointly with a higher level task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 338–343).
Kober, J., & Peters, J. (2011b). Policy search for motor primitives in robotics. Machine Learning, 84(1–2), 171–203.
Kober, J., Mülling, K., Krömer, O., Lampert, C. H., Schölkopf, B., & Peters, J. (2010a). Movement templates for learning of hitting and batting. In Proc. IEEE int. conf. robotics and automation (pp. 853–858).
Kober, J., Oztop, E., & Peters, J. (2010b). Reinforcement learning to adjust robot movements to new situations. In Proc. robotics: science and systems conf. (pp. 33–40).
Kronander, K., Khansari-Zadeh, M. S., & Billard, A. (2011). Learning to control planar hitting motions in a minigolf-like task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 710–717).
Lampariello, R., Nguyen-Tuong, D., Castellini, C., Hirzinger, G., & Peters, J. (2011). Trajectory planning for optimal robot catching in real-time. In Proc. IEEE int. conf. robotics and automation (pp. 3719–3726).
Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proc. int. conf. uncertainty in artificial intelligence (pp. 354–361).
Lens, T., Kunz, J., Trommer, C., Karguth, A., & von Stryk, O. (2010). Biorob-arm: A quickly deployable and intrinsically safe, light-weight robot arm for service robotics applications. In 41st international symposium on robotics/6th German conference on robotics (pp. 905–910).
Masters Games Ltd (2010). The rules of darts. http://www.mastersgames.com/rules/darts-rules.htm.
McGovern, A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proc. int. conf. machine learning (pp. 361–368).
McGovern, A., Sutton, R. S., & Fagg, A. H. (1997). Roles of macro-actions in accelerating reinforcement learning. In Grace Hopper celebration of women in computing.
Mülling, K., Kober, J., & Peters, J. (2010). Learning table tennis with a mixture of motor primitives. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 411–416).
Mülling, K., Kober, J., & Peters, J. (2011). A biomimetic approach to robot table tennis. Adaptive Behavior, 9(5), 359–376.
Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., & Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems, 47(2–3), 79–91.
Park, D. H., Hoffmann, H., Pastor, P., & Schaal, S. (2008). Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 91–98).
Pastor, P., Hoffmann, H., Asfour, T., & Schaal, S. (2009). Learning and generalization of motor skills by learning from demonstration. In Proc. IEEE int. conf. robotics and automation (pp. 1293–1298).
Peters, J., & Schaal, S. (2008a). Learning to control in operational space. The International Journal of Robotics Research, 27(2), 197–212.
Peters, J., & Schaal, S. (2008b). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.
Pongas, D., Billard, A., & Schaal, S. (2005). Rapid synchronization and accurate phase-locking of rhythmic motor primitives. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 2911–2916).
Rasmussen, C. E., & Williams, C. K. (2006). Gaussian processes for machine learning. Cambridge: MIT Press.
Russell, S. (1998). Learning agents for uncertain environments (extended abstract). In Proc. eleventh annual conference on computational learning theory (pp. 101–103). New York: ACM.
Schaal, S., Mohajerian, P., & Ijspeert, A. J. (2007). Dynamics systems vs. optimal control—a unifying view. Progress in Brain Research, 165(1), 425–445.
Schmidt, R., & Wrisberg, C. (2000). Motor learning and performance (2nd edn.). Champaign: Human Kinetics.
Sutton, R., & Barto, A. (1998). Reinforcement learning. Cambridge: MIT Press.
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (Vol. 12, pp. 1057–1063).
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.
Urbanek, H., Albu-Schäffer, A., & van der Smagt, P. (2004). Learning from demonstration repetitive movements for autonomous service robotics. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 3495–3500).
Welling, M. (2010). The Kalman filter. Lecture notes.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wulf, G. (2007). Attention and motor skill learning. Champaign: Human Kinetics.
Acknowledgements
The project receives funding from the European Community’s Seventh Framework Programme under grant agreements no. ICT-248273 GeRT and no. ICT-270327 CompLACS. The authors thank Prof. K. Wöllhaf from the University of Applied Sciences Ravensburg-Weingarten for supporting the Kuka KR 6 experiment.
Appendix: Motor primitive meta-parameters
The motor primitives based on dynamical systems (Ijspeert et al. 2002; Schaal et al. 2007; Kober et al. 2010a) have six natural meta-parameters: the initial position \(\mathbf {\mathrm {x}}_{1}^{0}\), the initial velocity \(\mathbf {\mathrm {x}}_{2}^{0}\), the goal \(\mathbf {\mathrm {g}}\), the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\), the amplitude A, and the duration T. The meta-parameters modify the global movement by rescaling it spatially or temporally, or by reshaping it with respect to the desired boundary conditions. In the table tennis task, the initial position and velocity are determined by the phase preceding the hitting phase. In Fig. 24 we illustrate the influence of the goal, goal velocity, and duration meta-parameters on the movement generation.
Fig. 24 In this figure, we demonstrate the influence of the goal, goal velocity, and duration meta-parameters. The movement represents the hitting phase of the table tennis experiment (Sect. 3.3), and we show the variation of the meta-parameters employed in this task. The ball is hit at the end of the movement. In these plots, only a single meta-parameter is varied at a time while the others are kept fixed. In (a) the goal \(\mathbf {\mathrm {g}}\) is varied, which allows the ball to be hit at different locations and with different orientations. In (b) the duration T is varied, which allows the hit to be timed. In (c) the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\) is varied, which allows aiming at different locations on the opponent’s side of the table.
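For readers who want to experiment with these meta-parameters, the sketch below implements a single-DoF discrete motor primitive in the spirit of Ijspeert et al. (2002). It is not the exact hitting-primitive formulation of Kober et al. (2010a): the gains, the zero forcing term, and the default values are illustrative placeholders, the initial velocity is fixed to zero, and the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\) (which requires the modified hitting primitive) is omitted. It merely shows where the meta-parameters \(\mathbf {\mathrm {x}}_{1}^{0}\), \(\mathbf {\mathrm {g}}\), A, and T enter the movement generation.

```python
import numpy as np

def dmp_rollout(x0, g, T, A=1.0, alpha=25.0, beta=6.25, alpha_s=8.0,
                forcing=lambda s: 0.0, dt=0.002):
    """Integrate a single-DoF discrete motor primitive (Ijspeert-style sketch).

    Meta-parameters: x0 (initial position), g (goal), T (duration),
    A (amplitude scaling of the forcing term). The gains alpha, beta,
    alpha_s and the zero forcing function are illustrative placeholders,
    not values from the paper.
    """
    tau = T                       # temporal scaling: larger T -> slower movement
    x, v, s = x0, 0.0, 1.0        # position, scaled velocity, phase variable
    traj = []
    for _ in range(int(T / dt)):
        f = A * forcing(s)                        # shapes the movement
        v_dot = (alpha * (beta * (g - x) - v) + f) / tau
        x_dot = v / tau
        s_dot = -alpha_s * s / tau                # canonical system
        v, x, s = v + dt * v_dot, x + dt * x_dot, s + dt * s_dot
        traj.append(x)
    return np.asarray(traj)

# Changing the meta-parameters rescales or reshapes the same movement:
slow = dmp_rollout(x0=0.0, g=1.0, T=2.0)   # longer duration, same goal
far  = dmp_rollout(x0=0.0, g=2.0, T=1.0)   # different goal, original duration
```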