Abstract
Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains difficult in many practical applications. For tasks such as grasping, no reliable success measures are available, and defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework in which the robot simultaneously learns an action policy and a model of the reward function by actively querying a human expert for ratings. We represent the reward model with a Gaussian process and evaluate several classical acquisition functions (AFs) from the Bayesian optimization literature in this context. Furthermore, we present a novel AF, expected policy divergence. We demonstrate our method on a robot grasping task and show that the learned reward function generalizes to a similar task. Additionally, we evaluate the proposed novel AF on a real-robot pendulum swing-up task.
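To make the active-querying idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: a Gaussian process models the reward over rollout outcomes, and a simple uncertainty-based acquisition rule, standing in for the AFs studied in the paper, selects which outcome to send to the expert for a rating. The helper `query_expert` and the two-dimensional outcome space are hypothetical placeholders, and the sketch uses scikit-learn rather than any library from the paper.

```python
# Illustrative sketch of active reward learning with a GP reward model.
# Assumptions: outcomes are 2-D vectors, query_expert() is a stand-in for a
# human rating, and the acquisition is plain predictive-uncertainty maximization.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def query_expert(outcome):
    # Placeholder: in the real setting a human expert rates the rollout outcome.
    return -np.sum((outcome - 0.5) ** 2)

rng = np.random.default_rng(0)
X_rated = rng.uniform(size=(3, 2))                      # outcomes rated so far
y_rated = np.array([query_expert(x) for x in X_rated])

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)

for _ in range(10):
    gp.fit(X_rated, y_rated)                            # refit the reward model
    candidates = rng.uniform(size=(50, 2))              # outcomes of new rollouts
    mean, std = gp.predict(candidates, return_std=True)
    pick = np.argmax(std)                               # query the most uncertain outcome
    X_rated = np.vstack([X_rated, candidates[pick]])
    y_rated = np.append(y_rated, query_expert(candidates[pick]))

print("Expert ratings collected:", len(y_rated))
```

The sketch only illustrates the query loop; in the paper the acquisition function must additionally trade off the informativeness of a query against the cost of asking the human expert.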
Acknowledgments
The authors gratefully acknowledge the support of the European Union Projects #FP7-ICT-270327 (Complacs) and #FP7-ICT-2013-10 (3rd Hand).
Additional information
This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.
Cite this article
Daniel, C., Kroemer, O., Viering, M. et al. Active reward learning with a novel acquisition function. Auton Robot 39, 389–405 (2015). https://doi.org/10.1007/s10514-015-9454-z