Abstract
Zeroth Order Bayesian Optimization (ZOBO) methods optimize an unknown function based on its black-box evaluations at the query locations. Unlike most optimization procedures, ZOBO methods fail to utilize gradient information even when it is available. On the other hand, First Order Bayesian Optimization (FOBO) methods exploit the available gradient information to arrive at better solutions faster. However, the existing FOBO methods do not utilize a crucial piece of information, namely that the gradient is zero at the optima. Further, the inherent sequential nature of the FOBO methods incurs a high computational cost that limits their wide applicability. To alleviate these difficulties, we propose a relaxed statistical model that leverages the gradient information and directly searches for points where the gradient vanishes. To this end, we develop novel acquisition algorithms that search for the global optima effectively. Unlike the existing FOBO methods, the proposed methods are parallelizable. Through extensive experiments on standard test functions, we compare the performance of our methods against the existing methods. Furthermore, we explore an application of the proposed FOBO methods in the context of policy gradient reinforcement learning.




Notes
This observation could be utilized in the existing FOBO methods as well. However, owing to the computational burden of the joint GP model used in the existing FOBO methods, we propose to exploit this fact within independent GP modeling. Note further that joint GP modeling is not required to utilize this fact.
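To make this footnote concrete, the sketch below (assuming Python with scikit-learn) models each partial derivative with its own independent GP and searches for a point where the posterior mean of the gradient vanishes. The RBF kernels, the restart-based search, and all function and variable names are illustrative assumptions; this is not the acquisition algorithm proposed in the paper, only the general "gradient-vanishing" idea under independent GP modeling.

```python
# Minimal sketch (not the paper's algorithm): fit one independent GP per
# gradient component and look for a point whose predicted gradient is
# (close to) zero. All names here are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_gradient_gps(X, dY):
    """Fit one GP per gradient component (X: n x d queries, dY: n x d observed gradients)."""
    return [
        GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, dY[:, j])
        for j in range(dY.shape[1])
    ]

def gradient_norm_surrogate(x, gps):
    """Posterior-mean squared norm of the gradient at x; zero at a candidate optimum."""
    x = np.atleast_2d(x)
    means = np.array([gp.predict(x)[0] for gp in gps])
    return float(np.sum(means ** 2))

def next_query(gps, bounds, n_restarts=10, seed=None):
    """Crude gradient-vanishing search: minimize the surrogate from random restarts.
    A proper acquisition function would also trade off posterior uncertainty."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(bounds[:, 0], bounds[:, 1])
        res = minimize(gradient_norm_surrogate, x0, args=(gps,), bounds=bounds)
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x
```

Because each gradient component is modeled independently, the component GPs can be fit and evaluated in parallel, which is the computational advantage alluded to above over a joint GP model.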
Additional information
Prabuchandran K. J. was supported by SGNF research grant from Indian Institute of Technology, Dharwad.
Cite this article
J., P.K., Penubothula, S., Kamanchi, C. et al. Novel First Order Bayesian Optimization with an Application to Reinforcement Learning. Appl Intell 51, 1565–1579 (2021). https://doi.org/10.1007/s10489-020-01896-w