Abstract
This paper investigates the application of value-function-based reinforcement learning to a smart energy control system, specifically the task of controlling an HVAC system to minimize energy consumption while satisfying residents' comfort requirements. In theory, value-function-based reinforcement learning methods can solve control problems such as this one optimally. In practice, however, choosing an appropriate parametric representation of the value function is difficult, so we develop an alternative method that results in a practical algorithm for value-function approximation in continuous state spaces. To avoid the need to carefully design a parametric representation of the value function, we use a smooth non-parametric function approximator, specifically Locally Weighted Linear Regression (LWR). LWR is used within Fitted Value Iteration (FVI), which has met with several practical successes. However, for efficiency reasons LWR is used with a limited sample size, which leads to poor performance without careful tuning of LWR's parameters. We therefore develop an efficient meta-learning procedure that performs online model selection and tunes LWR's parameters based on the Bellman error. Our algorithm is fully implemented and tested in a realistic simulation of the HVAC control domain, and results in significant energy savings.
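To make the described approach concrete, the following Python sketch illustrates fitted value iteration over sampled states with a Gaussian-kernel LWR approximator, where the kernel bandwidth is re-selected each iteration by minimizing a Bellman-error criterion. It is a minimal sketch under stated assumptions: the environment interface (`transition`, `reward`, `actions`), the candidate `bandwidths`, and the leave-one-out error proxy are illustrative choices, not the authors' exact formulation or simulator.

```python
import numpy as np

GAMMA = 0.95  # discount factor (illustrative value)


def lwr_predict(query, X, y, bandwidth):
    """Locally weighted linear regression prediction at a single query state."""
    query = np.asarray(query, dtype=float)
    # Gaussian kernel weights centered on the query point.
    d2 = np.sum((X - query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Weighted least squares on a bias-augmented design matrix.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    WA = A * w[:, None]
    theta, *_ = np.linalg.lstsq(WA.T @ A, WA.T @ y, rcond=None)
    return float(np.append(query, 1.0) @ theta)


def bellman_backup(s, X, y, bandwidth, transition, reward, actions):
    """One-step lookahead: max over actions of r(s, a) + gamma * V(next state)."""
    return max(reward(s, a) + GAMMA * lwr_predict(transition(s, a), X, y, bandwidth)
               for a in actions)


def fitted_value_iteration(states, transition, reward, actions,
                           bandwidths=(0.1, 0.3, 1.0), n_iters=50):
    """FVI over sampled states with an LWR value function; the bandwidth is
    re-selected every iteration by minimizing a leave-one-out error proxy."""
    X = np.asarray(states, dtype=float)
    y = np.zeros(len(X))          # initial value estimates
    h = bandwidths[0]
    for _ in range(n_iters):
        # Bellman backup targets under the current value estimates.
        targets = np.array([bellman_backup(s, X, y, h, transition, reward, actions)
                            for s in X])

        def loo_bellman_error(hh):
            # Leave-one-out prediction error against the backup targets,
            # used here as a simple stand-in for a Bellman-error criterion.
            preds = [lwr_predict(X[i], np.delete(X, i, axis=0),
                                 np.delete(targets, i), hh)
                     for i in range(len(X))]
            return float(np.mean((np.array(preds) - targets) ** 2))

        h = min(bandwidths, key=loo_bellman_error)   # online model selection
        y = targets                                   # "fit" LWR to the new targets
    return X, y, h
```

A hypothetical caller would pass a set of sampled HVAC states together with its own `transition` and `reward` functions and a small discrete action set; the selection step then trades off bandwidth smoothness against fit quality automatically, rather than requiring hand-tuning of LWR's parameters.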
Keywords
- Optimal Policy
- Markov Decision Process
- HVAC System
- Approximate Dynamic Programming
- Markov Decision Process Model
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Urieli, D., Stone, P. (2013). Model-Selection for Non-parametric Function Approximation in Continuous Control Problems: A Case Study in a Smart Energy System. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol. 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_5
DOI: https://doi.org/10.1007/978-3-642-40988-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40987-5
Online ISBN: 978-3-642-40988-2