Abstract
This paper investigates the application of value-function-based reinforcement learning to a smart energy control system, specifically the task of controlling an HVAC system to minimize energy consumption while satisfying residents' comfort requirements. In theory, value-function-based reinforcement learning methods can solve control problems such as this one optimally. In practice, however, choosing an appropriate parametric representation of the value function is difficult, so we develop an alternative method that results in a practical algorithm for value-function approximation in continuous state spaces. To avoid the need to carefully design a parametric representation of the value function, we use a smooth non-parametric function approximator, specifically Locally Weighted Linear Regression (LWR). LWR is used within Fitted Value Iteration (FVI), which has met with several practical successes. However, for efficiency reasons LWR is used with a limited sample size, which leads to poor performance without careful tuning of LWR's parameters. We therefore develop an efficient meta-learning procedure that performs online model selection and tunes LWR's parameters based on the Bellman error. Our algorithm is fully implemented and tested in a realistic simulation of the HVAC control domain, and results in significant energy savings.
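To make the described approach concrete, the following Python sketch illustrates fitted value iteration over sampled states with a Gaussian-kernel LWR approximator, where the kernel bandwidth is re-selected each iteration by minimizing a Bellman-error criterion. It is a minimal sketch under stated assumptions: the environment interface (`transition`, `reward`, `actions`), the candidate `bandwidths`, and the leave-one-out error proxy are illustrative choices, not the authors' exact formulation or simulator.

```python
import numpy as np

GAMMA = 0.95  # discount factor (illustrative value)


def lwr_predict(query, X, y, bandwidth):
    """Locally weighted linear regression prediction at a single query state."""
    query = np.asarray(query, dtype=float)
    # Gaussian kernel weights centered on the query point.
    d2 = np.sum((X - query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Weighted least squares on a bias-augmented design matrix.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    WA = A * w[:, None]
    theta, *_ = np.linalg.lstsq(WA.T @ A, WA.T @ y, rcond=None)
    return float(np.append(query, 1.0) @ theta)


def bellman_backup(s, X, y, bandwidth, transition, reward, actions):
    """One-step lookahead: max over actions of r(s, a) + gamma * V(next state)."""
    return max(reward(s, a) + GAMMA * lwr_predict(transition(s, a), X, y, bandwidth)
               for a in actions)


def fitted_value_iteration(states, transition, reward, actions,
                           bandwidths=(0.1, 0.3, 1.0), n_iters=50):
    """FVI over sampled states with an LWR value function; the bandwidth is
    re-selected every iteration by minimizing a leave-one-out error proxy."""
    X = np.asarray(states, dtype=float)
    y = np.zeros(len(X))          # initial value estimates
    h = bandwidths[0]
    for _ in range(n_iters):
        # Bellman backup targets under the current value estimates.
        targets = np.array([bellman_backup(s, X, y, h, transition, reward, actions)
                            for s in X])

        def loo_bellman_error(hh):
            # Leave-one-out prediction error against the backup targets,
            # used here as a simple stand-in for a Bellman-error criterion.
            preds = [lwr_predict(X[i], np.delete(X, i, axis=0),
                                 np.delete(targets, i), hh)
                     for i in range(len(X))]
            return float(np.mean((np.array(preds) - targets) ** 2))

        h = min(bandwidths, key=loo_bellman_error)   # online model selection
        y = targets                                   # "fit" LWR to the new targets
    return X, y, h
```

A hypothetical caller would pass a set of sampled HVAC states together with its own `transition` and `reward` functions and a small discrete action set; the selection step then trades off bandwidth smoothness against fit quality automatically, rather than requiring hand-tuning of LWR's parameters.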
Keywords
- Optimal Policy
- Markov Decision Process
- HVAC System
- Approximate Dynamic Programming
- Markov Decision Process Model
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Urieli, D., Stone, P. (2013). Model-Selection for Non-parametric Function Approximation in Continuous Control Problems: A Case Study in a Smart Energy System. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol. 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_5
DOI: https://doi.org/10.1007/978-3-642-40988-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40987-5
Online ISBN: 978-3-642-40988-2