Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization

Lakshmanan, K.; Bhatnagar, Shalabh

doi:10.1007/s10589-016-9875-4

Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization

Published: 15 September 2016

Volume 66, pages 533–556, (2017)
Cite this article

Computational Optimization and Applications Aims and scope Submit manuscript

547 Accesses
4 Citations
Explore all metrics

Abstract

We propose a multi-time scale quasi-Newton based smoothed functional (QN-SF) algorithm for stochastic optimization both with and without inequality constraints. The algorithm combines the smoothed functional (SF) scheme for estimating the gradient with the quasi-Newton method to solve the optimization problem. Newton algorithms typically update the Hessian at each instant and subsequently (a) project them to the space of positive definite and symmetric matrices, and (b) invert the projected Hessian. The latter operation is computationally expensive. In order to save computational effort, we propose in this paper a quasi-Newton SF (QN-SF) algorithm based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update rule. In Bhatnagar (ACM TModel Comput S. 18(1): 27–62, 2007), a Jacobi variant of Newton SF (JN-SF) was proposed and implemented to save computational effort. We compare our QN-SF algorithm with gradient SF (G-SF) and JN-SF algorithms on two different problems – first on a simple stochastic function minimization problem and the other on a problem of optimal routing in a queueing network. We observe from the experiments that the QN-SF algorithm performs significantly better than both G-SF and JN-SF algorithms on both the problem settings. Next we extend the QN-SF algorithm to the case of constrained optimization. In this case too, the QN-SF algorithm performs much better than the JN-SF algorithm. Finally we present the proof of convergence for the QN-SF algorithm in both unconstrained and constrained settings.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The smoothed finite element method (S-FEM): A framework for the design of numerical models for desired solutions

Article 29 January 2019

CSG: A new stochastic gradient method for the efficient solution of structural optimization problems with infinitely many states

Article Open access 31 May 2020

Solving Non-smooth Dynamic Problems Using the Alternating Direction Method of Multipliers

Notes

An alternative simpler proof of stability of $\{Z(n)\}$ can be provided by a straightforward verification of the few stability requirements in [14] that are easily seen to hold in our setting.

References

Andradottir, S.: A scaled stochastic approximation algorithm. Manag. Sci. 42, 475–498 (1996)
Article MATH Google Scholar
Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Comput. Netw. 38(4), 393–422 (2002)
Article Google Scholar
Bhatnagar, S.: Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization. ACM Trans. Model. Comput. Simul. 15(1), 74–107 (2005)
Article Google Scholar
Bhatnagar, S.: Adaptive Newton-based smoothed functional algorithms for simulation optimization. ACM Trans. Model. Comput. Simul. 18(1), 27–62 (2007)
Article MathSciNet Google Scholar
Bhatnagar, S., Borkar, V.S.: A two time scale stochastic approximation scheme for simulation based parametric optimization. Probab. Eng. Inf. Sci. 12, 519–531 (1998)
Article MATH Google Scholar
Bhatnagar, S., Fu, M.C., Marcus, S.I., Fard, P.J.: Optimal structured feedback policies for ABR flow control using two-timescale SPSA. IEEE/ACM Trans. Netw. 9(4), 479–491 (2001)
Article Google Scholar
Bhatnagar, S., Fu, M.C., Marcus, S.I., Bhatnagar, S.: Two timescale algorithms for simulation optimization of hidden Markov models. IIE Trans. 33(3), 245–258 (2001)
Google Scholar
Bhatnagar, S., Hemachandra, N., Mishra, V.: Stochastic approximation algorithms for constrained optimization via simulation. ACM Trans. Model. Comput. Simul. 21(2), 15:1–15:22 (2011)
Google Scholar
Bhatnagar, S., Prasad, H.L., Prashanth, L.A.: Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods. Springer, New York (2013). LNCIS Series
Book MATH Google Scholar
Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A Stochastic Quasi-Newton Method for Large-Scale Optimization. CoRR arXiv:1401.7020 (2014)
Bordes, A., Bottou, L., Gallinari, P.: SGD-QN: careful quasi-Newton stochastic gradient descent. J. Mach. Learn. Res. 10, 1737–1754 (2009)
MathSciNet MATH Google Scholar
Borkar, V.S.: Stochastic Approximation: A Dynamical Systems View point. Cambridge University Press and Hindustan Book Agency, New Delhi (2008)
MATH Google Scholar
Borkar, V.S.: An actor-critic algorithm for constrained Markov decision processes. Syst. Control Lett. 54, 207–213 (2005)
Article MathSciNet MATH Google Scholar
Borkar, V.S., Meyn, S.P.: The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. 38(2), 447–469 (2000)
Article MathSciNet MATH Google Scholar
Brandiere, O.: Some pathological traps for stochastic approximation. SIAM J. Control Optim. 36, 1293–1314 (1998)
Article MathSciNet MATH Google Scholar
Cohen, J.E., Kelly, F.P.: A paradox of congestion in a queueing network. J. Appl. Probab. 27, 730–734 (1990)
Article MathSciNet MATH Google Scholar
Dennis, J.E., Morée, J.J.: Quasi-Newton methods, motivation and theory. SIAM Rev. 19(1), 46–89 (1977)
Article MathSciNet MATH Google Scholar
Harchol-Balter, M., Crovella, M., Murta, C.: On choosing a task assignment policy for a distributed server system. IEEE J. Parallel Distrib. Comput. 59(2), 204–228 (1999)
Article Google Scholar
Hirsch, M.W.: Convergent activation dynamics in continuous time networks. Neural Netw. 2, 331–349 (1989)
Article Google Scholar
Kao, C., Chen, S.: A stochastic quasi-Newton method for simulation response optimization. Eur. J. Oper. Res. 173, 30–46 (2006)
Article MathSciNet MATH Google Scholar
Katkovnik, V.Y., Kulchitsky, Y.: Convergence of a class of random search algorithms. Autom. Remote Control 8, 1321–1326 (1972)
MathSciNet MATH Google Scholar
Kushner, H.J., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications. Springer, New York (2003)
MATH Google Scholar
Lakshmanan, K., Bhatnagar, S.: Smoothed functional and quasi-Newton algorithms for routing in multi-stage queueing network with constraints. In: International Conference on Distributed Computing and Internet Technology (ICDCIT), vol. 6536, pp. 175–186. LNCS (2011)
Pemantle, R.: Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18, 698–712 (1990)
Article MathSciNet MATH Google Scholar
Schweitzer, P.J.: Perturbation theory and finite Markov chains. J. Appl. Probab. 5, 401–413 (1968)
Article MathSciNet MATH Google Scholar
Spall, J.C.: Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Control 45, 1839–1853 (2000)
Article MathSciNet MATH Google Scholar
Sunehag, P., Trumpf, J., Vishwanathan, S.V.N., Schraudolph, N.N.: Variable metric stochastic approximation theory. In: Proceedings of 12th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 560–566 (2009)
Vazquez-Abad, F.J., Kushner, H.J.: Estimation of the derivative of a stationary measure with respect to a control parameter. J. Appl. Probab. 29, 343–352 (1992)
Article MathSciNet MATH Google Scholar
Xiao, X., Lionel, M.N.: Internet QoS: a big picture. IEEE Netw. 13, 8–18 (1999)
Article Google Scholar
Zhu, X., Spall, J.C.: A modified second-order SPSA optimization algorithm for finite samples. Int. J. Adapt. Control. 16, 397–409 (2002)
Article MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita School of Engineering, Bangalore, 560 035, India
K. Lakshmanan
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560 012, India
Shalabh Bhatnagar

Authors

K. Lakshmanan
View author publications
You can also search for this author in PubMed Google Scholar
Shalabh Bhatnagar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to K. Lakshmanan.

Appendix: Proofs of Sect. 4

Proof of Lemma 1

Note that $Q_l(n)$ is measurable $\mathcal {F}(n)$ for all $n\ge 0$. Further, it is easy to see that $E[Q_l(n+1)|\mathcal {F}(n)] = Q_l(n)$ a.s., $\forall n \ge 0$. We can show that for any real $a_n$ and $b_n$,

$$\begin{aligned} \bigg ( \sum _{n=1}^K (a_n - b_n) \bigg )^2&= \bigg ( \sum _{n=1}^K \frac{a_n-b_n}{\sqrt{a_n^2+b_n^2}} \sqrt{a_n^2+b_n^2} \bigg )^2\\&\le \sum _{n=1}^K \bigg (1-\frac{2 a_n b_n}{a_n^2 + b_n^2} \bigg ) \sum _{n=1}^K (a_n^2 + b_n^2) \le 2K \sum _{n=1}^K(a_n^2+ b_n^2), \end{aligned}$$

where first we have used the Cauchy-Schwarz inequality and the second inequality follows since $-\frac{2a_nb_n}{a_n^2+b_n^2} \le 1$. Hence we have

$$\begin{aligned} E[Q_l^2(n)]&\le \frac{2n}{\beta ^2} \sum _{m=1}^n b^2(m) E \bigg ( \eta _{l}^2(m) \Big ( h(X^{'}_{m}) - h(X_m) \Big )^2 \\&\quad + E^2 \Big ( \eta _l(m) \big ( h(X^{'}_{m}) - h(X_m) \big ) \big | \mathcal {F}(m-1) \Big ) \bigg ). \end{aligned}$$

Here, we let $E^a(X)$ denote $(E(X))^a$. By conditional Jensen’s inequality, we have

$$\begin{aligned}&E^2 \Big ( \eta _l(m) \big ( h(X^{'}_{m}) - h(X_m) \big ) \big | \mathcal {F}(m-1) \Big ) \\&\quad \le E \Big ( \eta _l^2(m) \big ( h(X^{'}_{m}) - h(X_m) \big )^2 \big | \mathcal {F}(m-1) \Big ). \end{aligned}$$

Hence by the Cauchy-Schwarz inequality

$$\begin{aligned} E[Q_l^2(n)]&\le \frac{4n}{\beta ^2} \sum _{m=1}^n b^2(m) E \bigg ( \eta _{l}^2(m) \Big ( h(X^{'}_{m}) - h(X_m) \Big )^2 \bigg ) \\&\le \frac{4n}{\beta ^2} \sum _{m=1}^n b^2(m) E^{1/2} \Big ( \eta _{l}^4(m) \Big ) E^{1/2} \Big ( \big ( h(X^{'}_{m}) - h(X_m) \big )^4 \Big ). \end{aligned}$$

Since h(.) is a Lipschitz continuous function, we have $|h(X^{'}_m) - h(X_m)|^4 \le K ||X^{'}_m - X_m ||^4$, for some constant $K > 0$. Hence, $E^{1/2}[(h(X^{'}_m) - h(X_m))^4] \le \sqrt{K} E^{1/2} || X^{'}_m - X_m ||^4$. As a consequence of Assumption 3, $\sup _m E[ || X_m - X^{'}_m ||^4] < \infty $ [4]. Thus, $E[Q^2_{l}(n)] < \infty $, for all $n \ge 1$, i.e., $Q_l(n)$ are square-integrable and hence also integrable random variables. Thus $(Q_l(n),\mathcal {F}(n)), n\ge 0$ is a square-integrable martingale sequence. We now show that its quadratic variation process is convergent. Thus, note that

$$\begin{aligned}&\sum _n E[(Q_l(n+1) - Q_l(n) )^2 | \mathcal {F}(n)] \\&= \sum _n E \Big [ \Big ( b(n+1) \Big ( \frac{\eta _l(n+1)}{\beta } \Big ( h(X^{'}_{n+1}) - h(X_{n+1}) \Big ) \\&\quad -E \Big ( \frac{\eta _l(n+1)}{\beta } \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big ) \big | \mathcal {F}(n) \Big ) \Big ) \Big )^2 \Big | \mathcal {F}(n) \Big ] \\&\le \sum _n b^2(n+1) \bigg ( E \Big [ \frac{\eta _l^2(n+1)}{\beta ^2} \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big )^2 \big | \mathcal {F}(n) \Big ]\\&\quad + E \Big [ E^2 \big [ \frac{\eta _l(n+1)}{\beta } \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big ) \big | \mathcal {F}(n) \big ] \big | \mathcal {F}(n) \Big ] \bigg ) \\&\le \sum _n 2b^2(n+1) \bigg ( E \Big [ \frac{\eta _l^2(n+1)}{\beta ^2} \big ( h(X^{'}_{n+1}) - h(X_{n+1}) \big )^2 \big | \mathcal {F}(n) \Big ] \bigg ) \end{aligned}$$

where the second inequality follows by another application of conditional Jensen’s inequality. It can now be seen as before using an application of the Cauchy-Schwarz inequality as well as Assumptions 2 and 3 that

$$\begin{aligned} \sup _n \frac{1}{\beta ^2} E \big [ \eta _l^2(n+1) (h(X^{'}_{n+1}) - h(X_{n+1}))^2 \big | \mathcal {F}(n) \big ] < \infty \text{ w.p. } \text{1. } \end{aligned}$$

Now from Assumption 4, $\sum _n E[(Q_l(n+1) - Q_l(n) )^2 | \mathcal {F}(n)] < \infty $ a.s. Thus, the quadratic variation process of $\{Z_l(n)\}$ is almost surely convergent. Hence, by the martingale convergence theorem for square integrable martingales, $\{Q_l(n)\}$ are a.s. convergent martingale sequences. $\square $

Proof of Lemma 3

Note that (9) can be rewritten as

$$\begin{aligned} Z_l(n+1)&= Z_l(n) + b(n) \bigg ( E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) - Z_l(n) \bigg ) \nonumber \\&\quad + b(n) \bigg ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \nonumber \\&\quad - E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \bigg ). \end{aligned}$$

(34)

Hence we have

$$\begin{aligned}&\sum _n (Z_l(n+1) - Z_l(n))\nonumber \\&\quad = \sum _n b(n) \Big ( E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) - Z_l(n) \Big )\nonumber \\&\qquad \quad + \sum _n b(n) \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) - E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) \Big ). \end{aligned}$$

(35)

Note that as a consequence of Lemma 1, $\exists Q_l(\infty )<\infty $ a.s. such that $Q_l(n) \rightarrow Q_l(\infty )$ a.s. as $n\rightarrow \infty $. Now from the definition of $Q_l(n)$, it is clear that

$$\begin{aligned} Q_l(\infty ) =&\sum _{m=1}^\infty b(m) \bigg ( \frac{\eta _l(m)}{\beta } \Big ( h(X^{'}_{m}) - h(X_m) \Big ) \\&- E \Big ( \frac{\eta _l(m)}{\beta } \big ( h(X^{'}_{m}) - h(X_m) \big ) \big | \mathcal {F}(m-1) \Big ) \bigg ) < \infty \text{ a.s. } \end{aligned}$$

Note that the second term in the RHS of (35) is precisely $Q_l(\infty )<\infty $. As a consequence of the above, it is sufficient to show the boundedness of the following recursion in place of (34):

$$\begin{aligned} {\bar{Z}}_l(n+1) = {\bar{Z}}_l(n) + b(n) \bigg ( E \Big ( \frac{\eta _l(n)}{\beta } \Big ( h \big ( X'_n \big ) - h \big ( X_n \big ) \Big ) \big | \mathcal {F}(n-1) \Big ) - {\bar{Z}}_l(n) \bigg ),\nonumber \\ \end{aligned}$$

(36)

with ${\bar{Z}}(0) = Z(0)$.

As in the proof of Lemma 1 it can be seen that

$$\begin{aligned} \sup _n E \left[ \frac{\eta _l(n)}{\beta } \left( h \left( X'_n \right) - h \left( X_n \right) \right) \big | \mathcal {F}(n-1) \right] < \infty \end{aligned}$$

with probability 1. Now since $b(n) \rightarrow 0$ as $n \rightarrow \infty $, there exists a $p_0$ such that for all $n \ge p_0, \, 0 \le b(n) \le 1$. Hence for all $n \ge p_0, \, {\bar{Z}}(n+1)$ is a convex combination of ${\bar{Z}}(n)$ and a quantity that is almost surely uniformly bounded. The claim follows^{Footnote 1}.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lakshmanan, K., Bhatnagar, S. Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization. Comput Optim Appl 66, 533–556 (2017). https://doi.org/10.1007/s10589-016-9875-4

Download citation

Received: 30 April 2014
Published: 15 September 2016
Issue Date: April 2017
DOI: https://doi.org/10.1007/s10589-016-9875-4

Keywords

Mathematics Subject Classification

62L20

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

The smoothed finite element method (S-FEM): A framework for the design of numerical models for desired solutions

CSG: A new stochastic gradient method for the efficient solution of structural optimization problems with infinitely many states

Solving Non-smooth Dynamic Problems Using the Alternating Direction Method of Multipliers

Notes

References

Author information

Authors and Affiliations

Corresponding author

Appendix: Proofs of Sect. 4

Appendix: Proofs of Sect. 4

Proof of Lemma 1

Proof of Lemma 3

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now