Abstract
We propose a proximal-gradient algorithm with penalization terms and inertial and memory effects for minimizing the sum of a proper, convex, and lower semicontinuous and a convex differentiable function subject to the set of minimizers of another convex differentiable function. We show that, under suitable choices for the step sizes and the penalization parameters, the generated iterates weakly converge to an optimal solution of the addressed bilevel optimization problem, while the objective function values converge to its optimal objective value.
1 Introduction and Preliminaries
Let \({\mathcal H}\) be a real Hilbert space with inner product 〈⋅,⋅〉 and associated norm ∥⋅∥. Let \(f:{\mathcal H}\rightarrow \overline {\mathbb R}=\mathbb {R}\cup \{\pm \infty \}\) be a proper, convex, and lower semicontinuous function, and let \(h:{\mathcal H}\to \mathbb {R}\) and \(g:{\mathcal H}\to \mathbb {R}\) be convex and differentiable functions with Lipschitz continuous gradients with positive Lipschitz constants \(L_{h}\) and \(L_{g}\), respectively. We consider the bilevel optimization problem
$$\min\{f(x)+h(x): x\in\arg\min g\} \qquad\qquad (1)$$
and assume that its set of optimal solutions
$$\mathcal{S}:=\left\{x\in\arg\min g: f(x)+h(x)\leq f(y)+h(y)~\forall y\in\arg\min g\right\}$$
is nonempty. We also assume, without loss of generality, that \(\min g = 0\).
The work of Attouch and Czarnecki [5] represented the starting point of a series of articles on the minimization of a smooth or (sometimes) a complexly structured nonsmooth objective function subject to the set of minimizers of another function, from either a discrete perspective through iterative numerical algorithms or a continuous one through dynamical systems (see [4,5,6,7,8, 11, 17,18,19, 24,25,26,27, 30, 37, 39]). In both settings, the function determining the feasible set is evaluated in the spirit of penalty methods and contributes to the convergence of the generated sequences, in the discrete setting, and to the asymptotic convergence of the generated trajectories, in the continuous setting, to an optimal solution of the underlying bilevel optimization problem. We emphasize in particular the proximal-gradient algorithm with penalty term introduced in [8], for which weak ergodic convergence has been proved.
In this paper, we consider this algorithm in the context of solving problem (1) and enhance it with inertial and memory effects. Our aim is to provide suitable choices for the step sizes and the penalization parameters such that the generated iterates weakly converge to an optimal solution of problem (1), while the objective function values converge to its optimal objective value. Algorithms of inertial type follow by time discretization of differential inclusions of second-order type (see [1, 3]) and were first investigated in the context of the minimization of a differentiable function by Polyak in [40] and Bertsekas in [14]. In the last two decades, intensive research has been dedicated to algorithms of inertial type and to their convergence behavior (see [1,2,3, 10, 20,21,22,23, 28, 29, 31,32,33,34,35,36, 38]). In a variety of situations, in particular when solving real-world problems, the presence of inertial terms improves the convergence behavior of the generated sequences. It is also well known (see [9, 13]) that enhancing the proximal-gradient algorithm with inertial effects may lead to a considerable improvement of the convergence behavior of the sequence of objective function values.
The proximal-gradient algorithm with penalization terms and inertial and memory effects we propose for solving (1) is the following.
Algorithm 1
Initialization: Choose positive sequences \(\{\lambda _{n}\}_{n=1}^{\infty }\), \(\{\beta _{n}\}_{n=1}^{\infty }\), and a constant α ∈ [0, 1). Take arbitrary \(x_{0},x_{1}\in {\mathcal H}\). Iterative step: For every n ≥ 1 and given current iterates \(x_{n-1}, x_{n}\in {\mathcal H}\), define \(x_{n+1} \in {\mathcal H}\) by
$$x_{n+1}:=\text{prox}_{\lambda_{n}f}\left( x_{n}+\alpha(x_{n}-x_{n-1})-\lambda_{n}\nabla h(x_{n})-\lambda_{n}\beta_{n}\nabla g(x_{n})\right). $$
For \(x \in {\mathcal H}\), we denote by \(\text {prox}_{\lambda _{n} f}(x)\) the proximal point of the function f of parameter λ_n at x, which is the unique optimal solution of the optimization problem
$$\min_{y\in{\mathcal H}}\left\{f(y)+\frac{1}{2\lambda_{n}}\|y-x\|^{2}\right\}. $$
In Algorithm 1, \(\{\lambda _{n}\}_{n=1}^{\infty }\) denotes the sequence of step sizes, \(\{\beta _{n}\}_{n=1}^{\infty }\) the sequence of penalization parameters, and α ∈ [0, 1) the parameter that controls the inertial terms.
The proposed numerical scheme recovers, when α = 0, the algorithm investigated in [37] and, under the additional assumption f = 0, the gradient method of penalty type from [39]. In the case f = 0, Algorithm 1 gives rise to the gradient method of penalty type with inertial and memory effects introduced and studied in [30].
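To make the scheme concrete, the following Python sketch implements the iteration above under illustrative assumptions that are ours, not the paper's: f = ∥⋅∥₁ (so the proximal step is soft-thresholding), h(x) = ½∥x − c∥², and g(x) = ½x₂² (so arg min g is the hyperplane x₂ = 0). The parameter choices λ_n ∼ n^{−q} and β_n ∼ n^{q} with q ∈ (1/2, 1) mimic the spirit of Remark 2 below; the concrete constants are arbitrary.

```python
import numpy as np

def prox_l1(z, t):
    # Proximal map of t*||.||_1 (soft-thresholding); f = ||.||_1 is our illustrative choice.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def algorithm1(x0, grad_h, grad_g, prox_f, n_iter=20000, alpha=0.2, q=0.75,
               c_lam=0.5, c_beta=1.0):
    # Inertial proximal-gradient penalty iteration: both gradients are evaluated
    # at x_n, while the inertial term extrapolates along x_n - x_{n-1}.
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    for n in range(1, n_iter + 1):
        lam = c_lam * n ** (-q)        # step sizes {lambda_n} in l^2 \ l^1
        beta = c_beta * n ** q         # penalization parameters beta_n -> +infinity
        y = x + alpha * (x - x_prev)   # inertial/memory term
        z = y - lam * (grad_h(x) + beta * grad_g(x))
        x_prev, x = x, prox_f(z, lam)  # proximal step on the nonsmooth part f
    return x

# Toy bilevel problem: minimize |x_1| + |x_2| + 0.5*||x - (2,3)||^2
# over arg min g = {x : x_2 = 0}; its optimal solution is (1, 0).
c = np.array([2.0, 3.0])
sol = algorithm1(np.zeros(2),
                 grad_h=lambda x: x - c,
                 grad_g=lambda x: np.array([0.0, x[1]]),
                 prox_f=prox_l1)
```

On this toy instance the iterates approach (1, 0), the minimizer of |x₁| + ½(x₁ − 2)² on the feasible hyperplane; since the step sizes vanish slowly, the convergence is gradual rather than geometric.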
We prove weak convergence for the generated iterates to an optimal solution of (1), by making use of generalized Fejér monotonicity techniques and of the Opial lemma. The performed analysis allows us also to show the convergence of the objective function values to the optimal objective value of (1).
In the remainder of this section we recall some elements of convex analysis. For a function \(f:{\mathcal H}\rightarrow \overline {\mathbb {R}}\) we denote by \(\text {dom}\,f=\{x\in {\mathcal H}:f(x)<+\infty \}\) its effective domain and say that f is proper if \(\text{dom}\,f\neq\emptyset\) and f(x)≠ −∞ for all \(x\in {\mathcal H}\). Let \(f^{\ast }:{\mathcal H} \rightarrow \overline {\mathbb R}\), \(f^{\ast }(u)=\sup _{x\in {\mathcal H}}\{\langle u,x\rangle -f(x)\}\) for all \(u\in {\mathcal H}\), be the conjugate function of f. The subdifferential of f at \(x\in {\mathcal H}\), with \(f(x)\in \mathbb {R}\), is the set \(\partial f(x):=\{v\in {\mathcal H}:f(y)\geq f(x)+\langle v,y-x\rangle ~ \forall y\in {\mathcal H}\}\). By convention, \(\partial f(x):=\emptyset\) if f(x) ∈{±∞}. We also denote by \(\min f := \inf _{x \in {\mathcal H}} f(x)\) the optimal objective value of the function f and by \(\arg \min f :=\{x \in {\mathcal H}: f(x) = \min f\}\) its set of global minima.
A convex and differentiable function \(g:{\mathcal H}\to \mathbb {R}\) has a Lipschitz continuous gradient with Lipschitz constant \(L_{g}>0\), if \(\|\nabla g(x)-\nabla g(y)\|\leq L_{g}\|x-y\|\) for all \(x,y \in {\mathcal H}\). It is well known (see, for instance, [12, Theorem 18.15]) that this is equivalent to ∇g being \(\frac {1}{L_{g}}\)-cocoercive, namely, \(\langle x-y, \nabla g(x) - \nabla g(y) \rangle \geq \frac {1}{L_{g}} \|\nabla g(x) - \nabla g(y)\|^{2}\) for all \(x,y \in {\mathcal H}\).
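As a quick sanity check of this equivalence, one can verify the cocoercivity inequality numerically for the quadratic g(x) = ½⟨x, Ax⟩ with A symmetric positive semidefinite, whose gradient ∇g(x) = Ax is Lipschitz with constant \(L_{g}=\lambda_{\max}(A)\); the matrix and dimension below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B.T @ B                          # symmetric positive semidefinite
L_g = np.linalg.eigvalsh(A)[-1]      # largest eigenvalue = Lipschitz constant of grad g

def cocoercive(x, y):
    # Checks <x - y, grad g(x) - grad g(y)> >= (1/L_g) ||grad g(x) - grad g(y)||^2
    # for g(x) = 0.5 * x^T A x, up to a small numerical tolerance.
    d = x - y
    gd = A @ d                       # grad g(x) - grad g(y)
    return d @ gd >= gd @ gd / L_g - 1e-10

assert all(cocoercive(rng.standard_normal(4), rng.standard_normal(4))
           for _ in range(100))
```

The inequality holds here because, in the eigenbasis of A, each mode contributes \(\lambda_i(1-\lambda_i/L_g)\geq 0\).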
Let \(M\subseteq {\mathcal H}\) be a nonempty set. The indicator function of M, \(\delta _{M}:{\mathcal H}\rightarrow \overline {\mathbb {R}}\), is the function which takes the value 0 on M and + ∞ otherwise. The subdifferential of the indicator function is the normal cone of M, that is, \(N_{M}(x)=\{u\in {\mathcal H}:\langle u,y-x\rangle \leq 0~ \forall y\in M\}\) if x ∈ M, and \(N_{M}(x)=\emptyset\) for x∉M. For x ∈ M, we have that \(u\in N_{M}(x)\) if and only if \(\sigma_{M}(u)=\langle u,x\rangle\), where \(\sigma_{M}\) is the support function of M, defined by \(\sigma _{M} : {\mathcal H} \rightarrow \overline {\mathbb {R}}, \sigma _{M}(u)=\sup _{y\in M}\langle y,u\rangle \). Finally, \(\text{ran}(N_{M})\) denotes the range of the normal cone \(N_{M}\), that is, \(p\in\text{ran}(N_{M})\) if and only if there exists x ∈ M such that \(p\in N_{M}(x)\).
2 Technical Lemmas
The setting in which we will carry out the convergence analysis for Algorithm 1 is settled by the following hypotheses.
Assumption 1
-
(I)
The subdifferential sum formula \(\partial (f+\delta _{\arg \min g})=\partial f+N_{\arg \min g}\) holds;
-
(II)
The objective function f + h is bounded from below;
-
(III)
There exist positive constants \(\eta _{0}, a, b, K\) and \(c>1\), such that for every n ≥ 1:
$$0<a\leq \lambda_{n}\beta_{n}\leq b<\frac{2}{L_{g}(1+\eta_{0})^{2}}, \frac{L_{h}+\beta_{n}L_{g}}{2}+\frac{\alpha-1}{\lambda_{n}}\leq -(1+2\alpha)K-c $$and
$$\beta_{n+1}-\beta_{n}\leq K\frac{\eta_{0}}{1+\eta_{0}}\lambda_{n+1}\beta_{n+1}; $$ -
(IV)
\(\{\lambda _{n}\}_{n=1}^{\infty }\in \ell ^{2}\setminus \ell ^{1}\) and \(\left (\frac {1}{\lambda _{n+1}}-\frac {1}{\lambda _{n}}\right )\alpha \leq 2\) for every n ≥ 1;
-
(V)
\({\sum }_{n=1}^{\infty }\lambda _{n}\beta _{n}\left [g^{\ast }\left (\frac {p}{\beta _{n}}\right )-\sigma _{\arg \min g}\left (\frac {p}{\beta _{n}}\right )\right ]\!\!<\!+\infty \) for every p ∈ran(N arg min g ).
Remark 1
For conditions which guarantee exact convex subdifferential sum formulas, we refer to [12, 15, 16, 41]. One of these conditions, which is frequently fulfilled in applications, asks for the continuity of the function f and thus does not require any knowledge of the set of minimizers of g.
The assumption in (V) originates from the work of Attouch and Czarnecki [5]; we refer to [4,5,6,7,8, 11, 17,18,19, 24,25,26,27, 30, 37, 39] for other variants, generalizations to monotone operators, and concrete examples for which this condition is satisfied (see also Remark 2).
The aim of the following three results is to derive a generalized Fejér-type inequality in the spirit of the one in the hypotheses of Lemma 4. This will be achieved in terms of the sequence \(\{{\Gamma}_{n}\}_{n\geq 1}\), defined before Lemma 3, which can be seen as a Lyapunov sequence: it adds up the objective function value and a penalization of the function g, both evaluated at the current iterate, and the distance of the current iterate to a fixed optimal solution.
Lemma 1
Let \(u\in \mathcal {S}\). According to the first-order optimality conditions, there exist \(v\in\partial f(u)\) and \(p\in N_{\arg\min g}(u)\), such that 0 = v + ∇h(u) + p. Set \(\varphi_{n}:=\|x_{n}-u\|^{2}\) for every n ≥ 1. Then, for every n ≥ 1 and η > 0, it holds that
Proof
Set \(y_{n}:=x_{n}+\alpha(x_{n}-x_{n-1})\) for every n ≥ 1. Since \(y_{n}-x_{n+1}-\lambda_{n}\nabla h(x_{n})-\lambda_{n}\beta_{n}\nabla g(x_{n})\in\lambda_{n}\partial f(x_{n+1})\) and \(v\in\partial f(u)\), the monotonicity of \(\partial f\) guarantees that
or, equivalently,
We notice that for every n ≥ 0
and so for every n ≥ 1
By employing (4) and (5) in inequality (3), we obtain for every n ≥ 1
Next, we evaluate the first two terms on the right-hand side in the above statement. Since ∇g is \(\frac {1}{L_{g}}\)-cocoercive, we have
and from here, since ∇g(u) = 0,
On the other hand, since g is convex and differentiable, we have
or, equivalently,
From (7) and (8) we obtain for all n ≥ 1
For the term \(2\lambda_{n}\beta_{n}\langle x_{n}-x_{n+1},\nabla g(x_{n})\rangle\) in (6), we have for all n ≥ 1 the following estimate
Employing the inequalities (9) and (10) in (6), we obtain for every n ≥ 1 that
and further
Not least,
and by employing this estimate in (11), we deduce that for every n ≥ 1
By using the \(\frac {1}{L_{h}}\)-cocoercivity of ∇h we obtain for every n ≥ 1 that
while, since \(p\in N_{\arg\min g}(u)\), it holds that
By combining these two inequalities with (12), it follows for every n ≥ 1 that
Since α ∈ [0, 1) and η > 0, it holds that \(\frac {1}{1+\eta }+\frac {\eta }{2(1+\eta )}+\alpha -1<1\), which together with the inequality above lead to the conclusion. □
For simplicity, we will make use of the following notation:
Lemma 2
For every n ≥ 1it holds that
Proof
Recall that for every n ≥ 1 we have \(\frac {y_{n}-x_{n+1}}{\lambda _{n}}-\nabla h(x_{n})-\beta _{n}\nabla g(x_{n})\in \partial f(x_{n+1})\), which implies
From here, it follows that for every n ≥ 1 we have
From the Descent Lemma (see for example [12, Theorem 18.15]), we obtain for every n ≥ 1 that
and
By combining these relations with inequality (14), we finally obtain the inequality in the statement of the lemma. □
For the forthcoming statements, we fix an element \(u\in \mathcal {S}\). For a simpler formulation of these results, we will use the following notation:
Lemma 3
Let \(u\in \mathcal {S}\). According to the first-order optimality conditions, there exist \(v\in\partial f(u)\) and \(p\in N_{\arg\min g}(u)\), such that 0 = v + ∇h(u) + p. Then, for every n ≥ 2, it holds that
Proof
We write (2) for η := η 0, multiply it by K, and after combining the resulting inequality with (13), we obtain for every n ≥ 1 that
In view of Assumption 1(III), we deduce that
and further
In order to obtain (15), we only have to add α(Ω n−1(x n−1) −Ω n (x n )) and \(\frac {\alpha K\eta _{0}}{1+\eta _{0}}(\lambda _{n}\beta _{n}g(x_{n})-\lambda _{n-1}\beta _{n-1}g(x_{n-1}))\) to both sides of the above inequality. □
The following result is a very useful tool in the convergence analysis of inertial algorithms (see [1, 2, 20]).
Lemma 4
Let \(\{a_{n}\}_{n=0}^{\infty }\), \(\{b_{n}\}_{n=1}^{\infty }\), and \(\{c_{n}\}_{n=1}^{\infty }\) be real sequences and α ∈ [0, 1) be a given real number. Assume that \(\{a_{n}\}_{n=1}^{\infty }\) is bounded from below, \(\{b_{n}\}_{n=1}^{\infty }\) is nonnegative, and \({\sum }_{n=1}^{\infty } c_{n}<+\infty \), such that
$$a_{n+1}-a_{n}+b_{n+1}\leq\alpha(a_{n}-a_{n-1})+c_{n} \quad \forall n\geq 1. $$
Then the following statements hold:
-
(i)
\({\sum }_{n=1}^{\infty }[a_{n}-a_{n-1}]_{+}<+\infty \), where [t]+ := max{t,0};
-
(ii)
\(\{a_{n}\}_{n=1}^{\infty }\) converges and \({\sum }_{n=1}^{\infty } b_{n}<+\infty \).
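As a quick numerical illustration (our own toy instance, not from the paper), assume the hypothesis of the lemma has the standard form \(a_{n+1}-a_{n}+b_{n+1}\leq\alpha(a_{n}-a_{n-1})+c_{n}\) with \(\sum_{n}c_{n}<+\infty\); the concrete sequences below are chosen so that this inequality holds with equality, and the two conclusions can then be checked directly:

```python
# Toy instance: a_n = 1 + (-1)^n / n^2 (bounded below, converges to 1),
# b_n = 1/n^2 (nonnegative, summable), and c_n defined as the slack that
# makes the inertial inequality hold with equality.
alpha = 0.5
N = 20000
a = [1.0] + [1.0 + (-1.0) ** n / n ** 2 for n in range(1, N + 2)]
b = [0.0] + [1.0 / n ** 2 for n in range(1, N + 2)]
c = [a[n + 1] - a[n] + b[n + 1] - alpha * (a[n] - a[n - 1])
     for n in range(1, N + 1)]

# Conclusion (i): the positive variation of {a_n} is summable.
pos_var = sum(max(a[n] - a[n - 1], 0.0) for n in range(1, N + 1))

assert sum(abs(x) for x in c) < 10.0   # hypothesis: {c_n} is (absolutely) summable
assert pos_var < 2.0                   # (i): sum of positive parts stays bounded
assert abs(a[N] - 1.0) < 1e-6          # (ii): {a_n} converges (here to 1)
assert sum(b) < 2.0                    # (ii): sum b_n < +infinity
```

The point of the construction is that a bounded-below sequence satisfying the inertial inequality cannot oscillate with non-summable upward jumps, which is exactly what conclusion (i) formalizes.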
The results presented in Lemma 5 related to the convergence of the generated iterates and in Lemma 6 related to the convergence of the objective function values will be used in the next section in the proof of the main theorem in combination with the Opial lemma stated in Lemma 7.
Lemma 5
Let \(u\in \mathcal {S}\). According to the first-order optimality conditions there exist v ∈ ∂ f(u)and p ∈ N arg min g (u), such that 0 = v + ∇h(u) + p.Then the following statements are true:
-
(i)
The sequence \(\{{\Gamma }_{n}\}_{n=1}^{+\infty }\) is bounded from below;
-
(ii)
\({\sum }_{n=1}^{\infty }\|x_{n+1}-x_{n}\|^{2}<+\infty \) ;
-
(iii)
\(\lim _{n\to +\infty }{\Gamma }_{n}\) exists and \({\sum }_{n=1}^{+\infty }\lambda _{n}\beta _{n}\|\nabla g(x_{n})\|^{2}<+\infty \) ;
-
(iv)
\(\lim _{n\to +\infty }\|x_{n}-u\|\) exists, \({\sum }_{n=1}^{\infty }[\varphi _{n}-\varphi _{n-1}]_{+}\!<+\infty \) and \({\sum }_{n=1}^{\infty }\lambda _{n}\beta _{n}g(x_{n})\!<+\infty \) ;
-
(v)
\(\lim _{n\to +\infty }{\Omega }_{n}(x_{n})\) exists;
-
(vi)
\(\lim _{n\to +\infty }g(x_{n})=0\) and every sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) lies in arg ming.
Proof
(i) According to Assumption 1(III) we have that
which implies that \(1\geq K\lambda_{n}\) and further
By using the definition of Γ n and Assumption 1(II), we easily derive that \(\{{\Gamma }_{n}\}_{n=1}^{\infty }\) is bounded from below.
(ii) For every n ≥ 2, we set
and
For a fixed natural number \(N_{0}\geq 2\), it holds that
Since f + h is bounded from below and relation (16) is true, we obtain that \(\{\omega _{n}\}_{n=1}^{\infty }\) is summable.
For every n ≥ 1, we set
Consequently, according to Assumption 1(III), it follows that
Further, for every n ≥ 1, it holds
which, together with Assumption 1(IV), implies that
and so
On the other hand, by Assumption 1(III), we also have for every n ≥ 1
and so
By employing the last inequality in Lemma 3 we obtain for every n ≥ 2
where for the last inequality we use (17) and (18). Since \(\lambda_{n}\to 0\) as n → + ∞, there exists \(N_{1}\in \mathbb {N}\), such that for every \(n\geq N_{1}\), we have \(\frac {4(1+\eta _{0})}{\eta _{0}}\lambda _{n}-\frac {2}{L_{h}}<0\). This implies that for every \(n\geq N_{1}\)
Summing up this inequality for \(n=N_{1},\ldots,N_{2}\), where \(N_{2}\) is a natural number with \(N_{2}\geq N_{1}\), we obtain that
This means that \(\{\mu _{n}\}_{n=2}^{\infty }\) is bounded from above (we take into account that c > 1). Let M be a positive upper bound of \(\{\mu _{n}\}_{n=2}^{\infty }\). Observing that \({\Gamma}_{n+1}-\alpha{\Gamma}_{n}\leq\mu_{n+1}\leq M\), and thus \({\Gamma}_{n+1}\leq\alpha{\Gamma}_{n}+M\), for every \(n\geq N_{1}\), we obtain
Since \(\{{\Gamma }_{n}\}_{n=1}^{\infty }\) is bounded from below, there exists \(C\in \mathbb {R}\), such that
Thus, from the inequality (21), by taking into account that c > 1, we deduce that
(iii) From Lemma 3, by using (17), (19), and (20), we obtain
Since \(\{{\Gamma }_{n}\}_{n=1}^{\infty }\) is bounded from below, by using Lemma 4, it follows that (iii) is true.
(iv) The statement follows from Lemma 1 and Lemma 4.
(v) Thanks to (iii), (iv) and \({\Gamma }_{n}={\Omega }_{n}(x_{n})-K\frac {\eta _{0}}{1+\eta _{0}}\lambda _{n}\beta _{n} g(x_{n})+K\varphi _{n}\) for every n ≥ 1, we obtain that \(\lim _{n\to +\infty }{\Omega }_{n}(x_{n})\) exists.
(vi) Since \(\lambda_{n}\beta_{n}\geq a>0\) for every n ≥ 1, we have \({\sum }_{n=1}^{+\infty }g(x_{n})<+\infty \) and so \(\lim _{n\to +\infty }g(x_{n})=0\).
Finally, let \(y\in {\mathcal H}\) be a sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) and \(\{x_{n_{j}}\}_{j=1}^{\infty }\) be a subsequence of \(\{x_{n}\}_{n=1}^{\infty }\), such that \(x_{n_{j}}\) weakly converges to y as j → + ∞. Since g is weakly lower semicontinuous, we obtain that
which means that y ∈ arg min g. □
Lemma 6
Let \(u\in \mathcal {S}\) . Then, we have
Proof
For every n ≥ 1, we have that
and so
According to the Descent Lemma we have for every n ≥ 1
and
which give rise to the following estimate
Further, we notice that for every n ≥ 1
or, equivalently,
On the other hand, since g(u) = 0, we have for every n ≥ 1
or, equivalently,
Similarly, we have for every n ≥ 1
which implies that
By summing up the inequalities in (23)–(25), we obtain for every n ≥ 1
Not least, according to (4) and (5), we obtain for every n ≥ 1
which, combined with (26) and Lemma 5(iv), implies that
The conclusion follows by taking into account (22). □
3 Convergence of the Iterates and of the Objective Function Values
In this section, we will prove the main result of this paper. This addresses the convergence of both the sequence of iterates \(\{x_{n}\}_{n=1}^{\infty }\) generated by Algorithm 1 and the sequence of objective function values \(\{f(x_{n})+h(x_{n})\}_{n=1}^{\infty }\).
The Opial lemma, which we state next and for which we refer to [12, Lemma 2.39], will play a crucial role in the convergence analysis.
Lemma 7
Let \({\mathcal H}\) be a real Hilbert space, \(C\subseteq {\mathcal H}\) a nonempty set and \(\{x_{n}\}_{n=1}^{\infty }\) a given sequence, such that:
-
(i)
For every z ∈ C, \(\lim _{n\to +\infty }\|x_{n}-z\|\) exists.
-
(ii)
Every sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) lies in C.
Then, the sequence \(\{x_{n}\}_{n=1}^{\infty }\) converges weakly to a point in C.
Theorem 1
Let \(\{x_{n}\}_{n=1}^{\infty }\) be the sequence generated by Algorithm 1. Then:
-
(i)
the sequence \(\{x_{n}\}_{n=1}^{\infty }\) converges weakly to a point in \(\mathcal {S}\) ;
-
(ii)
the sequence \(\{f(x_{n})+h(x_{n})\}_{n=1}^{\infty }\) converges to the optimal objective value of the optimization problem(1).
Proof
(i) We know that \(\lim _{n\to +\infty }\|x_{n}-u\|\) exists for all \(u\in \mathcal {S}\) (see Lemma 5(iv)); hence, in view of the Opial lemma, it is sufficient to show that all sequential weak cluster points of \(\{x_{n}\}_{n=1}^{\infty }\) are in \(\mathcal {S}\). Since \(\{\lambda _{n}\}_{n=1}^{\infty }\notin \ell ^{1}\) and \(\lim _{n\to +\infty }{\Omega }_{n}(x_{n})\) exists, from Lemma 6, we obtain that
Let \(x^{\ast }\in {\mathcal H}\) be a sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) and \(\{x_{n_{k}}\}_{k=1}^{\infty }\) be a subsequence of \(\{x_{n}\}_{n=1}^{\infty }\), such that \(x_{n_{k}}\) converges weakly to x ∗ as k → + ∞. From here, by Lemma 5(vi), we obtain that x ∗∈ arg min g. Take an arbitrary \(u\in \mathcal {S}\). The weak lower semicontinuity of f and h implies that
which means that \(x^{\ast }\in \mathcal {S}\).
(ii) The statement is a direct consequence of the above inequalities. □
We close the paper with a remark discussing the fulfillment of the conditions stated in Assumption 1.
Remark 2
We choose
and
We set
and
for every n ≥ 1.
-
(i)
By taking \(a:=(1-\alpha )\gamma -\frac {1}{L_{g}(1+\eta _{0})^{2}}\) and b > 0, such that \(a<b<\frac {2}{L_{g}(1+\eta _{0})^{2}}\), we have
$$0<a=\lambda_{n}\beta_{n}< b<\frac{2}{L_{g}(1+\eta_{0})^{2}} \quad \forall n\geq1. $$ -
(ii)
Since \(\beta _{n}\geq \frac {\gamma [L_{h}+2((1+2\alpha )K+c)]}{2-\gamma L_{g}}\), we have \(\frac {L_{h}+\beta _{n} L_{g}}{2}-\frac {\beta _{n}}{\gamma }\leq -(1+2\alpha )K-c\) for every n ≥ 1. On the other hand, since \(\frac {\beta _{n}}{\gamma }\leq \frac {(1-\alpha )}{\lambda _{n}}\), we have that \(\frac {L_{h}+\beta _{n}L_{g}}{2}+\frac {\alpha -1}{\lambda _{n}}\leq -(1+2\alpha )K-c\) for every n ≥ 1.
-
(iii)
For every n ≥ 1, we also have
$$\begin{array}{@{}rcl@{}} \beta_{n+1}-\beta_{n} &=&\left[\frac{(1-\alpha)\gamma L_{g}(1+\eta_{0})^{2}-1}{L_{g}(1+\eta_{0})^{2}}\right]\frac{K\eta_{0}}{1+\eta_{0}}\left( (n+1)^{q}-n^{q}\right)\\ &\leq&\left[\frac{(1-\alpha)\gamma L_{g}(1+\eta_{0})^{2}-1}{L_{g}(1+\eta_{0})^{2}}\right]\frac{K\eta_{0}}{1+\eta_{0}}\\ &=&\frac{K\eta_{0}}{1+\eta_{0}}\lambda_{n+1}\beta_{n+1}. \end{array} $$ -
(iv)
From (iii), it follows that for every n ≥ 1
$$\frac{1}{\lambda_{n+1}}-\frac{1}{\lambda_{n}} = \left( \beta_{n+1}-\beta_{n}\right)\left[\frac{L_{g}(1+\eta_{0})^{2}}{(1-\alpha)\gamma L_{g}(1+\eta_{0})^{2}-1}\right]\leq\frac{K\eta_{0}}{1+\eta_{0}}=\frac{2}{\alpha}. $$ -
(v)
Due to the fact that \(q\in \left (\frac {1}{2},1\right )\), we have \({\sum }_{n=1}^{+\infty }\frac {1}{\beta _{n}}=+\infty \) and \({\sum }_{n=1}^{+\infty }\frac {1}{{\beta _{n}^{2}}}<+\infty \). Consequently, \(\{\lambda _{n}\}_{n=1}^{\infty }\in \ell ^{2}\setminus \ell ^{1}\).
-
(vi)
Since g ≤ δ arg min g , it holds that g ∗≥ (δ arg min g )∗ = σ arg min g and so g ∗− σ arg min g ≥ 0. For a function \(g: {\mathcal H} \rightarrow \mathbb {R}\) fulfilling \(g \geq \frac {a}{2}\text {dist}^{2}(\cdot ,\arg \min g)\) where a > 0, it holds that \(g^{\ast }(x)-\sigma _{\arg \min g}(x)\leq \frac {1}{2a}\|x\|^{2}\) for every \(x \in {\mathcal H}\). Thus, for every n ≥ 1,
$$\lambda_{n}\beta_{n}\left[g^{\ast}\left( \frac{p}{\beta_{n}}\right)-\sigma_{\arg\min g}\left( \frac{p}{\beta_{n}}\right)\right]\leq \frac{\lambda_{n}}{2a\beta_{n}}\|p\|^{2} \quad \forall p\in\text{ran}(N_{\arg\min g}). $$Since \({\sum }_{n=1}^{\infty }\frac {1}{{\beta _{n}^{2}}}<+\infty \), from here it follows that
$$\sum\limits_{n=1}^{\infty}\lambda_{n}\beta_{n}\left[g^{\ast}\left( \frac{p}{\beta_{n}}\right)-\sigma_{\arg\min g}\left( \frac{p}{\beta_{n}}\right)\right]<+\infty. $$
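With the choices above, where \(\beta_{n}\) grows like \(n^{q}\) and \(\lambda_{n}\beta_{n}\) is constant, the summability properties noted in (v) reduce to the p-series test:

```latex
\lambda_n=\frac{a}{\beta_n}\sim n^{-q}
\quad\Longrightarrow\quad
\sum_{n=1}^{\infty}\lambda_n\sim\sum_{n=1}^{\infty}n^{-q}=+\infty \ \ (q<1),
\qquad
\sum_{n=1}^{\infty}\lambda_n^{2}\sim\sum_{n=1}^{\infty}n^{-2q}<+\infty \ \ (2q>1),
```

which is precisely why q must be taken in the open interval \(\left(\frac{1}{2},1\right)\).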
4 Conclusions
We investigate the weak (non-ergodic) convergence of an inertial proximal-gradient method with penalization terms for solving a bilevel optimization problem whose objective is the sum of a convex nonsmooth function and a convex smooth function, and whose constraint set is the set of minimizers of another convex and differentiable function. The proof techniques combine tools specific to inertial algorithms [3] and to penalty-type methods [5, 8]. We show the convergence of both the generated iterates and the objective function values.
References
Alvarez, F.: On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J. Control. Optim. 38, 1102–1119 (2000)
Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14, 773–782 (2004)
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3–11 (2001)
Attouch, H., Cabot, A., Czarnecki, M.-O.: Asymptotic behavior of nonautonomous monotone and subgradient evolution equations. Trans. Am. Math. Soc. https://doi.org/10.1090/tran/6965, arXiv:1601.00767 (2016)
Attouch, H., Czarnecki, M.-O.: Asymptotic behavior of coupled dynamical systems with multiscale aspects. J. Differ. Equ. 248, 1315–1344 (2010)
Attouch, H., Czarnecki, M.-O.: Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J. Differ. Equ. 262, 2745–2770 (2017)
Attouch, H., Czarnecki, M.-O., Peypouquet, J.: Prox-penalization and splitting methods for constrained variational problems. SIAM J. Optim. 21, 149–173 (2011)
Attouch, H., Czarnecki, M.-O., Peypouquet, J.: Coupling forward-backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J. Optim. 21, 1251–1274 (2011)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1/k². SIAM J. Optim. 26, 1824–1834 (2016)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optim. 24, 232–256 (2014)
Banert, S., Boţ, R.I.: Backward penalty schemes for monotone inclusion problems. J. Optim. Theory Appl. 166, 930–948 (2015)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 183–202 (2009)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Cambridge (1999)
Borwein, J.M., Vanderwerff, J.D.: Convex Functions: Constructions, Characterizations and Counterexamples. Cambridge University Press, Cambridge (2010)
Boţ, R.I.: Conjugate Duality in Convex Optimization. Lecture Notes in Economics and Mathematical Systems, vol. 637. Springer, Berlin (2010)
Boţ, R.I., Csetnek, E.R.: Forward-backward and Tseng’s type penalty schemes for monotone inclusion problems. Set-Valued Var. Anal. 22, 313–331 (2014)
Boţ, R.I., Csetnek, E.R.: A Tseng’s type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J. Math. 42, 451–465 (2014)
Boţ, R.I., Csetnek, E.R.: Levenberg-Marquardt dynamics associated to variational inequalities. Set-Valued Var. Anal. https://doi.org/10.1007/s11228-017-0409-8, arXiv:1603.04460 (2017)
Boţ, R.I., Csetnek, E.R.: An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algorithms 71, 519–540 (2016)
Boţ, R.I., Csetnek, E.R.: An inertial alternating direction method of multipliers. Minimax Theory Appl. 1, 29–49 (2016)
Boţ, R.I., Csetnek, E.R.: A hybrid proximal-extragradient algorithm with inertial effects. Numer. Funct. Anal. Optim. 36, 951–963 (2015)
Boţ, R.I., Csetnek, E.R.: An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems. J. Optim. Theory Appl. 171, 600–616 (2016)
Boţ, R.I., Csetnek, E.R.: Approaching the solving of constrained variational inequalities via penalty term-based dynamical systems. J. Math. Anal. Appl. 435, 1688–1700 (2016)
Boţ, R.I., Csetnek, E.R.: Penalty schemes with inertial effects for monotone inclusion problems. Optimization 66, 965–982 (2017)
Boţ, R.I., Csetnek, E.R.: Second-order dynamical systems associated to variational inequalities. Appl. Anal. 96, 799–809 (2017)
Boţ, R.I., Csetnek, E.R.: A second order dynamical system with Hessian-driven damping and penalty term associated to variational inequalities. arXiv:1608.04137 (2016)
Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)
Boţ, R.I., Csetnek, E.R., László, S.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4, 3–25 (2016)
Boţ, R. I., Csetnek, E.R., Nimana, N.: Gradient-type penalty method with inertial effects for solving constrained convex optimization problems with smooth data. Optim. Lett. https://doi.org/10.1007/s11590-017-1158-1 (2017)
Cabot, A., Frankel, P.: Asymptotics for some proximal-like method involving inertia and memory aspects. Set-Valued Var. Anal. 19, 59–74 (2011)
Chen, C., Chan, R.H., Ma, S., Yang, J.: Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imag. Sci. 8, 2239–2267 (2015)
Chen, C., Ma, S., Yang, J.: A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J. Optim. 25, 2120–2142 (2015)
Maingé, P.-E.: Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 219, 223–236 (2008)
Maingé, P.-E., Moudafi, A.: Convergence of new inertial proximal methods for DC programming. SIAM J. Optim. 19, 397–413 (2008)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155, 447–454 (2003)
Noun, N., Peypouquet, J.: Forward-backward penalty scheme for constrained convex minimization without inf-compactness. J. Optim. Theory Appl. 158, 787–795 (2013)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: Inertial proximal algorithm for nonconvex optimization. SIAM J. Imag. Sci. 7, 1388–1419 (2014)
Peypouquet, J.: Coupling the gradient method with a general exterior penalization scheme for convex minimization. J. Optim. Theory Appl. 153, 123–138 (2012)
Polyak, B.T.: Introduction to Optimization. (Translated from the Russian) Translations Series in Mathematics and Engineering, Optimization Software, Inc. Publications Division, New York (1987)
Zalinescu, C.: Convex Analysis in General Vector Spaces. World Scientific, Singapore (2002)
Acknowledgements
Open access funding provided by Austrian Science Fund (FWF). The authors are thankful to two anonymous referees and the Guest Editor Regina Burachik for comments and remarks which improved the quality of the presentation.
Additional information
Dedicated to Professor Michel Théra’s 70th birthday.
The work of the first author was partially supported by FWF (Austrian Science Fund), project I2419-N32. The work of the second author was supported by FWF (Austrian Science Fund), project P29809-N32. Research was done during the two months’ stay of the third author in Spring 2016 at the Faculty of Mathematics of the University of Vienna. The third author is thankful to the Royal Golden Jubilee PhD Program for financial support.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Boţ, R., Csetnek, E. & Nimana, N. An Inertial Proximal-Gradient Penalization Scheme for Constrained Convex Optimization Problems. Vietnam J. Math. 46, 53–71 (2018). https://doi.org/10.1007/s10013-017-0256-9