Abstract
We propose a proximal-gradient algorithm with penalization terms and inertial and memory effects for minimizing the sum of a proper, convex, and lower semicontinuous and a convex differentiable function subject to the set of minimizers of another convex differentiable function. We show that, under suitable choices for the step sizes and the penalization parameters, the generated iterates weakly converge to an optimal solution of the addressed bilevel optimization problem, while the objective function values converge to its optimal objective value.
1 Introduction and Preliminaries
Let \({\mathcal H}\) be a real Hilbert space with inner product 〈⋅,⋅〉 and associated norm ∥⋅∥. Let \(f:{\mathcal H}\rightarrow \overline {\mathbb R}=\mathbb {R}\cup \{\pm \infty \}\) be a proper, convex, and lower semicontinuous function, and let \(h:{\mathcal H}\to \mathbb {R}\) and \(g:{\mathcal H}\to \mathbb {R}\) be convex and differentiable functions with Lipschitz continuous gradients with positive Lipschitz constants \(L_{h}\) and \(L_{g}\), respectively. We consider the bilevel optimization problem
$$\min\{f(x)+h(x): x\in\arg\min g\} \qquad\qquad (1)$$
and assume that its set of optimal solutions
$$\mathcal{S}:=\left\{x\in\arg\min g: f(x)+h(x)\leq f(y)+h(y)~\forall y\in\arg\min g\right\}$$
is nonempty. We also assume, without loss of generality, that \(\min g = 0\).
The work of Attouch and Czarnecki [5] represented the starting point of a series of articles on the minimization of a smooth or (sometimes) a complexly structured nonsmooth objective function subject to the set of minimizers of another function, from either a discrete perspective through iterative numerical algorithms or a continuous one through dynamical systems (see [4,5,6,7,8, 11, 17,18,19, 24,25,26,27, 30, 37, 39]). In both settings, the function determining the feasible set is evaluated in the spirit of penalty methods and contributes to the convergence of the generated sequences, in the discrete setting, and to the asymptotic convergence of the generated trajectories, in the continuous setting, to an optimal solution of the underlying bilevel optimization problem. We emphasize in particular the proximal-gradient algorithm with penalty term introduced in [8], for which weak ergodic convergence has been proved.
In this paper, we consider this algorithm in the context of solving problem (1) and enhance it with inertial and memory effects. Our aim is to provide suitable choices for the step sizes and the penalization parameters such that the generated iterates weakly converge to an optimal solution of problem (1), while the objective function values converge to its optimal objective value. Algorithms of inertial type follow by time discretization of differential inclusions of second-order type (see [1, 3]) and were first investigated in the context of the minimization of a differentiable function by Polyak in [40] and Bertsekas in [14]. In the last two decades, intensive research has been dedicated to algorithms of inertial type and to their convergence behavior (see [1,2,3, 10, 20,21,22,23, 28, 29, 31,32,33,34,35,36, 38]). In a variety of situations, in particular when solving real-world problems, the presence of inertial terms improves the convergence behavior of the generated sequences. It is also well known (see [9, 13]) that enhancing the proximal-gradient algorithm with inertial effects may lead to a considerable improvement of the convergence behavior of the sequence of objective function values.
The proximal-gradient algorithm with penalization terms and inertial and memory effects we propose for solving (1) is the following.
Algorithm 1
Initialization: Choose positive sequences \(\{\lambda _{n}\}_{n=1}^{\infty }\), \(\{\beta _{n}\}_{n=1}^{\infty }\), and a constant α ∈ [0, 1). Take arbitrary \(x_{0},x_{1}\in {\mathcal H}\). Iterative step: For every n ≥ 1 and given current iterates \(x_{n-1}, x_{n}\in {\mathcal H}\), define \(x_{n+1} \in {\mathcal H}\) by
$$x_{n+1}:=\text{prox}_{\lambda_{n}f}\left( x_{n}+\alpha(x_{n}-x_{n-1})-\lambda_{n}\nabla h(x_{n})-\lambda_{n}\beta_{n}\nabla g(x_{n})\right). $$
For \(x \in {\mathcal H}\), we denote by \(\text {prox}_{\lambda _{n} f}(x)\) the proximal point of the function f of parameter λ_n at x, which is the unique optimal solution of the optimization problem
$$\min_{y\in{\mathcal H}}\left\{f(y)+\frac{1}{2\lambda_{n}}\|y-x\|^{2}\right\}. $$
In Algorithm 1, \(\{\lambda _{n}\}_{n=1}^{\infty }\) denotes the sequence of step sizes, \(\{\beta _{n}\}_{n=1}^{\infty }\) the sequence of penalization parameters, and α ∈ [0, 1) the parameter that controls the inertial terms.
The proposed numerical scheme recovers, when α = 0, the algorithm investigated in [37] and, under the additional assumption f = 0, the gradient method of penalty type from [39]. In the case f = 0, Algorithm 1 gives rise to the gradient method of penalty type with inertial and memory effects introduced and studied in [30].
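To make the scheme concrete, the following Python sketch implements the iteration above under illustrative assumptions that are ours, not the paper's: f = ∥⋅∥₁ (so the proximal step is soft-thresholding), h(x) = ½∥x − c∥², and g(x) = ½x₂² (so arg min g is the hyperplane x₂ = 0). The parameter choices λ_n ∼ n^{−q} and β_n ∼ n^{q} with q ∈ (1/2, 1) mimic the spirit of Remark 2 below; the concrete constants are arbitrary.

```python
import numpy as np

def prox_l1(z, t):
    # Proximal map of t*||.||_1 (soft-thresholding); f = ||.||_1 is our illustrative choice.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def algorithm1(x0, grad_h, grad_g, prox_f, n_iter=20000, alpha=0.2, q=0.75,
               c_lam=0.5, c_beta=1.0):
    # Inertial proximal-gradient penalty iteration: both gradients are evaluated
    # at x_n, while the inertial term extrapolates along x_n - x_{n-1}.
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    for n in range(1, n_iter + 1):
        lam = c_lam * n ** (-q)        # step sizes {lambda_n} in l^2 \ l^1
        beta = c_beta * n ** q         # penalization parameters beta_n -> +infinity
        y = x + alpha * (x - x_prev)   # inertial/memory term
        z = y - lam * (grad_h(x) + beta * grad_g(x))
        x_prev, x = x, prox_f(z, lam)  # proximal step on the nonsmooth part f
    return x

# Toy bilevel problem: minimize |x_1| + |x_2| + 0.5*||x - (2,3)||^2
# over arg min g = {x : x_2 = 0}; its optimal solution is (1, 0).
c = np.array([2.0, 3.0])
sol = algorithm1(np.zeros(2),
                 grad_h=lambda x: x - c,
                 grad_g=lambda x: np.array([0.0, x[1]]),
                 prox_f=prox_l1)
```

On this toy instance the iterates approach (1, 0), the minimizer of |x₁| + ½(x₁ − 2)² on the feasible hyperplane; since the step sizes vanish slowly, the convergence is gradual rather than geometric.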
We prove weak convergence for the generated iterates to an optimal solution of (1), by making use of generalized Fejér monotonicity techniques and of the Opial lemma. The performed analysis allows us also to show the convergence of the objective function values to the optimal objective value of (1).
In the remainder of this section we recall some elements of convex analysis. For a function \(f:{\mathcal H}\rightarrow \overline {\mathbb {R}}\) we denote by \(\text {dom}\,f=\{x\in {\mathcal H}:f(x)<+\infty \}\) its effective domain and say that f is proper if \(\text{dom}\,f\neq\emptyset\) and f(x)≠ −∞ for all \(x\in {\mathcal H}\). Let \(f^{\ast }:{\mathcal H} \rightarrow \overline {\mathbb R}\), \(f^{\ast }(u)=\sup _{x\in {\mathcal H}}\{\langle u,x\rangle -f(x)\}\) for all \(u\in {\mathcal H}\), be the conjugate function of f. The subdifferential of f at \(x\in {\mathcal H}\), with \(f(x)\in \mathbb {R}\), is the set \(\partial f(x):=\{v\in {\mathcal H}:f(y)\geq f(x)+\langle v,y-x\rangle ~ \forall y\in {\mathcal H}\}\). By convention, \(\partial f(x):=\emptyset\) if f(x) ∈{±∞}. We also denote by \(\min f := \inf _{x \in {\mathcal H}} f(x)\) the optimal objective value of the function f and by \(\arg \min f :=\{x \in {\mathcal H}: f(x) = \min f\}\) its set of global minima.
A convex and differentiable function \(g:{\mathcal H}\to \mathbb {R}\) has a Lipschitz continuous gradient with Lipschitz constant \(L_{g}>0\), if \(\|\nabla g(x)-\nabla g(y)\|\leq L_{g}\|x-y\|\) for all \(x,y \in {\mathcal H}\). It is well known (see, for instance, [12, Theorem 18.15]) that this is equivalent to ∇g being \(\frac {1}{L_{g}}\)-cocoercive, namely, \(\langle x-y, \nabla g(x) - \nabla g(y) \rangle \geq \frac {1}{L_{g}} \|\nabla g(x) - \nabla g(y)\|^{2}\) for all \(x,y \in {\mathcal H}\).
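As a quick sanity check of this equivalence, one can verify the cocoercivity inequality numerically for the quadratic g(x) = ½⟨x, Ax⟩ with A symmetric positive semidefinite, whose gradient ∇g(x) = Ax is Lipschitz with constant \(L_{g}=\lambda_{\max}(A)\); the matrix and dimension below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B.T @ B                          # symmetric positive semidefinite
L_g = np.linalg.eigvalsh(A)[-1]      # largest eigenvalue = Lipschitz constant of grad g

def cocoercive(x, y):
    # Checks <x - y, grad g(x) - grad g(y)> >= (1/L_g) ||grad g(x) - grad g(y)||^2
    # for g(x) = 0.5 * x^T A x, up to a small numerical tolerance.
    d = x - y
    gd = A @ d                       # grad g(x) - grad g(y)
    return d @ gd >= gd @ gd / L_g - 1e-10

assert all(cocoercive(rng.standard_normal(4), rng.standard_normal(4))
           for _ in range(100))
```

The inequality holds here because, in the eigenbasis of A, each mode contributes \(\lambda_i(1-\lambda_i/L_g)\geq 0\).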
Let \(M\subseteq {\mathcal H}\) be a nonempty set. The indicator function of M, \(\delta _{M}:{\mathcal H}\rightarrow \overline {\mathbb {R}}\), is the function which takes the value 0 on M and + ∞ otherwise. The subdifferential of the indicator function is the normal cone of M, that is, \(N_{M}(x)=\{u\in {\mathcal H}:\langle u,y-x\rangle \leq 0~ \forall y\in M\}\) if x ∈ M, and \(N_{M}(x)=\emptyset\) for x∉M. For x ∈ M, we have that \(u\in N_{M}(x)\) if and only if \(\sigma_{M}(u)=\langle u,x\rangle\), where \(\sigma_{M}\) is the support function of M, defined by \(\sigma _{M} : {\mathcal H} \rightarrow \overline {\mathbb {R}}, \sigma _{M}(u)=\sup _{y\in M}\langle y,u\rangle \). Finally, \(\text{ran}(N_{M})\) denotes the range of the normal cone \(N_{M}\), that is, \(p\in\text{ran}(N_{M})\) if and only if there exists x ∈ M such that \(p\in N_{M}(x)\).
2 Technical Lemmas
The setting in which we will carry out the convergence analysis for Algorithm 1 is settled by the following hypotheses.
Assumption 1
-
(I)
The subdifferential sum formula \(\partial (f+\delta _{\arg \min g})=\partial f+N_{\arg \min g}\) holds;
-
(II)
The objective function f + h is bounded from below;
-
(III)
There exist positive constants \(\eta _{0}, a, b, K\) and \(c>1\), such that for every n ≥ 1:
$$0<a\leq \lambda_{n}\beta_{n}\leq b<\frac{2}{L_{g}(1+\eta_{0})^{2}}, \frac{L_{h}+\beta_{n}L_{g}}{2}+\frac{\alpha-1}{\lambda_{n}}\leq -(1+2\alpha)K-c $$and
$$\beta_{n+1}-\beta_{n}\leq K\frac{\eta_{0}}{1+\eta_{0}}\lambda_{n+1}\beta_{n+1}; $$ -
(IV)
\(\{\lambda _{n}\}_{n=1}^{\infty }\in \ell ^{2}\setminus \ell ^{1}\) and \(\left (\frac {1}{\lambda _{n+1}}-\frac {1}{\lambda _{n}}\right )\alpha \leq 2\) for every n ≥ 1;
-
(V)
\({\sum }_{n=1}^{\infty }\lambda _{n}\beta _{n}\left [g^{\ast }\left (\frac {p}{\beta _{n}}\right )-\sigma _{\arg \min g}\left (\frac {p}{\beta _{n}}\right )\right ]\!\!<\!+\infty \) for every p ∈ran(N arg min g ).
Remark 1
For conditions which guarantee exact convex subdifferential sum formulas, we refer to [12, 15, 16, 41]. One of these conditions, which is frequently fulfilled in applications, asks for the continuity of the function f and thus does not require any knowledge of the set of minimizers of g.
The assumption in (V) originates from the work of Attouch and Czarnecki [5]; we refer to [4,5,6,7,8, 11, 17,18,19, 24,25,26,27, 30, 37, 39] for other variants, generalizations to monotone operators, and concrete examples for which this condition is satisfied (see also Remark 2).
The aim of the following three results is to derive a generalized Fejér-type inequality in the spirit of the one in the hypotheses of Lemma 4. This will be achieved in terms of the sequence \(\{{\Gamma}_{n}\}_{n\geq 1}\), defined before Lemma 3, which can be seen as a Lyapunov sequence: it adds up the objective function value and a penalization of the function g, both evaluated at the current iterate, and the distance of the current iterate to a fixed optimal solution.
Lemma 1
Let \(u\in \mathcal {S}\). According to the first-order optimality conditions, there exist \(v\in\partial f(u)\) and \(p\in N_{\arg\min g}(u)\), such that 0 = v + ∇h(u) + p. Set \(\varphi_{n}:=\|x_{n}-u\|^{2}\) for every n ≥ 1. Then, for every n ≥ 1 and η > 0, it holds that
Proof
Set \(y_{n}:=x_{n}+\alpha(x_{n}-x_{n-1})\) for every n ≥ 1. Since \(y_{n}-x_{n+1}-\lambda_{n}\nabla h(x_{n})-\lambda_{n}\beta_{n}\nabla g(x_{n})\in\lambda_{n}\partial f(x_{n+1})\) and \(v\in\partial f(u)\), the monotonicity of \(\partial f\) guarantees that
or, equivalently,
We notice that for every n ≥ 0
and so for every n ≥ 1
By employing (4) and (5) in inequality (3), we obtain for every n ≥ 1
Next, we evaluate the first two terms on the right-hand side in the above statement. Since ∇g is \(\frac {1}{L_{g}}\)-cocoercive, we have
and from here, since ∇g(u) = 0,
On the other hand, since g is convex and differentiable, we have
or, equivalently,
From (7) and (8) we obtain for all n ≥ 1
For the term \(2\lambda_{n}\beta_{n}\langle x_{n}-x_{n+1},\nabla g(x_{n})\rangle\) in (6), we have for all n ≥ 1 the following estimate
Employing the inequalities (9) and (10) in (6), we obtain for every n ≥ 1 that
and further
Not least,
and by employing this estimate in (11), we deduce that for every n ≥ 1
By using the \(\frac {1}{L_{h}}\)-cocoercivity of ∇h we obtain for every n ≥ 1 that
while, since \(p\in N_{\arg\min g}(u)\), it holds that
By combining these two inequalities with (12), it follows for every n ≥ 1 that
Since α ∈ [0, 1) and η > 0, it holds that \(\frac {1}{1+\eta }+\frac {\eta }{2(1+\eta )}+\alpha -1<1\), which together with the inequality above lead to the conclusion. □
For simplicity, we will make use of the following notation:
Lemma 2
For every n ≥ 1it holds that
Proof
Recall that for every n ≥ 1 we have \(\frac {y_{n}-x_{n+1}}{\lambda _{n}}-\nabla h(x_{n})-\beta _{n}\nabla g(x_{n})\in \partial f(x_{n+1})\), which implies
From here, it follows that for every n ≥ 1 we have
From the Descent Lemma (see for example [12, Theorem 18.15]), we obtain for every n ≥ 1 that
and
By combining these relations with inequality (14), we finally obtain the inequality in the statement of the lemma. □
For the forthcoming statements, we fix an element \(u\in \mathcal {S}\). For a simpler formulation of these results, we will use the following notation:
Lemma 3
Let \(u\in \mathcal {S}\). According to the first-order optimality conditions, there exist \(v\in\partial f(u)\) and \(p\in N_{\arg\min g}(u)\), such that 0 = v + ∇h(u) + p. Then, for every n ≥ 2, it holds that
Proof
We write (2) for η := η 0, multiply it by K, and after combining the resulting inequality with (13), we obtain for every n ≥ 1 that
In view of Assumption 1(III), we deduce that
and further
In order to obtain (15), we only have to add α(Ω n−1(x n−1) −Ω n (x n )) and \(\frac {\alpha K\eta _{0}}{1+\eta _{0}}(\lambda _{n}\beta _{n}g(x_{n})-\lambda _{n-1}\beta _{n-1}g(x_{n-1}))\) to both sides of the above inequality. □
The following result is a very useful tool in the convergence analysis of inertial algorithms (see [1, 2, 20]).
Lemma 4
Let \(\{a_{n}\}_{n=0}^{\infty }\), \(\{b_{n}\}_{n=1}^{\infty }\), and \(\{c_{n}\}_{n=1}^{\infty }\) be real sequences and α ∈ [0, 1) be a given real number. Assume that \(\{a_{n}\}_{n=1}^{\infty }\) is bounded from below, \(\{b_{n}\}_{n=1}^{\infty }\) is nonnegative, and \({\sum }_{n=1}^{\infty } c_{n}<+\infty \), such that
$$a_{n+1}-a_{n}+b_{n+1}\leq\alpha(a_{n}-a_{n-1})+c_{n} \quad \forall n\geq 1. $$
Then the following statements hold:
-
(i)
\({\sum }_{n=1}^{\infty }[a_{n}-a_{n-1}]_{+}<+\infty \), where [t]+ := max{t,0};
-
(ii)
\(\{a_{n}\}_{n=1}^{\infty }\) converges and \({\sum }_{n=1}^{\infty } b_{n}<+\infty \).
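As a quick numerical illustration (our own toy instance, not from the paper), assume the hypothesis of the lemma has the standard form \(a_{n+1}-a_{n}+b_{n+1}\leq\alpha(a_{n}-a_{n-1})+c_{n}\) with \(\sum_{n}c_{n}<+\infty\); the concrete sequences below are chosen so that this inequality holds with equality, and the two conclusions can then be checked directly:

```python
# Toy instance: a_n = 1 + (-1)^n / n^2 (bounded below, converges to 1),
# b_n = 1/n^2 (nonnegative, summable), and c_n defined as the slack that
# makes the inertial inequality hold with equality.
alpha = 0.5
N = 20000
a = [1.0] + [1.0 + (-1.0) ** n / n ** 2 for n in range(1, N + 2)]
b = [0.0] + [1.0 / n ** 2 for n in range(1, N + 2)]
c = [a[n + 1] - a[n] + b[n + 1] - alpha * (a[n] - a[n - 1])
     for n in range(1, N + 1)]

# Conclusion (i): the positive variation of {a_n} is summable.
pos_var = sum(max(a[n] - a[n - 1], 0.0) for n in range(1, N + 1))

assert sum(abs(x) for x in c) < 10.0   # hypothesis: {c_n} is (absolutely) summable
assert pos_var < 2.0                   # (i): sum of positive parts stays bounded
assert abs(a[N] - 1.0) < 1e-6          # (ii): {a_n} converges (here to 1)
assert sum(b) < 2.0                    # (ii): sum b_n < +infinity
```

The point of the construction is that a bounded-below sequence satisfying the inertial inequality cannot oscillate with non-summable upward jumps, which is exactly what conclusion (i) formalizes.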
The results presented in Lemma 5 related to the convergence of the generated iterates and in Lemma 6 related to the convergence of the objective function values will be used in the next section in the proof of the main theorem in combination with the Opial lemma stated in Lemma 7.
Lemma 5
Let \(u\in \mathcal {S}\). According to the first-order optimality conditions there exist v ∈ ∂ f(u)and p ∈ N arg min g (u), such that 0 = v + ∇h(u) + p.Then the following statements are true:
-
(i)
The sequence \(\{{\Gamma }_{n}\}_{n=1}^{+\infty }\) is bounded from below;
-
(ii)
\({\sum }_{n=1}^{\infty }\|x_{n+1}-x_{n}\|^{2}<+\infty \) ;
-
(iii)
\(\lim _{n\to +\infty }{\Gamma }_{n}\) exists and \({\sum }_{n=1}^{+\infty }\lambda _{n}\beta _{n}\|\nabla g(x_{n})\|^{2}<+\infty \) ;
-
(iv)
\(\lim _{n\to +\infty }\|x_{n}-u\|\) exists, \({\sum }_{n=1}^{\infty }[\varphi _{n}-\varphi _{n-1}]_{+}\!<+\infty \) and \({\sum }_{n=1}^{\infty }\lambda _{n}\beta _{n}g(x_{n})\!<+\infty \) ;
-
(v)
\(\lim _{n\to +\infty }{\Omega }_{n}(x_{n})\) exists;
-
(vi)
\(\lim _{n\to +\infty }g(x_{n})=0\) and every sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) lies in arg ming.
Proof
(i) According to Assumption 1(III) we have that
which implies that \(1\geq K\lambda_{n}\) and further
By using the definition of Γ n and Assumption 1(II), we easily derive that \(\{{\Gamma }_{n}\}_{n=1}^{\infty }\) is bounded from below.
(ii) For every n ≥ 2, we set
and
For a fixed natural number \(N_{0}\geq 2\), it holds that
Since f + h is bounded from below and relation (16) is true, we obtain that \(\{\omega _{n}\}_{n=1}^{\infty }\) is summable.
For every n ≥ 1, we set
Consequently, according to Assumption 1(III), it follows that
Further, for every n ≥ 1, it holds
which, together with Assumption 1(IV), implies that
and so
On the other hand, by Assumption 1(III), we also have for every n ≥ 1
and so
By employing the last inequality in Lemma 3 we obtain for every n ≥ 2
where for the last inequality we use (17) and (18). Since \(\lambda_{n}\to 0\) as n → + ∞, there exists \(N_{1}\in \mathbb {N}\), such that for every \(n\geq N_{1}\), we have \(\frac {4(1+\eta _{0})}{\eta _{0}}\lambda _{n}-\frac {2}{L_{h}}<0\). This implies that for every \(n\geq N_{1}\)
Summing up this inequality for \(n=N_{1},\ldots,N_{2}\), where \(N_{2}\) is a natural number with \(N_{2}\geq N_{1}\), we obtain that
This means that \(\{\mu _{n}\}_{n=2}^{\infty }\) is bounded from above (we take into account that c > 1). Let M be a positive upper bound of \(\{\mu _{n}\}_{n=2}^{\infty }\). Observing that \({\Gamma}_{n+1}-\alpha{\Gamma}_{n}\leq\mu_{n+1}\leq M\), and thus \({\Gamma}_{n+1}\leq\alpha{\Gamma}_{n}+M\), for every \(n\geq N_{1}\), we obtain
Since \(\{{\Gamma }_{n}\}_{n=1}^{\infty }\) is bounded from below, there exists \(C\in \mathbb {R}\), such that
Thus, from the inequality (21), by taking into account that c > 1, we deduce that
(iii) From Lemma 3, by using (17), (19), and (20), we obtain
Since \(\{{\Gamma }_{n}\}_{n=1}^{\infty }\) is bounded from below, by using Lemma 4, it follows that (iii) is true.
(iv) The statement follows from Lemma 1 and Lemma 4.
(v) Thanks to (iii), (iv) and \({\Gamma }_{n}={\Omega }_{n}(x_{n})-K\frac {\eta _{0}}{1+\eta _{0}}\lambda _{n}\beta _{n} g(x_{n})+K\varphi _{n}\) for every n ≥ 1, we obtain that \(\lim _{n\to +\infty }{\Omega }_{n}(x_{n})\) exists.
(vi) Since \(\lambda_{n}\beta_{n}\geq a>0\) for every n ≥ 1, we have \({\sum }_{n=1}^{+\infty }g(x_{n})<+\infty \) and so \(\lim _{n\to +\infty }g(x_{n})=0\).
Finally, let \(y\in {\mathcal H}\) be a sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) and \(\{x_{n_{j}}\}_{j=1}^{\infty }\) be a subsequence of \(\{x_{n}\}_{n=1}^{\infty }\), such that \(x_{n_{j}}\) weakly converges to y as j → + ∞. Since g is weakly lower semicontinuous, we obtain that
which means that y ∈ arg min g. □
Lemma 6
Let \(u\in \mathcal {S}\) . Then, we have
Proof
For every n ≥ 1, we have that
and so
According to the Descent Lemma we have for every n ≥ 1
and
which give rise to the following estimate
Further, we notice that for every n ≥ 1
or, equivalently,
On the other hand, since g(u) = 0, we have for every n ≥ 1
or, equivalently,
Similarly, we have for every n ≥ 1
which implies that
By summing up the inequalities in (23)–(25), we obtain for every n ≥ 1
Not least, according to (4) and (5), we obtain for every n ≥ 1
which, combined with (26) and Lemma 5(iv), implies that
The conclusion follows by taking into account (22). □
3 Convergence of the Iterates and of the Objective Function Values
In this section, we will prove the main result of this paper. This addresses the convergence of both the sequence of iterates \(\{x_{n}\}_{n=1}^{\infty }\) generated by Algorithm 1 and the sequence of objective function values \(\{f(x_{n})+h(x_{n})\}_{n=1}^{\infty }\).
The Opial lemma, which we state next and for which we refer to [12, Lemma 2.39], will play a crucial role in the convergence analysis.
Lemma 7
Let \({\mathcal H}\) be a real Hilbert space, \(C\subseteq {\mathcal H}\) a nonempty set and \(\{x_{n}\}_{n=1}^{\infty }\) a given sequence, such that:
-
(i)
For every z ∈ C, \(\lim _{n\to +\infty }\|x_{n}-z\|\) exists.
-
(ii)
Every sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) lies in C.
Then, the sequence \(\{x_{n}\}_{n=1}^{\infty }\) converges weakly to a point in C.
Theorem 1
Let \(\{x_{n}\}_{n=1}^{\infty }\) be the sequence generated by Algorithm 1. Then:
-
(i)
the sequence \(\{x_{n}\}_{n=1}^{\infty }\) converges weakly to a point in \(\mathcal {S}\) ;
-
(ii)
the sequence \(\{f(x_{n})+h(x_{n})\}_{n=1}^{\infty }\) converges to the optimal objective value of the optimization problem(1).
Proof
(i) We know that \(\lim _{n\to +\infty }\|x_{n}-u\|\) exists for all \(u\in \mathcal {S}\) (see Lemma 5(iv)); hence, in view of the Opial lemma, it is sufficient to show that all sequential weak cluster points of \(\{x_{n}\}_{n=1}^{\infty }\) are in \(\mathcal {S}\). Since \(\{\lambda _{n}\}_{n=1}^{\infty }\notin \ell ^{1}\) and \(\lim _{n\to +\infty }{\Omega }_{n}(x_{n})\) exists, from Lemma 6, we obtain that
Let \(x^{\ast }\in {\mathcal H}\) be a sequential weak cluster point of \(\{x_{n}\}_{n=1}^{\infty }\) and \(\{x_{n_{k}}\}_{k=1}^{\infty }\) be a subsequence of \(\{x_{n}\}_{n=1}^{\infty }\), such that \(x_{n_{k}}\) converges weakly to x ∗ as k → + ∞. From here, by Lemma 5(vi), we obtain that x ∗∈ arg min g. Take an arbitrary \(u\in \mathcal {S}\). The weak lower semicontinuity of f and h implies that
which means that \(x^{\ast }\in \mathcal {S}\).
(ii) The statement is a direct consequence of the above inequalities. □
We close the paper with a remark discussing the fulfillment of the conditions stated in Assumption 1.
Remark 2
We choose
and
We set
and
for every n ≥ 1.
-
(i)
By taking \(a:=(1-\alpha )\gamma -\frac {1}{L_{g}(1+\eta _{0})^{2}}\) and b > 0, such that \(a<b<\frac {2}{L_{g}(1+\eta _{0})^{2}}\), we have
$$0<a=\lambda_{n}\beta_{n}< b<\frac{2}{L_{g}(1+\eta_{0})^{2}} \quad \forall n\geq1. $$ -
(ii)
Since \(\beta _{n}\geq \frac {\gamma [L_{h}+2((1+2\alpha )K+c)]}{2-\gamma L_{g}}\), we have \(\frac {L_{h}+\beta _{n} L_{g}}{2}-\frac {\beta _{n}}{\gamma }\leq -(1+2\alpha )K-c\) for every n ≥ 1. On the other hand, since \(\frac {\beta _{n}}{\gamma }\leq \frac {(1-\alpha )}{\lambda _{n}}\), we have that \(\frac {L_{h}+\beta _{n}L_{g}}{2}+\frac {\alpha -1}{\lambda _{n}}\leq -(1+2\alpha )K-c\) for every n ≥ 1.
-
(iii)
For every n ≥ 1, we also have
$$\begin{array}{@{}rcl@{}} \beta_{n+1}-\beta_{n} &=&\left[\frac{(1-\alpha)\gamma L_{g}(1+\eta_{0})^{2}-1}{L_{g}(1+\eta_{0})^{2}}\right]\frac{K\eta_{0}}{1+\eta_{0}}\left( (n+1)^{q}-n^{q}\right)\\ &\leq&\left[\frac{(1-\alpha)\gamma L_{g}(1+\eta_{0})^{2}-1}{L_{g}(1+\eta_{0})^{2}}\right]\frac{K\eta_{0}}{1+\eta_{0}}\\ &=&\frac{K\eta_{0}}{1+\eta_{0}}\lambda_{n+1}\beta_{n+1}. \end{array} $$ -
(iv)
From (iii), it follows that for every n ≥ 1
$$\frac{1}{\lambda_{n+1}}-\frac{1}{\lambda_{n}} = \left( \beta_{n+1}-\beta_{n}\right)\left[\frac{L_{g}(1+\eta_{0})^{2}}{(1-\alpha)\gamma L_{g}(1+\eta_{0})^{2}-1}\right]\leq\frac{K\eta_{0}}{1+\eta_{0}}=\frac{2}{\alpha}. $$ -
(v)
Due to the fact that \(q\in \left (\frac {1}{2},1\right )\), we have \({\sum }_{n=1}^{+\infty }\frac {1}{\beta _{n}}=+\infty \) and \({\sum }_{n=1}^{+\infty }\frac {1}{{\beta _{n}^{2}}}<+\infty \). Consequently, \(\{\lambda _{n}\}_{n=1}^{\infty }\in \ell ^{2}\setminus \ell ^{1}\).
-
(vi)
Since g ≤ δ arg min g , it holds that g ∗≥ (δ arg min g )∗ = σ arg min g and so g ∗− σ arg min g ≥ 0. For a function \(g: {\mathcal H} \rightarrow \mathbb {R}\) fulfilling \(g \geq \frac {a}{2}\text {dist}^{2}(\cdot ,\arg \min g)\) where a > 0, it holds that \(g^{\ast }(x)-\sigma _{\arg \min g}(x)\leq \frac {1}{2a}\|x\|^{2}\) for every \(x \in {\mathcal H}\). Thus, for every n ≥ 1,
$$\lambda_{n}\beta_{n}\left[g^{\ast}\left( \frac{p}{\beta_{n}}\right)-\sigma_{\arg\min g}\left( \frac{p}{\beta_{n}}\right)\right]\leq \frac{\lambda_{n}}{2a\beta_{n}}\|p\|^{2} \quad \forall p\in\text{ran}(N_{\arg\min g}). $$Since \({\sum }_{n=1}^{\infty }\frac {1}{{\beta _{n}^{2}}}<+\infty \), from here it follows that
$$\sum\limits_{n=1}^{\infty}\lambda_{n}\beta_{n}\left[g^{\ast}\left( \frac{p}{\beta_{n}}\right)-\sigma_{\arg\min g}\left( \frac{p}{\beta_{n}}\right)\right]<+\infty. $$
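With the choices above, where \(\beta_{n}\) grows like \(n^{q}\) and \(\lambda_{n}\beta_{n}\) is constant, the summability properties noted in (v) reduce to the p-series test:

```latex
\lambda_n=\frac{a}{\beta_n}\sim n^{-q}
\quad\Longrightarrow\quad
\sum_{n=1}^{\infty}\lambda_n\sim\sum_{n=1}^{\infty}n^{-q}=+\infty \ \ (q<1),
\qquad
\sum_{n=1}^{\infty}\lambda_n^{2}\sim\sum_{n=1}^{\infty}n^{-2q}<+\infty \ \ (2q>1),
```

which is precisely why q must be taken in the open interval \(\left(\frac{1}{2},1\right)\).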
4 Conclusions
We investigate the weak (non-ergodic) convergence of an inertial proximal-gradient method with penalization terms for solving a bilevel optimization problem whose objective is the sum of a convex nonsmooth function and a convex smooth function, and whose constraint set is the set of minimizers of another convex and differentiable function. The proof techniques combine tools specific to inertial algorithms [3] and to penalty-type methods [5, 8]. We show the convergence of both the generated iterates and the objective function values.
References
Alvarez, F.: On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J. Control. Optim. 38, 1102–1119 (2000)
Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14, 773–782 (2004)
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3–11 (2001)
Attouch, H., Cabot, A., Czarnecki, M.-O.: Asymptotic behavior of nonautonomous monotone and subgradient evolution equations. Trans. Am. Math. Soc. https://doi.org/10.1090/tran/6965, arXiv:1601.00767 (2016)
Attouch, H., Czarnecki, M.-O.: Asymptotic behavior of coupled dynamical systems with multiscale aspects. J. Differ. Equ. 248, 1315–1344 (2010)
Attouch, H., Czarnecki, M.-O.: Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J. Differ. Equ. 262, 2745–2770 (2017)
Attouch, H., Czarnecki, M.-O., Peypouquet, J.: Prox-penalization and splitting methods for constrained variational problems. SIAM J. Optim. 21, 149–173 (2011)
Attouch, H., Czarnecki, M.-O., Peypouquet, J.: Coupling forward-backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J. Optim. 21, 1251–1274 (2011)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1/k². SIAM J. Optim. 26, 1824–1834 (2016)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optim. 24, 232–256 (2014)
Banert, S., Boţ, R.I.: Backward penalty schemes for monotone inclusion problems. J. Optim. Theory Appl. 166, 930–948 (2015)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 183–202 (2009)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Cambridge (1999)
Borwein, J.M., Vanderwerff, J.D.: Convex Functions: Constructions, Characterizations and Counterexamples. Cambridge University Press, Cambridge (2010)
Boţ, R.I.: Conjugate Duality in Convex Optimization. Lecture Notes in Economics and Mathematical Systems, vol. 637. Springer, Berlin (2010)
Boţ, R.I., Csetnek, E.R.: Forward-backward and Tseng’s type penalty schemes for monotone inclusion problems. Set-Valued Var. Anal. 22, 313–331 (2014)
Boţ, R.I., Csetnek, E.R.: A Tseng’s type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J. Math. 42, 451–465 (2014)
Boţ, R.I., Csetnek, E.R.: Levenberg-Marquardt dynamics associated to variational inequalities. Set-Valued Var. Anal. https://doi.org/10.1007/s11228-017-0409-8, arXiv:1603.04460 (2017)
Boţ, R.I., Csetnek, E.R.: An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algorithms 71, 519–540 (2016)
Boţ, R.I., Csetnek, E.R.: An inertial alternating direction method of multipliers. Minimax Theory Appl. 1, 29–49 (2016)
Boţ, R.I., Csetnek, E.R.: A hybrid proximal-extragradient algorithm with inertial effects. Numer. Funct. Anal. Optim. 36, 951–963 (2015)
Boţ, R.I., Csetnek, E.R.: An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems. J. Optim. Theory Appl. 171, 600–616 (2016)
Boţ, R.I., Csetnek, E.R.: Approaching the solving of constrained variational inequalities via penalty term-based dynamical systems. J. Math. Anal. Appl. 435, 1688–1700 (2016)
Boţ, R.I., Csetnek, E.R.: Penalty schemes with inertial effects for monotone inclusion problems. Optimization 66, 965–982 (2017)
Boţ, R.I., Csetnek, E.R.: Second-order dynamical systems associated to variational inequalities. Appl. Anal. 96, 799–809 (2017)
Boţ, R.I., Csetnek, E.R.: A second order dynamical system with Hessian-driven damping and penalty term associated to variational inequalities. arXiv:1608.04137 (2016)
Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)
Boţ, R.I., Csetnek, E.R., László, S.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4, 3–25 (2016)
Boţ, R. I., Csetnek, E.R., Nimana, N.: Gradient-type penalty method with inertial effects for solving constrained convex optimization problems with smooth data. Optim. Lett. https://doi.org/10.1007/s11590-017-1158-1 (2017)
Cabot, A., Frankel, P.: Asymptotics for some proximal-like method involving inertia and memory aspects. Set-Valued Var. Anal. 19, 59–74 (2011)
Chen, C., Chan, R.H., Ma, S., Yang, J.: Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imag. Sci. 8, 2239–2267 (2015)
Chen, C., Ma, S., Yang, J.: A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J. Optim. 25, 2120–2142 (2015)
Maingé, P.-E.: Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 219, 223–236 (2008)
Maingé, P.-E., Moudafi, A.: Convergence of new inertial proximal methods for DC programming. SIAM J. Optim. 19, 397–413 (2008)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155, 447–454 (2003)
Noun, N., Peypouquet, J.: Forward-backward penalty scheme for constrained convex minimization without inf-compactness. J. Optim. Theory Appl. 158, 787–795 (2013)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: Inertial proximal algorithm for nonconvex optimization. SIAM J. Imag. Sci. 7, 1388–1419 (2014)
Peypouquet, J.: Coupling the gradient method with a general exterior penalization scheme for convex minimization. J. Optim. Theory Appl. 153, 123–138 (2012)
Polyak, B.T.: Introduction to Optimization. (Translated from the Russian) Translations Series in Mathematics and Engineering, Optimization Software, Inc. Publications Division, New York (1987)
Zalinescu, C.: Convex Analysis in General Vector Spaces. World Scientific, Singapore (2002)
Acknowledgements
Open access funding provided by Austrian Science Fund (FWF). The authors are thankful to two anonymous referees and the Guest Editor Regina Burachik for comments and remarks which improved the quality of the presentation.
Additional information
Dedicated to Professor Michel Théra’s 70th birthday.
The work of the first author was partially supported by FWF (Austrian Science Fund), project I2419-N32. The work of the second author was supported by FWF (Austrian Science Fund), project P29809-N32. Research was done during the two months’ stay of the third author in Spring 2016 at the Faculty of Mathematics of the University of Vienna. The third author is thankful to the Royal Golden Jubilee PhD Program for financial support.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Boţ, R., Csetnek, E. & Nimana, N. An Inertial Proximal-Gradient Penalization Scheme for Constrained Convex Optimization Problems. Vietnam J. Math. 46, 53–71 (2018). https://doi.org/10.1007/s10013-017-0256-9