Abstract
In this paper, a smoothing algorithm with a constant learning rate is presented for training two kinds of fuzzy neural networks (FNNs): max-product and max-min FNNs. Weak and strong convergence results are established for the algorithm: the error function decreases monotonically, its gradient tends to zero, and the weight sequence tends to a fixed value during the iteration. Furthermore, conditions on the constant learning rate are specified to guarantee convergence. Finally, three numerical examples are given to illustrate the feasibility and efficiency of the algorithm and to support the theoretical findings.
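To make the setting concrete, the following is a minimal Python sketch of a smoothed gradient-descent training loop of the kind described above, for a max-product FNN whose sample output is \(\max _i(x_i^sw_i)\). It is not the authors' implementation: the squared-error form, the data shapes, and the values of the smoothing parameter t and the learning rate eta are illustrative assumptions.

import numpy as np

def smooth_max_product(w, X, t):
    # Log-sum-exp smoothing of the max-product composition max_i(x_i^s * w_i).
    z = X * w / t                                  # shape (S, n)
    m = z.max(axis=1, keepdims=True)               # shift for numerical stability
    return t * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def train(X, targets, t=0.05, eta=0.01, iters=1000):
    # Gradient descent with a constant learning rate eta on the smoothed
    # squared error 0.5 * sum_s (g_tilde^s(w, t) - target^s)^2 (assumed form).
    S, n = X.shape
    w = np.random.rand(n)
    for _ in range(iters):
        z = X * w / t
        lam = np.exp(z - z.max(axis=1, keepdims=True))
        lam /= lam.sum(axis=1, keepdims=True)       # softmax weights lambda_i^s
        residual = smooth_max_product(w, X, t) - targets
        grad = (residual[:, None] * lam * X).sum(axis=0)
        w = w - eta * grad                          # constant learning rate update
    return w

The same loop structure would apply to the max-min network, except that the inner product \(x_i^sw_i\) is replaced by \(\min (x_i^s,w_i)\), which requires its own smoothing.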




References
Baruch IS, Lopez RB, Guzman J-LO, Flores JM (2008) A fuzzy-neural multi-model for nonlinear systems identification and control. Fuzzy Sets Syst 159:2650–2667
Song H, Miao C, Shen Z, Miao Y, Lee B-S (2009) A fuzzy neural network with fuzzy impact grades. Neurocomputing 72:3098–3122
Castro JR, Castillo O, Melin P, Rodríguez-Díaz A (2009) A hybrid learning algorithm for a class of interval type-2 fuzzy neural networks. Inf Sci 179:2175–2193
Juang C-F, Lin Y-Y, Tu C-C (2010) A recurrent self-evolving fuzzy neural network with local feedbacks and its application to dynamic system processing. Fuzzy Sets Syst 161:2552–2568
Khajeh A, Modarress H (2010) Prediction of solubility of gases in polystyrene by adaptive neuro-fuzzy inference system and radial basis function neural network. Expert Syst Appl 37:3070–3074
Nandedkar AV, Biswas PK (2007) A fuzzy min-max neural network classifier with compensatory neuron architecture. IEEE Trans Neural Netw 18:42–54
Sonule PM, Shetty BS (2017) An enhanced fuzzy min-max neural network with ant colony optimization-based rule extractor for decision making. Neurocomputing 239:204–213
Peeva K (2013) Resolution of fuzzy relational equations-method, algorithm and software with applications. Inf Sci 234(10):44–63
Brouwer RK (2002) A discrete fully recurrent network of max product units for associative memory and classification. Int J Neural Syst 12(3–4):247–262
Wong W-K, Loo C-K, Lim W-S, Tan P-N (2010) Thermal condition monitoring system using log-polar mapping, quaternion correlation and max-product fuzzy neural network classification. Neurocomputing 74(1–3):164–177
Li Y, Wu Z-F (2008) Fuzzy feature selection based on min-max learning rule and extension matrix. Pattern Recognit 41:217–226
Wong W, Loo C, Lim W, Tan P (2010) Thermal condition monitoring system using log-polar mapping, quaternion correlation and max-product fuzzy neural network classification. Neurocomputing 74:164–177
Liu J, Ma Y, Zhang H, Su H, Xiao G (2017) A modified fuzzy min-max neural network for data clustering and its application on pipeline internal inspection data. Neurocomputing 238:56–66
Mohammed MF, Lim CP (2017) A new hyperbox selection rule and a pruning strategy for the enhanced fuzzy min-max neural network. Neural Netw 86:69–79
Shinde S, Kulkarni U (2016) Extracting classification rules from modified fuzzy min-max neural network for data with mixed attributes. Appl Soft Comput 40:364–378
Quteishat A, Lim CP (2008) A modified fuzzy min-max neural network with rule extraction and its application to fault detection and classification. Appl Soft Comput 8:985–995
Park J-H, Kim T-H, Sugie T (2011) Output feedback model predictive control for LPV systems based on quasi-min-max algorithm. Automatica 47:2052–2058
Iliadis LS, Spartalis S, Tachos S (2008) Application of fuzzy T-norms towards a new artificial neural networks' evaluation framework: a case from wood industry. Inf Sci 178:3828–3839
Marks RJ II, Oh S, Arabshahi P, Caudell TP, Choi JJ, Song BG (1992) Steepest descent adaptation of min-max fuzzy if-then rules. In: Proc. IJCNN, Beijing, China, vol. III, pp 471–477
Stoeva S, Nikov A (2000) A fuzzy backpropagation algorithm. Fuzzy Sets Syst 112:27–39
Nikov A, Stoeva S (2001) Quick fuzzy backpropagation algorithm. Neural Netw 14:231–244
Blanco A, Delgado M, Requena I (1995) Identification of fuzzy relational equations by fuzzy neural networks. Fuzzy Sets Syst 71:215–226
Zhang X, Hang C-C (1996) The min-max function differentiation and training of fuzzy neural networks. IEEE Trans Neural Netw 7(5):1139–1149
Li L, Qiao Z, Liu Y, Chen Y (2017) A convergent smoothing algorithm for training max-min fuzzy neural networks. Neurocomputing 260:404–410
Peng J-M, Lin Z (1999) A non-interior continuation method for generalized linear complementarity problems. Math Program Ser A 86:533–563
Tong X, Qi L, Wu F, Zhou H (2010) A smoothing method for solving portfolio optimization with CVaR and applications in allocation of generation asset. Appl Math Comput 216:1723–1740
Zhang H, Wu W, Liu F, Yao M (2009) Boundedness and convergence of online gradient method with penalty for feedforward neural networks. IEEE Trans Neural Netw 20:1050–1054
Wu W, Li L, Yang J, Liu Y (2010) A modified gradient-based neuro-fuzzy learning algorithm and its convergence. Inf Sci 180:1630–1642
Shao HM, Zheng GF (2011) Boundedness and convergence of online gradient method with penalty and momentum. Neurocomputing 74:765–770
Loetamonphong J, Fang S-C (1999) An efficient solution procedure for fuzzy relation equations with max-product composition. IEEE Trans Fuzzy Syst 7:441–445
Yeh CT (2008) On the minimal solutions of max-min fuzzy relational equations. Fuzzy Sets Syst 159:23–39
Wang J, Wu W, Zurada JM (2011) Deterministic convergence of conjugate gradient method for feedforward neural networks. Neurocomputing 74:2368–2376
Acknowledgements
This project is partially supported by the Natural Science Foundation of China (11401185), the Hunan Provincial Natural Science Foundation of China (2017JJ2011, 14JJ6039), the Scientific Research Fund of Hunan Provincial Education Department (17A031, 13B004), the Science and Technology Plan Project of Hunan Province (Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, 2016TP1020) and China Scholarship Council.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
First, we estimate a matrix norm in preparation for the proof of our convergence theorems.
According to (12) and (13), all second partial derivatives of \(\widetilde{E}(w)\) exist on \(R^n\) for any \(t>0\), and the second partial derivative \(\frac{\partial ^2 \widetilde{E}(w) }{\partial w_i\partial w_j}\) is given by
where \(\lambda _i^s(w,t)=\frac{\exp \left( x^s_iw_i/t\right) }{\sum \nolimits _{j=1}^n\exp \left( x^s_jw_j/t\right) }\), \(\widetilde{g}^s(w,t)=t\ln \sum \nolimits _{i=1}^n \exp \left( \frac{x_i^sw_i}{t}\right) \).
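For orientation, \(\widetilde{g}^s(w,t)\) is the scaled log-sum-exp approximation of the max-product composition \(\max _i x_i^sw_i\); the following standard bounds and gradient formula (a sketch derived only from the definition above, not a display from the original) explain why it is a smooth surrogate for the maximum:
\[ \max _{1\le i\le n}x_i^sw_i\ \le \ \widetilde{g}^s(w,t)\ \le \ \max _{1\le i\le n}x_i^sw_i+t\ln n,\qquad \frac{\partial \widetilde{g}^s(w,t)}{\partial w_i}=\lambda _i^s(w,t)\,x_i^s . \]
Thus \(\widetilde{g}^s(w,t)\rightarrow \max _{1\le i\le n}x_i^sw_i\) as \(t\rightarrow 0^+\), while remaining differentiable for every \(t>0\).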
Then the Hessian matrix H(w) of \(\widetilde{E}(w)\) is a square n-by-n matrix defined as
It is easy to verify that all second partial derivatives of \(\widetilde{E}(w)\) are continuous and the following equation holds for all \(i,j=1,2,\ldots ,n\)
Therefore, the Hessian matrix H(w) is a real symmetric matrix.
Suppose that \(\lambda _1(w),\lambda _2(w),\ldots ,\lambda _n(w)\) are the n real eigenvalues of the matrix H(w), and that the norm \(\Vert \bullet \Vert _2\), defined for a matrix A by
is taken as the matrix norm considered in the following discussion. We can obtain an estimation of the norm of matrix H(w) as shown in Lemma 2.
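For completeness, the omitted display above is the standard spectral norm; since H(w) is real symmetric, it reduces to the largest eigenvalue magnitude (a routine identity, consistent with the computation in the proof below):
\[ \Vert A\Vert _2=\sqrt{\lambda _{\max }\left( A^TA\right) },\qquad \Vert H(w)\Vert _2=\max _{1\le i\le n}\vert \lambda _i(w)\vert . \]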
Lemma 2
Let H(w) be the Hessian matrix of \(\widetilde{E}(w)\) defined in (22) and (23). Then there exists a constant \(C_1\) such that for all \(w\in D\)
where D is a bounded set.
Proof
Since the matrix H(w) is a real symmetric matrix and \(\lambda _1(w),\lambda _2(w),\ldots ,\lambda _n(w)\) are its n real eigenvalues, there exists a unitary matrix Q such that
Then
It is easy to see that all eigenvalues of \(\left( H(w)\right) ^TH(w)\) are
Thus, we have
It follows from (22), (23) and the boundedness of the set D that \(\lambda _i(w)\) is bounded for all \(i=1,2,\ldots ,n\) and all \(w\in D\); namely, there exists a constant \(C_1\) such that
for all \(i=1,2,\ldots ,n\) and all \(w\in D\). Therefore, we have
This completes the proof of Lemma 2. \(\square \)
The following lemma is also crucial for the proof of our convergence theorem. This result is almost the same as Lemma 5.3 in [32], so the details of its proof are omitted.
Lemma 3
Suppose that \(F : \mathbb {R}^p\rightarrow \mathbb {R}^q,\ p\ge 1,\ q\ge 1\), is continuous on a bounded closed region \(\mathbf{D}\subset \mathbb {R}^p\), that \(\mathbf{D_0}=\{\mathbf {z}\in \mathbf{D}:\ F(\mathbf {z})=0\}\), and that the projection of \(\mathbf{D_0}\) on each coordinate axis does not contain any interior point. If a sequence \(\{\mathbf {z}^k\}\subset \mathbf{D}\) satisfies
then there exists a unique \(\mathbf {z}^*\in \mathbf{D_0}\) such that \(\lim \nolimits _{k\rightarrow \infty }\mathbf {z}^k=\mathbf {z}^*\).
Now we are ready to prove the main theorems in terms of the above two lemmas.
Proof to Theorem 1
The proof is divided into four parts, dealing with (18)–(21) respectively.
Proof to (18). Expanding \(\widetilde{E}(w^{k+1})\) with Taylor formula, we have for all \(k=0,1,2,\ldots \) that
where \(\xi \) lies between \(w^k\) and \(w^{k+1}\). Write \(\alpha =\eta -\frac{\Vert H(\xi )\Vert }{2}\eta ^2\). Then
We require the learning rate \(\eta \) to satisfy
where \(C=\frac{C_1}{2}\). By virtue of Lemma 2, we have
This together with (24) leads to
This completes the proof of (18).
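Since the displayed inequalities in this part are omitted above, a condensed reconstruction of the argument (a sketch consistent with Lemma 2, the update \(w^{k+1}=w^k-\eta \nabla \widetilde{E}(w^k)\) and the learning-rate condition) reads:
\[ \begin{aligned} \widetilde{E}(w^{k+1})&=\widetilde{E}(w^k)+\nabla \widetilde{E}(w^k)^T(w^{k+1}-w^k)+\tfrac{1}{2}(w^{k+1}-w^k)^TH(\xi )(w^{k+1}-w^k)\\ &\le \widetilde{E}(w^k)-\eta \Vert \nabla \widetilde{E}(w^k)\Vert ^2+\tfrac{\Vert H(\xi )\Vert }{2}\eta ^2\Vert \nabla \widetilde{E}(w^k)\Vert ^2 =\widetilde{E}(w^k)-\alpha \Vert \nabla \widetilde{E}(w^k)\Vert ^2, \end{aligned} \]
and the condition \(0<\eta <\frac{1}{C}=\frac{2}{C_1}\) gives \(\alpha \ge \eta -C\eta ^2>0\), hence \(\widetilde{E}(w^{k+1})\le \widetilde{E}(w^k)\).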
Proof to (19). By the definition of \(\widetilde{E}(w)\) in (12), it is easy to see that \(\widetilde{E}(w^k)\ge 0\) for \(\ k=0,1,2,\ldots \). Combining this with (26), we conclude by the monotone convergence theorem that
where \(\widetilde{E}^*=\inf _{w\in \Omega }\widetilde{E}(w)\). This proves (19).
Proof to (20). According to (24), it is easy to get
Since \(\widetilde{E}(w^{k+1})\geqslant 0\), we have
Letting \(k\rightarrow \infty \) results in
This immediately gives
(20) is proved.
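In condensed form (a sketch of the telescoping step whose displays are omitted above), summing the inequality (24) over \(k=0,1,\ldots ,K\) gives
\[ \alpha \sum \limits _{k=0}^{K}\Vert \nabla \widetilde{E}(w^k)\Vert ^2\le \sum \limits _{k=0}^{K}\left( \widetilde{E}(w^k)-\widetilde{E}(w^{k+1})\right) =\widetilde{E}(w^0)-\widetilde{E}(w^{K+1})\le \widetilde{E}(w^0)<\infty , \]
so the series \(\sum \nolimits _{k=0}^{\infty }\Vert \nabla \widetilde{E}(w^k)\Vert ^2\) converges and \(\lim \nolimits _{k\rightarrow \infty }\Vert \nabla \widetilde{E}(w^k)\Vert =0\).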
Proof to (21). By virtue of (14) and (28), we can get that
Note that the error function \(\widetilde{E}(w)\) defined in (12) is continuous and differentiable. According to (13) and (15), it is easy to verify that \(\nabla \widetilde{E}(w)\) is continuous for \(w\in \mathbb {R}^n\). Since (28), (29) and Assumption (A3) are valid, it follows immediately from Lemma 3 that (21) holds; that is, there exists a unique fixed point \(w^*\in \Omega \) such that
This completes the proof of Theorem 1. \(\square \)
Proof to Theorem 2
Using the Taylor expansion, we can also get for all \(k=0,1,2,\ldots \) that
where \(\xi \) lies between \(w^k\) and \(w^{k+1}\). Since the Hessian matrices of \(\overline{E}(w)\) and \(\widetilde{E}(w)\) have the same mathematical properties, we can follow the proof of Theorem 1 and obtain the same results as in Theorem 1 for the error function \(\overline{E}(w)\) and the sequence \(\{w^k\}\) generated by SAMM. This completes the proof of Theorem 2. \(\square \)
Cite this article
Li, L., Qiao, Z. & Long, Z. A Smoothing Algorithm with Constant Learning Rate for Training Two Kinds of Fuzzy Neural Networks and Its Convergence. Neural Process Lett 51, 1093–1109 (2020). https://doi.org/10.1007/s11063-019-10135-4