Abstract
In this paper, a smoothing algorithm with a constant learning rate is presented for training two kinds of fuzzy neural networks (FNNs): max-product and max-min FNNs. Weak and strong convergence results are established for the algorithm: the error function decreases monotonically, its gradient tends to zero, and the weight sequence tends to a fixed value during the iteration. Furthermore, conditions on the constant learning rate are specified to guarantee convergence. Finally, three numerical examples are given to illustrate the feasibility and efficiency of the algorithm and to support the theoretical findings.
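To make the setting concrete, the following is a minimal Python sketch of a smoothed gradient-descent training loop of the kind described above, for a max-product FNN whose sample output is \(\max _i(x_i^sw_i)\). It is not the authors' implementation: the squared-error form, the data shapes, and the values of the smoothing parameter t and the learning rate eta are illustrative assumptions.

import numpy as np

def smooth_max_product(w, X, t):
    # Log-sum-exp smoothing of the max-product composition max_i(x_i^s * w_i).
    z = X * w / t                                  # shape (S, n)
    m = z.max(axis=1, keepdims=True)               # shift for numerical stability
    return t * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def train(X, targets, t=0.05, eta=0.01, iters=1000):
    # Gradient descent with a constant learning rate eta on the smoothed
    # squared error 0.5 * sum_s (g_tilde^s(w, t) - target^s)^2 (assumed form).
    S, n = X.shape
    w = np.random.rand(n)
    for _ in range(iters):
        z = X * w / t
        lam = np.exp(z - z.max(axis=1, keepdims=True))
        lam /= lam.sum(axis=1, keepdims=True)       # softmax weights lambda_i^s
        residual = smooth_max_product(w, X, t) - targets
        grad = (residual[:, None] * lam * X).sum(axis=0)
        w = w - eta * grad                          # constant learning rate update
    return w

The same loop structure would apply to the max-min network, except that the inner product \(x_i^sw_i\) is replaced by \(\min (x_i^s,w_i)\), which requires its own smoothing.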




References
Baruch IS, Lopez RB, Guzman J-LO, Flores JM (2008) A fuzzy-neural multi-model for nonlinear systems identification and control. Fuzzy Sets Syst 159:2650–2667
Song H, Miao C, Shen Z, Miao Y, Lee B-S (2009) A fuzzy neural network with fuzzy impact grades. Neurocomputing 72:3098–3122
Castro JR, Castillo O, Melin P, Rodríguez-Díaz A (2009) A hybrid learning algorithm for a class of interval type-2 fuzzy neural networks. Inf Sci 179:2175–2193
Juang C-F, Lin Y-Y, Tu C-C (2010) A recurrent self-evolving fuzzy neural network with local feedbacks and its application to dynamic system processing. Fuzzy Sets Syst 161:2552–2568
Khajeh A, Modarress H (2010) Prediction of solubility of gases in polystyrene by adaptive neuro-fuzzy inference system and radial basis function neural network. Expert Syst Appl 37:3070–3074
Nandedkar AV, Biswas PK (2007) A fuzzy min-max neural network classifier with compensatory neuron architecture. IEEE Trans Neural Netw 18:42–54
Sonule PM, Shetty BS (2017) An enhanced fuzzy min-max neural network with ant colony optimization-based rule extractor for decision making. Neurocomputing 239:204–213
Peeva K (2013) Resolution of fuzzy relational equations-method, algorithm and software with applications. Inf Sci 234(10):44–63
Brouwer RK (2002) A discrete fully recurrent network of max product units for associative memory and classification. Int J Neural Syst 12(3–4):247–262
Wong W-K, Loo C-K, Lim W-S, Tan P-N (2010) Thermal condition monitoring system using log-polar mapping, quaternion correlation and max-product fuzzy neural network classification. Neurocomputing 74(1–3):164–177
Li Y, Wu Z-F (2008) Fuzzy feature selection based on min-max learning rule and extension matrix. Pattern Recognit 41:217–226
Wong W, Loo C, Lim W, Tan P (2010) Thermal condition monitoring system using log-polar mapping, quaternion correlation and max-product fuzzy neural network classification. Neurocomputing 74:164–177
Liu J, Ma Y, Zhang H, Su H, Xiao G (2017) A modified fuzzy min-max neural network for data clustering and its application on pipeline internal inspection data. Neurocomputing 238:56–66
Mohammed MF, Lim CP (2017) A new hyperbox selection rule and a pruning strategy for the enhanced fuzzy min-max neural network. Neural Netw 86:69–79
Shinde S, Kulkarni U (2016) Extracting classification rules from modified fuzzy min-max neural network for data with mixed attributes. Appl Soft Comput 40:364–378
Quteishat A, Lim CP (2008) A modified fuzzy min-max neural network with rule extraction and its application to fault detection and classification. Appl Soft Comput 8:985–995
Park J-H, Kim T-H, Sugie T (2011) Output feedback model predictive control for LPV systems based on quasi-min-max algorithm. Automatica 47:2052–2058
Iliadis LS, Spartalis S, Tachos S (2008) Application of fuzzy T-norms towards a new artificial neural networks' evaluation framework: a case from wood industry. Inf Sci 178:3828–3839
Marks RJ II, Oh S, Arabshahi P, Caudell TP, Choi JJ, Song BG (1992) Steepest descent adaptation of min-max fuzzy if-then rules. In: Proc. IJCNN, Beijing, China, vol. III, pp 471–477
Stoeva S, Nikov A (2000) A fuzzy backpropagation algorithm. Fuzzy Sets Syst 112:27–39
Nikov A, Stoeva S (2001) Quick fuzzy backpropagation algorithm. Neural Netw 14:231–244
Blanco A, Delgado M, Requena I (1995) Identification of fuzzy relational equations by fuzzy neural networks. Fuzzy Sets Syst 71:215–226
Zhang X, Hang C-C (1996) The min-max function differentiation and training of fuzzy neural networks. IEEE Trans Neural Netw 7(5):1139–1149
Li L, Qiao Z, Liu Y, Chen Y (2017) A convergent smoothing algorithm for training max-min fuzzy neural networks. Neurocomputing 260:404–410
Peng J-M, Lin Z (1999) A non-interior continuation method for generalized linear complementarity problems. Math Program Ser A 86:533–563
Tong X, Qi L, Wu F, Zhou H (2010) A smoothing method for solving portfolio optimization with CVaR and applications in allocation of generation asset. Appl Math Comput 216:1723–1740
Zhang H, Wu W, Liu F, Yao M (2009) Boundedness and convergence of online gradient method with penalty for feedforward neural networks. IEEE Trans Neural Netw 20:1050–1054
Wu W, Li L, Yang J, Liu Y (2010) A modified gradient-based neuro-fuzzy learning algorithm and its convergence. Inf Sci 180:1630–1642
Shao HM, Zheng GF (2011) Boundedness and convergence of online gradient method with penalty and momentum. Neurocomputing 74:765–770
Loetamonphong J, Fang S-C (1999) An efficient solution procedure for fuzzy relation equations with max-product composition. IEEE Trans Fuzzy Syst 7:441–445
Yeh CT (2008) On the minimal solutions of max-min fuzzy relational equations. Fuzzy Sets Syst 159:23–39
Wang J, Wu W, Zurada JM (2011) Deterministic convergence of conjugate gradient method for feedforward neural networks. Neurocomputing 74:2368–2376
Acknowledgements
This project is partially supported by the Natural Science Foundation of China (11401185), the Hunan Provincial Natural Science Foundation of China (2017JJ2011, 14JJ6039), the Scientific Research Fund of Hunan Provincial Education Department (17A031, 13B004), the Science and Technology Plan Project of Hunan Province (Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, 2016TP1020) and China Scholarship Council.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
First, we estimate a matrix norm in preparation for the proof of our convergence theorems.
According to (12) and (13), all second partial derivatives of \(\widetilde{E}(w)\) exist on \(R^n\) for any \(t>0\), and the second partial derivative \(\frac{\partial ^2 \widetilde{E}(w) }{\partial w_i\partial w_j}\) is given by
where \(\lambda _i^s(w,t)=\frac{\exp \left( x^s_iw_i/t\right) }{\sum \nolimits _{j=1}^n\exp \left( x^s_jw_j/t\right) }\), \(\widetilde{g}^s(w,t)=t\ln \sum \nolimits _{i=1}^n \exp \left( \frac{x_i^sw_i}{t}\right) \).
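For orientation, \(\widetilde{g}^s(w,t)\) is the scaled log-sum-exp approximation of the max-product composition \(\max _i x_i^sw_i\); the following standard bounds and gradient formula (a sketch derived only from the definition above, not a display from the original) explain why it is a smooth surrogate for the maximum:
\[ \max _{1\le i\le n}x_i^sw_i\ \le \ \widetilde{g}^s(w,t)\ \le \ \max _{1\le i\le n}x_i^sw_i+t\ln n,\qquad \frac{\partial \widetilde{g}^s(w,t)}{\partial w_i}=\lambda _i^s(w,t)\,x_i^s . \]
Thus \(\widetilde{g}^s(w,t)\rightarrow \max _{1\le i\le n}x_i^sw_i\) as \(t\rightarrow 0^+\), while remaining differentiable for every \(t>0\).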
Then the Hessian matrix H(w) of \(\widetilde{E}(w)\) is a square n-by-n matrix defined as
It is easy to verify that all second partial derivatives of \(\widetilde{E}(w)\) are continuous and the following equation holds for all \(i,j=1,2,\ldots ,n\)
Therefore, the Hessian matrix H(w) is a real symmetric matrix.
Suppose that \(\lambda _1(w),\lambda _2(w),\ldots ,\lambda _n(w)\) are the n real eigenvalues of the matrix H(w), and that the norm \(\Vert \bullet \Vert _2\), defined for a matrix A by
is taken as the matrix norm considered in the following discussion. We can obtain an estimation of the norm of matrix H(w) as shown in Lemma 2.
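For completeness, the omitted display above is the standard spectral norm; since H(w) is real symmetric, it reduces to the largest eigenvalue magnitude (a routine identity, consistent with the computation in the proof below):
\[ \Vert A\Vert _2=\sqrt{\lambda _{\max }\left( A^TA\right) },\qquad \Vert H(w)\Vert _2=\max _{1\le i\le n}\vert \lambda _i(w)\vert . \]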
Lemma 2
Let H(w) be the Hessian matrix of \(\widetilde{E}(w)\) defined in (22) and (23). Then there exists a constant \(C_1\) such that for all \(w\in D\)
where D is a bounded set.
Proof
Since the matrix H(w) is a real symmetric matrix and \(\lambda _1(w),\lambda _2(w),\ldots ,\lambda _n(w)\) are its n real eigenvalues, there exists a unitary matrix Q such that
Then
It is easy to see that all eigenvalues of \(\left( H(w)\right) ^TH(w)\) are
Thus, we have
It follows from (22), (23) and the boundedness of the set D that \(\lambda _i(w)\) is bounded for all \(i=1,2,\ldots ,n\) and all \(w\in D\); namely, there exists a constant \(C_1\) such that
for all \(i=1,2,\ldots ,n\) and all \(w\in D\). Therefore, we have
This completes the proof of Lemma 2. \(\square \)
The following lemma is also crucial for the proof of our convergence theorem. This result is almost the same as Lemma 5.3 in [32], so the details of its proof are omitted.
Lemma 3
Suppose that \(F : \mathbb {R}^p\rightarrow \mathbb {R}^q,\ p\ge 1,\ q\ge 1\), is continuous on a bounded closed region \(\mathbf{D}\subset \mathbb {R}^p\), that \(\mathbf{D_0}=\{\mathbf {z}\in \mathbf{D}:\ F(\mathbf {z})=0\}\), and that the projection of \(\mathbf{D_0}\) on each coordinate axis does not contain any interior point. If a sequence \(\{\mathbf {z}^k\}\subset \mathbf{D}\) satisfies
then there exists a unique \(\mathbf {z}^*\in \mathbf{D_0}\) such that \(\lim \nolimits _{k\rightarrow \infty }\mathbf {z}^k=\mathbf {z}^*\).
Now we are ready to prove the main theorems in terms of the above two lemmas.
Proof to Theorem 1
The proof is divided into four parts, dealing with (18)–(21) respectively.
Proof to (18). Expanding \(\widetilde{E}(w^{k+1})\) with Taylor formula, we have for all \(k=0,1,2,\ldots \) that
where \(\xi \) lies between \(w^k\) and \(w^{k+1}\). Write \(\alpha =\eta -\frac{\Vert H(\xi )\Vert }{2}\eta ^2\). Then
We require the learning rate \(\eta \) to satisfy
where \(C=\frac{C_1}{2}\). By virtue of Lemma 2, we have
This together with (24) leads to
This completes the proof of (18).
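Since the displayed inequalities in this part are omitted above, a condensed reconstruction of the argument (a sketch consistent with Lemma 2, the update \(w^{k+1}=w^k-\eta \nabla \widetilde{E}(w^k)\) and the learning-rate condition) reads:
\[ \begin{aligned} \widetilde{E}(w^{k+1})&=\widetilde{E}(w^k)+\nabla \widetilde{E}(w^k)^T(w^{k+1}-w^k)+\tfrac{1}{2}(w^{k+1}-w^k)^TH(\xi )(w^{k+1}-w^k)\\ &\le \widetilde{E}(w^k)-\eta \Vert \nabla \widetilde{E}(w^k)\Vert ^2+\tfrac{\Vert H(\xi )\Vert }{2}\eta ^2\Vert \nabla \widetilde{E}(w^k)\Vert ^2 =\widetilde{E}(w^k)-\alpha \Vert \nabla \widetilde{E}(w^k)\Vert ^2, \end{aligned} \]
and the condition \(0<\eta <\frac{1}{C}=\frac{2}{C_1}\) gives \(\alpha \ge \eta -C\eta ^2>0\), hence \(\widetilde{E}(w^{k+1})\le \widetilde{E}(w^k)\).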
Proof to (19). By the definition of \(\widetilde{E}(w)\) in (12), it is easy to see that \(\widetilde{E}(w^k)\ge 0\) for \(\ k=0,1,2,\ldots \). Combining this with (26), we conclude by the monotone convergence theorem that
where \(\widetilde{E}^*=\inf _{w\in \Omega }\widetilde{E}(w)\). This proves (19).
Proof to (20). According to (24), it is easy to get
Since \(\widetilde{E}(w^{k+1})\geqslant 0\), we have
Letting \(k\rightarrow \infty \) results in
This immediately gives
(20) is proved.
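In condensed form (a sketch of the telescoping step whose displays are omitted above), summing the inequality (24) over \(k=0,1,\ldots ,K\) gives
\[ \alpha \sum \limits _{k=0}^{K}\Vert \nabla \widetilde{E}(w^k)\Vert ^2\le \sum \limits _{k=0}^{K}\left( \widetilde{E}(w^k)-\widetilde{E}(w^{k+1})\right) =\widetilde{E}(w^0)-\widetilde{E}(w^{K+1})\le \widetilde{E}(w^0)<\infty , \]
so the series \(\sum \nolimits _{k=0}^{\infty }\Vert \nabla \widetilde{E}(w^k)\Vert ^2\) converges and \(\lim \nolimits _{k\rightarrow \infty }\Vert \nabla \widetilde{E}(w^k)\Vert =0\).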
Proof to (21). By virtue of (14) and (28), we can get that
Note that the error function \(\widetilde{E}(w)\) defined in (12) is continuous and differentiable. According to (13) and (15), it is easy to verify that \(\nabla \widetilde{E}(w)\) is continuous for \(w\in \mathbb {R}^n\). Since (28), (29) and Assumption (A3) are valid, it follows immediately from Lemma 3 that (21) holds; that is, there exists a unique fixed point \(w^*\in \Omega \) such that
This completes the proof of Theorem 1. \(\square \)
Proof to Theorem 2
Using the Taylor expansion, we can also get for all \(k=0,1,2,\ldots \) that
where \(\xi \) lies between \(w^k\) and \(w^{k+1}\). Since the Hessian matrices of \(\overline{E}(w)\) and \(\widetilde{E}(w)\) have the same mathematical properties, we can follow the proof of Theorem 1 and obtain the same results as in Theorem 1 for the error function \(\overline{E}(w)\) and the sequence \(\{w^k\}\) generated by SAMM. This completes the proof of Theorem 2. \(\square \)
Cite this article
Li, L., Qiao, Z. & Long, Z. A Smoothing Algorithm with Constant Learning Rate for Training Two Kinds of Fuzzy Neural Networks and Its Convergence. Neural Process Lett 51, 1093–1109 (2020). https://doi.org/10.1007/s11063-019-10135-4