Approximation of two-variable functions using high-order Takagi–Sugeno fuzzy systems, sparse regressions, and metaheuristic optimization

Wiktorowicz, Krzysztof; Krzeszowski, Tomasz

doi:10.1007/s00500-020-05238-3

Foundations
Open access
Published: 05 September 2020

Volume 24, pages 15113–15127, (2020)

Soft Computing


Abstract

This paper proposes a new hybrid method for training high-order Takagi–Sugeno fuzzy systems using sparse regressions and metaheuristic optimization. The fuzzy system is considered with Gaussian fuzzy sets in the antecedents and high-order polynomials in the consequents of fuzzy rules. The fuzzy sets can be chosen manually or determined by a metaheuristic optimization method (particle swarm optimization, genetic algorithm or simulated annealing), while the polynomials are obtained using ordinary least squares, ridge regression or sparse regressions (forward selection, least angle regression, least absolute shrinkage and selection operator, and elastic net regression). A quality criterion is proposed that expresses a compromise between the prediction ability of the fuzzy model and its sparsity. The conducted experiments showed that: (a) the use of sparse regressions and/or metaheuristic optimization can reduce the validation error compared with the reference method, and (b) the use of sparse regressions may simplify the fuzzy model by zeroing some of the coefficients.


1 Introduction

Many different methods have been developed for automatically training fuzzy systems from observed data. In this paper, we propose a novel approach to training Takagi–Sugeno fuzzy systems for function approximation. This approach is based on sparse regressions and metaheuristic optimization. Sparse regressions give sparse solutions, which means that some of the model coefficients are exactly zero. Such models are easier to interpret (Sjöstrand et al. 2018), more compact and therefore easier to implement. In addition, sparse regressions provide regularization, and therefore, they can be used when the problem is ill-conditioned (e.g., when the number of variables exceeds the number of observations). Metaheuristics are modern nature-inspired algorithms widely used in global optimization problems (Glover and Kochenberger 2003). The term “metaheuristic” means that a higher-level “master strategy” guides the heuristics applied in local search.

1.1 Related works

The literature on using metaheuristic optimization methods to train fuzzy systems is extensive. Below, we discuss papers in which hybrid methods combining metaheuristics and regressions were applied. In such methods, the antecedents are trained by metaheuristic methods, while the consequents are trained by regressions.

One of the most commonly used algorithms for training fuzzy systems is particle swarm optimization (PSO). The approach presented in Li and Wu (2011) combines particle swarm optimization and a recursive least squares estimator (RLSE) to obtain a fuzzy approximation. The PSO is used to train the antecedent part of the first-order T–S system, whereas the consequent part is trained by the RLSE method. Building a type-2 neural-fuzzy system was discussed in Yeh et al. (2011). In the first step, a fuzzy clustering method is used to partition the dataset into clusters. Then, a type-2 fuzzy Takagi–Sugeno–Kang (TSK) rule is derived from each cluster. The parameters are refined using PSO and a divide-and-merge-based least squares method. In Ying et al. (2011), an approach to function approximation using robust fuzzy regression and particle swarm optimization is proposed. A fuzzy regression is used to construct the first-order TSK fuzzy model, whereas particle swarm optimization is used to tune its parameters. A self-learning complex neuro-fuzzy system that uses Gaussian complex fuzzy sets was proposed in Li et al. (2012). The knowledge base consists of the T–S fuzzy rules with complex fuzzy sets in the antecedent part and linear models in the consequent part. The antecedent parameters and the consequent parameters are trained by the particle swarm optimization algorithm and recursive least squares, respectively. In Soltani et al. (2012), a method for fuzzy c-regression model clustering was proposed. The method combines the advantages of two algorithms: clustering and particle swarm optimization. The consequent parameters of the first-order T–S fuzzy rules are estimated by the orthogonal least squares method. A self-constructing radial basis function neural-fuzzy system was proposed in Yang et al. (2013). The proposed method uses particle swarm optimization for generating the antecedent parameters and the least-Wilcoxon norm for the consequent parameters, instead of the traditional least squares estimation. A two-step fuzzy model building algorithm based on particle swarm optimization and kernel ridge regression was presented in Boulkaibet et al. (2017). In the first step, the clustering based on particle swarm optimization separates the input data into clusters and obtains the antecedent parameters. In the second step, the consequent parameters are calculated using a kernel ridge regression. In Taieb et al. (2018), the adaptive chaos particle swarm optimization algorithm (ACPSO) using weighted recursive least squares was proposed. The ACPSO is used to optimize the parameters of the model, and then, the obtained parameters are used to initialize the fuzzy c-regression model. A fuzzy model identification method was proposed in Tsai and Chen (2018). Firstly, the fuzzy c-means algorithm is used to determine the rule number. Next, the initial fuzzy sets and the consequent parameters are obtained by particle swarm optimization. The final parameters are obtained using fuzzy c-regression and orthogonal least squares methods. In Tu and Li (2018), a complex-fuzzy machine learning approach to function approximation was proposed. Particle swarm optimization is used to select the premise parameters of the first-order fuzzy model, while the recursive least squares estimator is used to find the consequent parameters.

Other commonly used methods are genetic algorithms (GA). A two-step approach to the construction of first-order fuzzy rules from data was proposed in Setnes and Roubos (2000). In the first step, fuzzy clustering and the weighted least squares are used to obtain an initial fuzzy model. In the second step, this model is optimized by a real-coded genetic algorithm that allows simultaneous tuning of the rule antecedents and consequents. In Wang et al. (2005), a scheme based on multi-objective hierarchical GA (MOHGA) is proposed. This scheme is used to extract interpretable rule-based knowledge from data. First, fuzzy clustering is applied to generate an initial rule-based model. Then, the MOHGA and the recursive least squares estimator (RLSE) are used to obtain the optimized fuzzy models. A fuzzy modeling approach for the identification of nonlinear control processes was discussed in Yusof et al. (2011). This approach is based on a combination of genetic algorithm and recursive least squares. The antecedent parameters of the first-order T–S model are tuned by a genetic algorithm, whereas the consequent part is identified by recursive least squares estimator.

Various other approaches to training fuzzy models using metaheuristic optimization can be found in Almaraashi et al. (2016), Cordón et al. (2000, 2001), Cheung et al. (2014), Juang and Lo (2008), Khayat et al. (2009), Khosla et al. (2005, 2007), Lin (2008), Lin et al. (2016), Martino et al. (2014), Niu et al. (2008), Prado et al. (2010), Rastegar et al. (2017), Shihabudheen et al. (2018), Yanar and Akyürek (2011), and Zhao et al. (2010). Advantages and disadvantages of the reviewed methods and the proposed method are presented in Table 1.

Table 1 Advantages and disadvantages of the methods used in related papers

1.2 Contributions

The literature review shows that the consequent parts of the trained fuzzy systems use polynomials of at most first order. In this paper, we propose to use high-order fuzzy systems for two-variable function approximation. In such systems, higher-order polynomials are used in the consequent part of the rules, which can give greater flexibility in the selection of system parameters. Moreover, to the best of our knowledge, sparse regressions have not been used for two-variable function approximation. Sparse regressions can generate sparse models (Sjöstrand et al. 2018), which are more compact and easier to interpret and implement. In summary, the main contributions of this paper are:

The definition of high-order Takagi–Sugeno fuzzy systems with two input variables,
The use of sparse regressions and metaheuristic optimization to train these systems.

In the proposed method, the premise parameters are determined manually or by metaheuristic optimization methods such as particle swarm optimization (PSO), genetic algorithm (GA), and simulated annealing (SA). The consequent parameters are calculated by ordinary least squares (OLS), ridge regression (RIDGE), and sparse regressions. The following sparse regressions have been used: forward selection (FS), least angle regression (LAR), least absolute shrinkage and selection operator (LASSO), and elastic net regression (ENET). The OLS regression was used as a reference model. This paper is a continuation of the work (Wiktorowicz and Krzeszowski 2020), where the approximation of one-variable functions was considered.

1.3 Paper structure

The structure of this paper is as follows. Section 2 describes the Takagi–Sugeno fuzzy system with two inputs and with high-order polynomials in the consequent parts of the fuzzy rules. Section 3 presents the training methods of the consequent parameters when using the OLS, RIDGE, and sparse regressions. Section 4 presents the training methods of the antecedent parameters when using the PSO, GA, and SA methods. The performance criterion is described in Sect. 5. Section 6 contains the design procedure for training fuzzy models. In Sect. 7, the experimental results are presented. Finally, the conclusions are given in Sect. 8.

2 High-order Takagi–Sugeno fuzzy system

We consider a Takagi–Sugeno (T–S) fuzzy system (Takagi and Sugeno 1985) with two inputs $x_1$, $x_2$ and one output y described by r fuzzy inference rules

$$\begin{aligned} \begin{aligned} R_{j}:&\text { IF } x_1\in F_j(x_1) \text { AND } x_2\in G_j(x_2) \\&\text { THEN } y = P_j(x_1,x_2), \end{aligned} \end{aligned}$$

(1)

where $j=1,2,\ldots ,r$, $F_j(x_1)$, $G_j(x_2)$ are fuzzy sets, and $P_j(x_1,x_2)$ is the polynomial of degree d.

Definition 1

The T–S system with the rules (1) is called:

Zero-order if $P_j(x_1,x_2)=b_j$, where $b_j\in {\mathbb {R}}$, which means that the consequent functions are constants (polynomial degree d is equal to zero) (Takagi and Sugeno 1985),
First-order if $P_j(x_1,x_2)=w_{1j}x_1+v_{1j}x_2+b_j$, where $w_{1j},v_{1j}\in {\mathbb {R}}$, which means that the consequent functions are linear (polynomial degree d is equal to one) (Takagi and Sugeno 1985),
High-order if $P_j(x_1,x_2)=w_{mj}x_1^m +\ldots +w_{1j}x_1+v_{mj}x_2^m +\ldots +v_{1j}x_2+b_j$, where $m\ge 2$, $w_{kj},v_{kj}\in {\mathbb {R}}$, and $k=2,3,\ldots ,m$, which means that the consequent functions are nonlinear (polynomial degree d is greater than one).

In this paper, we use Gaussian membership functions that can be unevenly spaced in the universe of discourse (see Fig. 1). These functions are defined by

$$\begin{aligned}&\begin{aligned} A_k(x_1)&= {\mathrm {gauss}}(x_1;p_k,\sigma _k)\\&= \exp \left( {-\frac{1}{2}\left( \frac{x_1-p_k}{\sigma _k}\right) ^2}\right) , \end{aligned} \end{aligned}$$

(2)

$$\begin{aligned}&\begin{aligned} B_k(x_2)&= {\mathrm {gauss}}(x_2;q_k,\delta _k)\\&= \exp \left( {-\frac{1}{2}\left( \frac{x_2-q_k}{\delta _k} \right) ^2}\right) , \end{aligned} \end{aligned}$$

(3)

where $x_1\in {\mathbb {X}}_1=[p_1,p_\rho ]$, $x_2\in {\mathbb {X}}_2=[q_1,q_\rho ]$, $k=1,2,\ldots ,\rho $, $\rho $ is the number of fuzzy sets for each input, $p_k$, $q_k$ are the peaks, and $\sigma _k,\delta _k>0$ are the widths. Using the definitions of the fuzzy sets $A_k(x_1)$ and $B_k(x_2)$, the fuzzy rules (1) can be written in tabular form (Table 2), where $r = \rho ^2$. The output of the T–S system is computed by

$$\begin{aligned} y= \dfrac{\sum _{j=1}^r F_j(x_1)G_j(x_2)P_j(x_1,x_2)}{\sum _{j=1}^r F_j(x_1)G_j(x_2)}. \end{aligned}$$

(4)
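The output formula (4) can be illustrated with a minimal sketch in Python. It is only an illustration under stated assumptions: the function names (`gauss`, `even_partition`, `ts_output`), the domain $[-1,1]$, the coefficient values, and the rule ordering (rule $j$ pairs $A_a$ with $B_c$ row by row) are hypothetical and not taken from the paper. The widths are chosen so that adjacent sets cross at 0.5, which is the initialization described in Sect. 6.

```python
import math

def gauss(x, peak, width):
    """Gaussian membership function, Eqs. (2)-(3)."""
    return math.exp(-0.5 * ((x - peak) / width) ** 2)

def even_partition(lo, hi, rho):
    """Evenly spaced peaks; widths chosen so adjacent sets cross at 0.5."""
    step = (hi - lo) / (rho - 1)
    width = step / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    peaks = [lo + k * step for k in range(rho)]
    return peaks, [width] * rho

def ts_output(x1, x2, p, sigma, q, delta, coeffs, m):
    """Output of the T-S system, Eq. (4).

    coeffs[j] = [w_m, ..., w_1, v_m, ..., v_1, b] for rule j, where rule
    j = a * rho + c pairs A_a(x1) with B_c(x2) (assumed ordering).
    """
    rho = len(p)
    num = den = 0.0
    for a in range(rho):
        for c in range(rho):
            j = a * rho + c
            fire = gauss(x1, p[a], sigma[a]) * gauss(x2, q[c], delta[c])
            w = coeffs[j][:m]           # w_m ... w_1
            v = coeffs[j][m:2 * m]      # v_m ... v_1
            poly = coeffs[j][2 * m]     # b_j
            for i in range(m):
                power = m - i
                poly += w[i] * x1 ** power + v[i] * x2 ** power
            num += fire * poly
            den += fire
    return num / den

p, sigma = even_partition(-1.0, 1.0, 3)
q, delta = even_partition(-1.0, 1.0, 3)
# Same second-order polynomial in every rule: y must equal that polynomial.
coeffs = [[2.0, -1.0, 0.5, 3.0, 1.0]] * 9
y = ts_output(0.3, -0.4, p, sigma, q, delta, coeffs, m=2)
```

When every rule carries the same polynomial, the weighted average in (4) reduces to that polynomial, which gives a simple sanity check for an implementation.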

Table 2 Fuzzy rules for the Takagi–Sugeno system

Definition 2

(Wang and Mendel 1992) The fuzzy basis function (FBF) for the jth rule is the function $\xi _j(x_1,x_2)$ given by

$$\begin{aligned} \xi _j(x_1,x_2)=\dfrac{F_j(x_1)G_j(x_2)}{\sum _{j=1}^{r} F_j(x_1)G_j(x_2)}. \end{aligned}$$

(5)

Applying (5), the output of the T–S system can be written:

For the zero-order system as
$$\begin{aligned} y = \sum _{j=1}^{r} \xi _j(x_1,x_2)b_j, \end{aligned}$$
(6)
For the first-order and high-order systems as
$$\begin{aligned} \begin{aligned} y&= \sum _{j=1}^{r} \xi _j(x_1,x_2)x_1^m w_{mj} + \ldots + \xi _j(x_1,x_2)x_1 w_{1j} \\&\quad + \xi _j(x_1,x_2)x_2^m v_{mj} + \ldots + \xi _j(x_1,x_2)x_2 v_{1j} \\&\quad + \xi _j(x_1,x_2)b_j. \end{aligned} \end{aligned}$$
(7)

Because in (7) the FBFs are multiplied by $x_1^l$ and $x_2^l$ where $l=1,2,\ldots ,m$, we define a modified fuzzy basis function.

Definition 3

The modified FBF (MFBF) for the jth rule is the function $h_{lj}(x_1,x_2)$ or $g_{lj}(x_1,x_2)$ given by

$$\begin{aligned} h_{lj}(x_1,x_2)&= \xi _j(x_1,x_2)x_1^l, \end{aligned}$$

(8)

$$\begin{aligned} g_{lj}(x_1,x_2)&= \xi _j(x_1,x_2)x_2^l. \end{aligned}$$

(9)

Applying (8) and (9) we obtain

$$\begin{aligned} \begin{aligned} y&= \sum _{j=1}^{r} h_{mj}(x_1,x_2)w_{mj} + \ldots + h_{1j}(x_1,x_2)w_{1j} \\&\quad + g_{mj}(x_1,x_2)v_{mj} + \ldots + g_{1j}(x_1,x_2) v_{1j} \\&\quad + \xi _j(x_1,x_2)b_j. \end{aligned} \end{aligned}$$

(10)

We introduce the following vectors:

For the zero-order system as
$$\begin{aligned} {\mathbf {h}}_j(x_1,x_2)&= \xi _j(x_1,x_2), \end{aligned}$$
(11)
$$\begin{aligned} {\mathbf {w}}_j&= b_j, \end{aligned}$$
(12)
For the first-order and high-order systems as
$$\begin{aligned} {\mathbf {h}}_j(x_1,x_2)&= [h_{mj},\ldots ,h_{1j},g_{mj},\ldots ,g_{1j},\xi _j], \end{aligned}$$
(13)
$$\begin{aligned} {\mathbf {w}}_j&= [w_{mj},\ldots ,w_{1j},v_{mj},\ldots ,v_{1j},b_j]^T, \end{aligned}$$
(14)
where $\dim ({\mathbf {h}}_j)=\dim ({\mathbf {w}}_j^T)=2d+1$.

The output of the T–S system can now be written as

$$\begin{aligned} \begin{aligned} y&= [{\mathbf {h}}_1(x_1,x_2),\ldots ,{\mathbf {h}}_r(x_1,x_2)] \begin{bmatrix} {\mathbf {w}}_1 \\ \vdots \\ {\mathbf {w}}_r\\ \end{bmatrix}\\&={\mathbf {h}}(x_1,x_2){\mathbf {w}}, \end{aligned} \end{aligned}$$

(15)

where

$$\begin{aligned} {\mathbf {h}}(x_1,x_2)&=[{\mathbf {h}}_1(x_1,x_2),\ldots ,{\mathbf {h}}_r(x_1,x_2)], \end{aligned}$$

(16)

$$\begin{aligned} {\mathbf {w}}&=[{\mathbf {w}}_1, \ldots , {\mathbf {w}}_r]^T. \end{aligned}$$

(17)

The vector ${\mathbf {w}}$ contains $p=r(2d+1)$ parameters of the T–S fuzzy model to be determined.

3 Training the consequent parameters

We assume as known the observations $([(x_1)_i,(x_2)_i]^T,y_i)$, where $i=1,\dots ,n$ and n is the number of observations. We introduce the regression matrix

$$\begin{aligned} \underset{n\times r(2d+1)}{{\mathbf {X}}} = \begin{bmatrix} {\mathbf {h}}_1((x_1)_1,(x_2)_1),\ldots ,{\mathbf {h}}_r((x_1)_1,(x_2)_1)\\ {\mathbf {h}}_1((x_1)_2,(x_2)_2),\ldots ,{\mathbf {h}}_r((x_1)_2,(x_2)_2)\\ \vdots \\ {\mathbf {h}}_1((x_1)_n,(x_2)_n),\ldots ,{\mathbf {h}}_r((x_1)_n,(x_2)_n) \end{bmatrix},\nonumber \\ \end{aligned}$$

(18)

where ${\mathbf {h}}_j((x_1)_i,(x_2)_i)$ is given by (11) or (13).
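As a rough illustration of how the regression matrix (18) could be assembled, the following Python sketch stacks one row ${\mathbf {h}}((x_1)_i,(x_2)_i)$ per observation. The helper names, the $9\times 9$ grid on $[-1,1]^2$, and the width values are assumptions made for the example only.

```python
import math

def gauss(x, peak, width):
    return math.exp(-0.5 * ((x - peak) / width) ** 2)

def h_row(x1, x2, p, sigma, q, delta, m):
    """Row vector h(x1, x2) of Eq. (16), built from blocks as in Eq. (13)."""
    fire = [gauss(x1, pa, sa) * gauss(x2, qc, dc)
            for pa, sa in zip(p, sigma) for qc, dc in zip(q, delta)]
    total = sum(fire)
    row = []
    for f in fire:
        xi = f / total                                   # FBF, Eq. (5)
        row += [xi * x1 ** l for l in range(m, 0, -1)]   # h_mj ... h_1j
        row += [xi * x2 ** l for l in range(m, 0, -1)]   # g_mj ... g_1j
        row.append(xi)                                   # multiplies b_j
    return row

def regression_matrix(points, p, sigma, q, delta, m):
    """Stack one row per observation, Eq. (18)."""
    return [h_row(x1, x2, p, sigma, q, delta, m) for x1, x2 in points]

# 9 x 9 grid on [-1, 1]^2 (n = 81), three sets per input, degree two.
grid = [-1.0 + 0.25 * i for i in range(9)]
points = [(a, b) for a in grid for b in grid]
p = q = [-1.0, 0.0, 1.0]
sigma = delta = [0.4247, 0.4247, 0.4247]   # illustrative widths
X = regression_matrix(points, p, sigma, q, delta, m=2)
```

With $r=9$ rules and $d=2$, each row has $r(2d+1)=45$ entries, and the nine $\xi _j$ entries of a row sum to one because the FBFs are normalized.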

3.1 Ordinary least squares

The cost function to be minimized in the OLS is the sum of squared errors

$$\begin{aligned} J_{\mathrm {OLS}} = \sum _{i=1}^n\big (y_i-{\hat{y}}_i\big )^2 = \sum _{i=1}^n\big (y_i-{\mathbf {h}}((x_1)_i,(x_2)_i){\mathbf {w}}\big )^2,\nonumber \\ \end{aligned}$$

(19)

where ${\hat{y}}_i={\mathbf {h}}((x_1)_i,(x_2)_i){\mathbf {w}}$ is the estimated output of the system (see Eq. 15) for the ith observation. The optimal solution is given by (Bishop 2006)

$$\begin{aligned} {\mathbf {w}}=\big ({\mathbf {X}}^T{\mathbf {X}}\big )^{-1}{\mathbf {X}}^T{\mathbf {y}}, \end{aligned}$$

(20)

where ${\mathbf {y}}=[y_1,\ldots ,y_n]^T$. Because the model parameters are computed directly from all the data contained in ${\mathbf {X}}$ and ${\mathbf {y}}$, this is a batch least squares method.

3.2 Ridge regression

The cost function in the ridge regression (Hoerl and Kennard 1970) is the penalized sum of squared errors

$$\begin{aligned} J_{\mathrm {RIDGE}}&= \sum _{i=1}^n\big (y_i-{\hat{y}}_i\big )^2 + \lambda {{\mathbf {w}}^T}{\mathbf {w}} \end{aligned}$$

(21)

$$\begin{aligned}&= \sum _{i=1}^n\big (y_i-{\mathbf {h}}((x_1)_i,(x_2)_i){\mathbf {w}}\big )^2 + \lambda {{\mathbf {w}}^T}{\mathbf {w}}, \end{aligned}$$

(22)

where $\lambda \ge 0$ is a regularization parameter. The fuzzy model weights are given by

$$\begin{aligned} {\mathbf {w}}=\big ({\mathbf {X}}^T{\mathbf {X}}+\lambda {\mathbf {I}}\big )^{-1}{\mathbf {X}}^T{\mathbf {y}}, \end{aligned}$$

(23)

where ${\mathbf {I}}$ is the identity matrix. The ridge regression is applied in this paper because it can be used for ill-conditioned problems, that is, when the matrix ${\mathbf {X}}^T{\mathbf {X}}$ is close to singular. The ridge regression, like the OLS, is a one-pass method, and therefore it is very fast.
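The closed-form solutions (20) and (23) can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the paper: the helper names `solve` and `ridge` are hypothetical, the data are the four observations of Example 1 in Sect. 3.3, and $\lambda =0.1$ is an arbitrary value chosen for the example.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * w[c] for c in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def ridge(X, y, lam=0.0):
    """w = (X^T X + lam I)^{-1} X^T y; lam = 0 gives the OLS solution (20)."""
    p = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)

# Small illustrative data set (the one used in Example 1 of Sect. 3.3).
X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
y = [6.0, 5.0, 7.0, 10.0]
w_ols = ridge(X, y)            # lam = 0
w_ridge = ridge(X, y, 0.1)     # lam = 0.1, an arbitrary illustrative value
```

Forming ${\mathbf {X}}^T{\mathbf {X}}$ explicitly is acceptable for a small illustration; for badly conditioned matrices, a QR- or SVD-based solver is usually the safer choice.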

3.3 Sparse regressions

The sparse regressions briefly described in this section allow the coefficients of a model to be exactly zero (Sjöstrand et al. 2018). These regressions lead to simplified models that are easier to interpret.

In forward selection, which is an example of stepwise regression, the variables are added to the model one by one. In the beginning, all coefficients are equal to zero, and then a particular variable is chosen. The next variable to include can be chosen based on a number of criteria. For example, it can be the one that has the highest correlation with the current residual vector (Sjöstrand et al. 2018).

The least angle regression (Efron et al. 2004; Sjöstrand et al. 2018) works similarly to the FS procedure, but the algorithm does not move in the direction of one variable. In the LAR, the estimated parameters are calculated in a direction in which the angles with each of the variables currently in the model are equal. This algorithm is the basis for other sparse methods, such as the LASSO and elastic net regression.

The least absolute shrinkage and selection operator regression (Sjöstrand et al. 2018; Tibshirani 1996) has a mechanism that implements a coefficient shrinkage and variable selection. The cost function combines the sum of the squared errors and the penalty function based on the $L_1$ norm:

$$\begin{aligned} J_{\mathrm {LASSO}}({\mathbf {w}},\lambda )&= {\Vert {\mathbf {y}}-{\mathbf {X}}{\mathbf {w}} \Vert }_2^2 + \lambda {\Vert {\mathbf {w}}\Vert }_1, \end{aligned}$$

(24)

where $ \lambda $ is a nonnegative regularization parameter.
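To show how the $L_1$ penalty in (24) produces exact zeros, the sketch below minimizes $J_{\mathrm {LASSO}}$ by cyclic coordinate descent with soft thresholding. Note that this is a different solver from the LAR-based path algorithms discussed in this section; it is used here only because it is short. The function names and the value $\lambda =10$ are assumptions, and the data are those of Example 1.

```python
def soft_threshold(rho, t):
    """Soft-thresholding operator: shrinks rho toward zero by t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_cd(X, y, lam, sweeps=200):
    """Minimize ||y - Xw||_2^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores w_j.
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k]
                      for k in range(p) if k != j)) for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # The factor 1/2 comes from differentiating the squared error.
            w[j] = soft_threshold(rho, lam / 2.0) / z
    return w

X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
y = [6.0, 5.0, 7.0, 10.0]
w_sparse = lasso_cd(X, y, lam=10.0)   # intercept is driven exactly to zero
w_dense = lasso_cd(X, y, lam=0.0)     # no penalty: approaches the OLS fit
```

For $\lambda =10$ the slope is $(77-5)/30=2.4$ and the intercept is set to zero by the thresholding step, whereas for $\lambda =0$ the iteration converges to the OLS coefficients.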

The elastic net regression (Sjöstrand et al. 2018; Zou and Hastie 2005) combines the features of the ridge regression and the LASSO. The cost function includes a penalty term related to both the $L_1$ and the $L_2$ norms:

$$\begin{aligned} J_{\mathrm {ENET}}({\mathbf {w}},\delta ,\lambda )&= {\Vert {\mathbf {y}}-{\mathbf {X}}{\mathbf {w}} \Vert }_2^2 + \delta {\Vert {\mathbf {w}}\Vert }_2^2 + \lambda {\Vert {\mathbf {w}}\Vert }_1, \end{aligned}$$

(25)

where $\lambda $ and $\delta $ are nonnegative regularization parameters. The solution is found by the LARS-EN algorithm, which is based on the LARS algorithm (Efron et al. 2004).

Example 1

Consider a simple regression problem for a small amount of data. We have four observations ($n=4$) in the form of vectors ${\mathbf {x}}=[1, 2, 3, 4]^T$ and ${\mathbf {y}}=[6, 5, 7, 10]^T$. The goal is to build a regression model $y=ax+b$, where $\varvec{\beta } = [a,b]$ is the vector of the model coefficients. To obtain a model with the intercept term (the constant b different from zero), we add the column of ones to the regression matrix, which has the form

$$\begin{aligned} \underset{4\times 2}{{\mathbf {X}}} = \begin{bmatrix} 1,1\\ 2,1\\ 3,1\\ 4,1\\ \end{bmatrix}. \end{aligned}$$

(26)

It is easy to check that the OLS method gives the solution $y=1.4x+3.5$, where $a=1.4$ and $b=3.5$. Applying the FS, we obtain three solutions in the coefficient path

$$\begin{aligned} \varvec{\beta }_1 = [0, 0],\; \varvec{\beta }_2 = [2.567, 0],\; \varvec{\beta }_3 = [1.4, 3.5]. \end{aligned}$$

(27)

The LAR and the LASSO methods generate

$$\begin{aligned} \varvec{\beta }_1 = [0, 0],\; \varvec{\beta }_2 = [2.45, 0],\; \varvec{\beta }_3 = [1.4, 3.5] \end{aligned}$$

(28)

and using the ENET with $\delta =0.1$ we obtain

$$\begin{aligned} \varvec{\beta }_1 = [0, 0],\; \varvec{\beta }_2 = [2.682, 0],\; \varvec{\beta }_3 = [1.678, 3.421]. \end{aligned}$$

(29)

We can see that in the solution $\varvec{\beta }_2$, the coefficient b is exactly zero, which results from using the sparse regressions. The selection of one of the solutions is based on a specific criterion, e.g., cross-validation, Akaike’s information criterion, or Bayesian information criterion (Sjöstrand et al. 2018).
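The numbers in Example 1 can be checked by hand or with the short Python sketch below. The OLS and FS values follow from standard least squares formulas. The LAR value is obtained under the assumption that the columns of ${\mathbf {X}}$ are used without normalization, so that the step along $x$ stops when the residual is equally correlated with both columns; this reading is an inference from the reported numbers, not something stated in the text.

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [6.0, 5.0, 7.0, 10.0]
n = len(x)

sx, sy = sum(x), sum(y)                       # 10 and 28
sxx = sum(xi * xi for xi in x)                # 30
sxy = sum(xi * yi for xi, yi in zip(x, y))    # 77

# OLS with intercept (closed form for simple linear regression).
a_ols = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b_ols = (sy - a_ols * sx) / n

# FS, second solution: only x is in the model, fitted by least squares.
a_fs = sxy / sxx

# LAR, second solution (assumed unnormalized columns): move along x until
# the residual is equally correlated with x and with the column of ones,
#   sxy - a * sxx = sy - a * sx.
a_lar = (sxy - sy) / (sxx - sx)
```

The results, $a=1.4$, $b=3.5$, $77/30\approx 2.567$, and $49/20=2.45$, agree with the values reported in (27) and (28).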

4 Training the antecedent parameters

The following metaheuristic optimization methods were used to train the antecedent parameters: particle swarm optimization (Eberhart and Shi 2000; Kennedy and Eberhart 1995; MathWorks 2019a), genetic algorithm (Holland 1992; Whitley 1994; MathWorks 2019a), and simulated annealing (Kirkpatrick et al. 1983; MathWorks 2019a).

4.1 Particle swarm optimization

Particle swarm optimization is a population-based algorithm developed by Kennedy and Eberhart (Eberhart and Shi 2000; Kennedy and Eberhart 1995). It is based on the social behavior of organisms that live in large groups, such as bird flocks or fish schools. In PSO, a group of particles (a population) forms a swarm, in which each particle represents a hypothetical solution. Each particle remembers its own best position ${\mathbf {pbest}}$ and has access to the best position ${\mathbf {gbest}}$ in the swarm. The best local and global positions are selected using an objective function (Sect. 5). The learning scheme is based on two components:

Cognition component: attracts each particle toward its own best position ${\mathbf {pbest}}$,
Social component: attracts particles toward the best position ${\mathbf {gbest}}$ in the swarm.

The velocity ${\mathbf {v}}_k$ and the position ${\mathbf {x}}_k$ of the kth particle are calculated based on the following equations (Eberhart and Shi 2000; MathWorks 2019a):

$$\begin{aligned} {\mathbf {v}}^{l+1}_{k}= & {} \omega {\mathbf {v}}^{l}_{k}+c_1 {\mathbf {r}}_{1}({\mathbf {pbest}}^l_{k}-{\mathbf {x}}^{l}_{k})+c_2 {\mathbf {r}}_{2}({\mathbf {gbest}}^l-{\mathbf {x}}^{l}_{k}), \end{aligned}$$

(30)

$$\begin{aligned} {\mathbf {x}}^{l+1}_{k}= & {} {\mathbf {x}}^{l}_{k}+{\mathbf {v}}^{l+1}_{k}, \end{aligned}$$

(31)

where $\omega $ is the inertia weight, ${\mathbf {r}}_{1}$, ${\mathbf {r}}_{2}$ are vectors of random numbers uniformly distributed within [0,1], l is the current iteration number, and $c_1$, $c_2$ are the cognitive and social coefficients, respectively.
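The updates (30) and (31) can be sketched as follows. This is a minimal illustration and not the Matlab implementation used in the paper: the parameter values ($\omega =0.729$, $c_1=c_2=1.494$), the clipping of positions to the search bounds, and the sphere test function are assumptions. In the setting of this paper, a particle would encode the antecedent parameters and the objective would be the validation error of Sect. 5.

```python
import random

def pso(f, dim, lo, hi, n_particles=20, iters=200,
        omega=0.729, c1=1.494, c2=1.494, seed=0):
    """Minimize f over [lo, hi]^dim with the updates (30)-(31)."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[k][d] = (omega * v[k][d]
                           + c1 * r1 * (pbest[k][d] - x[k][d])
                           + c2 * r2 * (gbest[d] - x[k][d]))    # Eq. (30)
                # Eq. (31), followed by clipping to the bounds (assumption).
                x[k][d] = min(hi, max(lo, x[k][d] + v[k][d]))
            val = f(x[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = x[k][:], val
        # gbest is refreshed once per iteration, after all particles moved.
        g = min(range(n_particles), key=lambda k: pbest_val[k])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]
    return gbest, gbest_val

sphere = lambda z: sum(t * t for t in z)   # hypothetical test objective
best_x, best_val = pso(sphere, dim=2, lo=-5.0, hi=5.0)
```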

4.2 Genetic algorithm

Genetic algorithm (Holland 1992; MathWorks 2019a; Whitley 1994) is a method for solving optimization problems inspired by the biological process of Darwinian evolution, where selection, crossover, and mutation play a major role. The GA repeatedly modifies a population to achieve new and possibly better solutions. In each generation of the GA, the individuals are randomly selected from the current population to be “parents” and used to obtain “children” for the next generation. In subsequent generations, the population “evolves” toward the optimal solution.

The GA uses three main types of rules to create the next generation from the current population:

Selection: during this process, individuals called “parents” are selected through a fitness-based process. Individuals with a good value of the objective function (Sect. 5) are more often chosen for the next generation,
Crossover (recombination): combines two “parents” to form “children” for the next generation; it is analogous to the crossover that takes place during sexual reproduction in biology. The new individuals have the characteristics of both parents,
Mutation: during the mutation process, an individual mutates, that is, random changes are introduced into its genotype. The purpose of this rule is to introduce diversity into the population, which prevents the premature convergence of the algorithm.

Crossover and mutation characterize the explorative and exploitative features of GA. Maintaining a balance between these two features is crucial to speed up the search process and to achieve high-quality solutions.
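The three rules above can be illustrated with a small real-coded GA in Python. It is a sketch under assumptions: binary tournament selection, convex-combination crossover, Gaussian mutation, elitism, and all parameter values are choices made for the example and are not the operators of the Matlab toolbox used in the paper.

```python
import random

def ga(f, dim, lo, hi, pop_size=40, gens=100, p_mut=0.2, sigma=0.3, seed=0):
    """Minimize f over [lo, hi]^dim with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(gens):
        pop.sort(key=f)
        history.append(f(pop[0]))
        children = [pop[0][:]]                  # elitism: keep the best
        while len(children) < pop_size:
            # Selection: binary tournament, lower objective wins.
            pa = min(rng.sample(pop, 2), key=f)
            pb = min(rng.sample(pop, 2), key=f)
            # Crossover: random convex combination of the two parents.
            t = rng.random()
            child = [t * a + (1.0 - t) * b for a, b in zip(pa, pb)]
            # Mutation: Gaussian perturbation of some genes, clipped to bounds.
            child = [min(hi, max(lo, c + rng.gauss(0.0, sigma)))
                     if rng.random() < p_mut else c for c in child]
            children.append(child)
        pop = children
    best = min(pop, key=f)
    return best, f(best), history

sphere = lambda z: sum(t * t for t in z)   # hypothetical test objective
best, best_val, history = ga(sphere, 2, -5.0, 5.0)
```

Because the best individual is always copied into the next generation, the recorded best value never increases from one generation to the next.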

4.3 Simulated annealing

Simulated annealing (Kirkpatrick et al. 1983; MathWorks 2019a) is a method for solving unconstrained and bound-constrained optimization problems. This method was originally inspired by the process of annealing in metallurgy. The SA models the process of heating a material and then gradually lowering the temperature in order to reduce defects. The goal is to move the system from the initial state to the state with minimum energy. As the algorithm runs, a new state is randomly generated and accepted with a certain probability. The acceptance probability is a function that depends on the energies of the two states and the temperature

$$\begin{aligned} p({\varDelta }E,T) = \frac{1}{1+\exp ({\varDelta }E/T)}, \end{aligned}$$

(32)

where ${\varDelta }E$ is the difference of energies of the present and previous solution (${\varDelta }E = E_{k+1}-E_k$) and T is the current temperature. The algorithm systematically decreases the temperature and stores the best state found so far. The energy determines how good the solution is, and it corresponds to the value of the objective function (Sect. 5).
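The acceptance rule (32) and a basic annealing loop can be sketched as follows. The geometric cooling schedule, the Gaussian candidate generator, and all numerical values are assumptions made for illustration; they are not the settings of the Matlab toolbox used in the paper.

```python
import math
import random

def acceptance(delta_e, temp):
    """Acceptance probability of Eq. (32)."""
    z = delta_e / temp
    if z > 700.0:            # exp would overflow; the probability is ~0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def anneal(f, x0, lo, hi, t0=100.0, cooling=0.95, iters=2000, seed=0):
    """Minimize f starting from x0; returns the best state found."""
    rng = random.Random(seed)
    x, e = x0[:], f(x0)
    best, best_e = x[:], e
    temp = t0
    for _ in range(iters):
        # Candidate state: Gaussian step, clipped to the bounds.
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, 0.5))) for xi in x]
        e_new = f(cand)
        if rng.random() < acceptance(e_new - e, temp):
            x, e = cand, e_new
            if e < best_e:
                best, best_e = x[:], e          # store the best state so far
        temp = max(temp * cooling, 1e-12)       # geometric cooling (assumed)
    return best, best_e

sphere = lambda z: sum(t * t for t in z)   # hypothetical test objective
best, best_e = anneal(sphere, [4.0, 4.0], -5.0, 5.0)
```

With (32), a state of equal energy is accepted with probability 0.5, worse states with probability below 0.5, and better states with probability above 0.5; as $T$ decreases, the rule approaches a greedy search.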

5 Performance criterion

The objective function for all methods is the square root of the mean square error

$$\begin{aligned} \mathrm {RMSE} =\sqrt{\frac{1}{V}\sum _{k=1}^V\left( y_k-{\hat{y}}_k\right) ^2}, \end{aligned}$$

(33)

where V denotes the number of observations in the validation set, $y_k$ denotes the kth output data in the validation set, and ${\hat{y}}_{k}$ denotes the output of the fuzzy model obtained for the kth input data in the validation set. The fuzzy model used to calculate the estimate ${\hat{y}}_{k}$ is obtained based on the observations in the training set.

Fuzzy models used in this paper may be sparse, which means they may have some coefficients equal to zero. To describe the sparsity of a fuzzy model, we propose the following definition.

Definition 4

The sparsity of a T–S fuzzy model is defined as

$$\begin{aligned} S = \frac{z}{r(2d+1)}, \end{aligned}$$

(34)

where $S\in [0,1]$, z is the number of zero-valued coefficients in the polynomials, r is the number of rules, and d is the polynomial degree.

Definition 5

The density of a T–S fuzzy model is defined as one minus the sparsity:

$$\begin{aligned} D = 1-S. \end{aligned}$$

(35)

In this paper, the best T–S model is chosen by minimizing a quality criterion in which the goal is to make the objective function (33) and the density as small as possible:

$$\begin{aligned} Q = \alpha \frac{\mathrm {RMSE}}{ {\overline{\mathrm {{RMSE}_{OLS}}}} } + (1-\alpha )D, \end{aligned}$$

(36)

where $\alpha \in [0,1]$. The $\overline{\mathrm {{RMSE}_{OLS}}}$ is the mean value of $\mathrm {RMSE}$ for the OLS regression that is treated as the reference method. The quality index (36) expresses a compromise between the prediction ability of the model and its sparsity.
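Definitions (33) to (36) translate directly into code. The sketch below uses hypothetical function names and made-up numbers; the tolerance argument `tol` is an added convenience for treating very small coefficients as zero and is not part of Definition 4.

```python
import math

def rmse(y_true, y_pred):
    """Validation error, Eq. (33)."""
    v = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / v)

def sparsity(w, r, d, tol=0.0):
    """Share of zero coefficients, Eq. (34); w holds all r(2d+1) coefficients."""
    assert len(w) == r * (2 * d + 1)
    z = sum(1 for c in w if abs(c) <= tol)
    return z / len(w)

def quality(rmse_val, rmse_ols_mean, s, alpha=0.5):
    """Quality index, Eq. (36), with density D = 1 - S from Eq. (35)."""
    return alpha * rmse_val / rmse_ols_mean + (1.0 - alpha) * (1.0 - s)

# Nine rules, degree two: 45 coefficients, nine of them zero (made-up values).
w = [0.0] * 9 + [1.0] * 36
s = sparsity(w, r=9, d=2)                 # 9 / 45 = 0.2
q = quality(0.5, 1.0, s, alpha=0.5)       # 0.5 * 0.5 + 0.5 * 0.8 = 0.65
e = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A dense model with the same error as the OLS reference has $Q=1$ for any $\alpha$, so values below one indicate an improvement in error, sparsity, or both.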

6 Design procedure for training fuzzy models

The following methods for building fuzzy models are applied in this paper:

Non-sparse methods:
- OLS: the method in which the fuzzy sets are defined by the user, while the polynomials are determined by the OLS regression,
- RIDGE: the method in which the fuzzy sets are defined by the user, while the polynomials are determined by the ridge regression,
- PSO-OLS: the method in which the fuzzy sets are determined by the PSO algorithm, while the polynomials are determined by the OLS regression,
- PSO-RIDGE: the method in which the fuzzy sets are determined by the PSO algorithm, while the polynomials are determined by the ridge regression,
- GA-OLS: the method in which the fuzzy sets are determined by the GA, while the polynomials are determined by the OLS regression,
- GA-RIDGE: the method in which the fuzzy sets are determined by the GA, while the polynomials are determined by the ridge regression,
- SA-OLS: the method in which the fuzzy sets are determined by the SA algorithm, while the polynomials are determined by the OLS regression,
- SA-RIDGE: the method in which the fuzzy sets are determined by the SA algorithm, while the polynomials are determined by the ridge regression,
Sparse methods:
- SR: the method in which the fuzzy sets are defined by the user, while the polynomials are determined by a sparse regression (SR), e.g., FS, LAR, LASSO or ENET,
- PSO-SR: the method in which the fuzzy sets are determined by the PSO algorithm, while the polynomials are determined by a sparse regression,
- GA-SR: the method in which the fuzzy sets are determined by the GA, while the polynomials are determined by a sparse regression,
- SA-SR: the method in which the fuzzy sets are determined by the SA algorithm, while the polynomials are determined by a sparse regression.

Table 3 Performance comparison for Experiment 1; $\overline{\mathrm {RMSE}}$ is the mean of the validation error, std is the standard deviation, min is the minimum value, max is the maximum value, p is the p-value of Wilcoxon test, ${\overline{S}}$ is the mean of the model sparsity, ${\overline{Q}}$ is the mean of the quality index

The design procedure for training fuzzy models is presented in Fig. 2. In Block 1, the Gaussian fuzzy sets are proposed. In the OLS, RIDGE, and SR methods, one proposition is generated in such a way that these sets are distributed evenly in the spaces ${\mathbb {X}}_1$, ${\mathbb {X}}_2$, and two adjacent sets cross at a membership value of 0.5. In the PSO-OLS, PSO-RIDGE, GA-OLS, GA-RIDGE, SA-OLS, SA-RIDGE and PSO-SR, GA-SR, SA-SR methods, 10 propositions are generated by the PSO, GA or SA algorithms. The outputs of Block 1 are the vectors ${\mathbf {p}}$, $\varvec{\sigma }$, and ${\mathbf {q}}$, $\varvec{\delta }$. In Block 2, the regression matrix ${\mathbf {X}}$ (18) is determined. In Block 3, the coefficient path for one of the SR methods is generated. In Block 4, non-sparse methods are validated. As a result of validating the OLS method, the value of $\mathrm {RMSE_{OLS}}$ in the quality criterion (36) is obtained. In Block 5, sparse methods are validated. The validation is done along the coefficient path. For all propositions, the $\mathrm {RMSE}$, the sparsity S, and the quality index Q are calculated. Then, the smallest value of Q is chosen with the constraint that the $\mathrm {RMSE}$ is not greater than $\mathrm {RMSE_{OLS}}$.
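The selection step in Block 5 can be sketched as a constrained minimum over the candidates on the coefficient path. The function name, the candidate values, and the choice to pass the constraint value and the normalizing mean as two separate arguments are assumptions; the text does not specify how the two quantities are related in each validation run.

```python
def select_model(candidates, rmse_ols, rmse_ols_mean, alpha=0.5):
    """Pick the candidate with the smallest Q among those with RMSE <= RMSE_OLS.

    candidates: list of (rmse, sparsity) pairs, e.g. one per point on the
    coefficient path. Returns (index, Q), or None if no candidate qualifies.
    """
    best = None
    for idx, (rmse, s) in enumerate(candidates):
        if rmse > rmse_ols:
            continue                      # violates the RMSE constraint
        q = alpha * rmse / rmse_ols_mean + (1.0 - alpha) * (1.0 - s)
        if best is None or q < best[1]:
            best = (idx, q)
    return best

# Hypothetical coefficient path: sparser models tend to have larger errors.
path = [(0.30, 0.80), (0.12, 0.40), (0.09, 0.20), (0.10, 0.00)]
choice = select_model(path, rmse_ols=0.10, rmse_ols_mean=0.10)
```

In this made-up example the two sparsest candidates are rejected by the constraint, and the third candidate wins with $Q=0.5\cdot 0.9+0.5\cdot 0.8=0.85$.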

7 Experimental results

7.1 Experimental setup

This section gives examples of two-variable nonlinear function approximation. The following parameters were used in all experiments. The number of observations $([(x_1)_i,(x_2)_i]^T,y_i)$ was $n=81$, and they were evenly distributed in the space ${\mathbb {X}}_1\times {\mathbb {X}}_2$. The best method was selected using Monte Carlo cross-validation (MCCV) (Picard and Cook 1984), in which the data set is divided randomly: a fixed fraction of the points forms the training set, and the remaining points are assigned to the validation set. This process was repeated 10 times, each time generating a new partition with 70% training data and 30% validation data. Statistical analysis was carried out using the Wilcoxon signed-rank test for differences in $\mathrm {RMSE}$ results between each method and the reference method (OLS). For each input of the fuzzy system, three fuzzy sets were defined, which gave nine fuzzy inference rules. The widths of the fuzzy sets were bounded in the intervals $[\sigma _{min},\sigma _{max}]=[\delta _{min},\delta _{max}]=[0.0849,2.123]$. The degree of the polynomials in the consequent part was set to two. For the ridge regression (23), $\lambda =1\mathrm {e-}08$, and for the ENET regression (25), $\delta =1\mathrm {e-}08$. The number of objective function evaluations was 6000. The parameter in the quality criterion (36) was $\alpha =0.5$. For the metaheuristic algorithms, default parameter values were adopted in accordance with the implementation contained in the Matlab toolbox. The experiments were carried out on a mobile computer equipped with an Intel(R) Core(TM) i5-7200U processor and 8 GB of RAM.
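The MCCV partitioning described above can be sketched as follows. This is only an illustration in Python, not the authors' Matlab code: the helper `mccv_splits`, the fixed seed, the rounding of 70% of 81 points to 57, and the $9\times 9$ grid layout are all assumptions (the text only states that the 81 observations are evenly distributed).

```python
import random

def mccv_splits(n, n_repeats=10, train_fraction=0.7, seed=0):
    """Yield (train_idx, valid_idx) pairs for Monte Carlo cross-validation."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n)  # rounding rule is an assumption
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition in every repetition
        yield sorted(idx[:n_train]), sorted(idx[n_train:])

# 81 observations on an assumed 9 x 9 grid over [-1, 1] x [0, 1].
grid = [(-1.0 + 2.0 * i / 8, j / 8) for i in range(9) for j in range(9)]
splits = list(mccv_splits(len(grid)))
for train_idx, valid_idx in splits[:2]:
    print(len(train_idx), len(valid_idx))
```

Each method would then be trained on the training indices and scored by its $\mathrm {RMSE}$ on the validation indices, and the scores averaged over the 10 repetitions.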

Table 4 Parameters of fuzzy systems in Experiment 1; p, q are the peaks of membership functions, $\sigma $, $\delta $ are the widths of membership functions, $w_1$, $w_2$, $v_1$, $v_2$, b are the polynomial coefficients in the consequent part

7.2 Implementation

The function regress from the Matlab Statistics and Machine Learning Toolbox (MathWorks 2019b) has been used to apply the OLS regression. The ridge regression has been implemented in Matlab using a custom function.

The sparse regressions have been implemented in Matlab using the toolbox SpaSM (Sjöstrand et al. 2018). From this toolbox, the following functions have been used: forwardselection, lar, lasso, and elasticnet. These functions take the regression matrix ${{\mathbf {X}}}$ and the vector ${{\mathbf {y}}}$ as arguments. Moreover, the function elasticnet has the regularization parameter $\delta $. As the output, the described functions return the solution path in the form of the coefficients ${{\mathbf {w}}}$, from which the best solution can be selected.
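A minimal sketch of how a best solution might be picked from such a solution path is given below. It follows the rule stated in the design procedure (smallest Q subject to $\mathrm {RMSE}\le \mathrm {RMSE_{OLS}}$) but takes the values of Q as given rather than computing criterion (36). The function `select_on_path`, the dictionary layout, and all numbers are hypothetical and are not results from the paper.

```python
def select_on_path(path, rmse_ols):
    """Pick the path entry with the smallest Q among those with RMSE <= rmse_ols.

    `path` is a list of dicts with keys 'rmse', 'sparsity', and 'q',
    one per step of the coefficient path.
    Returns None when no entry satisfies the RMSE constraint.
    """
    feasible = [step for step in path if step["rmse"] <= rmse_ols]
    if not feasible:
        return None
    return min(feasible, key=lambda step: step["q"])

# Made-up values for illustration only.
path = [
    {"rmse": 0.090, "sparsity": 0.90, "q": 0.18},  # sparse but too inaccurate
    {"rmse": 0.040, "sparsity": 0.70, "q": 0.20},
    {"rmse": 0.030, "sparsity": 0.40, "q": 0.35},
]
best = select_on_path(path, rmse_ols=0.048)
print(best)
```

In this toy path, the first entry has the smallest Q but violates the constraint, so the second entry is selected.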

The metaheuristic methods have been implemented using the Global Optimization Toolbox in Matlab (MathWorks 2019a). From this toolbox, the following functions have been used: particleswarm, ga, and simulannealbnd. These functions allow the solution to be obtained subject to the bounds defined by the user. They operate on a vector that contains the parameters of the Gaussian membership functions, namely the peaks $p_1, \ldots , p_{\rho }$, $q_1, \ldots , q_{\rho }$ and the widths $\sigma _1, \ldots , \sigma _{\rho }$, $\delta _1, \ldots ,\delta _{\rho }$.

7.3 Results of experiment 1

Table 5 Training time comparison for Experiment 1 and Experiment 2

We consider the nonlinear function (Yeh et al. 2011)

$$\begin{aligned} y = x_1^2\sin (\pi x_2), \end{aligned}$$

(37)

where $x_1\in [-1,1]$ and $x_2\in [0,1]$. The results of Experiment 1 are presented in Table 3. The statistical analysis of the $\mathrm {RMSE}$ showed that most of the calculated models generate significantly different results ($p < 0.05$) compared to the OLS model. The exceptions are the LAR, LASSO, and ENET models. The smallest value of the quality index ${\overline{Q}}$, equal to 0.1450, was obtained for the PSO-ENET method. For this method, the validation error $\overline{\mathrm {RMSE}}$ is $1.864\mathrm {e-}03$, which is smaller than the error of the reference model, $4.800\mathrm {e-}02$. The sparsity ${\overline{S}}$ is 0.7489, which means that the PSO-ENET method zeroed out about 75% of the 45 coefficients. As a result, the model is easier to interpret and implement. Table 4 contains the parameters of the fuzzy systems obtained by the OLS and the PSO-ENET methods. Based on this table, the fuzzy rules for the PSO-ENET model can be written as
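As a quick check of the PSO-ENET figures quoted for Experiment 1 (a worked calculation, not an additional result): nine rules with five consequent coefficients each ($w_1$, $w_2$, $v_1$, $v_2$, b) give 45 coefficients, so the mean sparsity corresponds to roughly 34 zeroed coefficients on average, and the validation error is about 26 times smaller than that of the OLS reference:

$$\begin{aligned} 9\cdot 5 = 45,\qquad {\overline{S}}\cdot 45 = 0.7489\cdot 45\approx 33.7,\qquad \frac{4.800\mathrm {e-}02}{1.864\mathrm {e-}03}\approx 25.8. \end{aligned}$$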

$$\begin{aligned} \begin{aligned} R_{1}:&\text { IF } x_1\in {\mathrm {gauss}}(x_1,-0.9053,2.106)\\&\text { AND } x_2\in {\mathrm {gauss}}(x_2,0.2191,0.4545) \\&\text { THEN } y = -1.991x_1^2,\\ R_{2}:&\text { IF } x_1\in {\mathrm {gauss}}(x_1,-0.9053,2.106)\\&\text { AND } x_2\in {\mathrm {gauss}}(x_2,0.5484,0.3625) \\&\text { THEN } y = 5.924x_1^2,\\&\ldots \\ R_{9}:&\text { IF } x_1\in {\mathrm {gauss}}(x_1,0.0284,1.959)\\&\text { AND } x_2\in {\mathrm {gauss}}(x_2,0.9693,0.3919) \\&\text { THEN } y = -2.719x_1^2 + 0.0057x_1.\ \end{aligned} \end{aligned}$$

(38)
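To make the notation in (38) concrete, the sketch below evaluates rule $R_1$ at one sample point. It assumes the Gaussian form $\exp(-(x-p)^2/(2\sigma^2))$ for $\mathrm {gauss}(x,p,\sigma )$ and a product for the AND operator; both are common choices for T–S systems but are assumptions here, and the function names and the sample point are hypothetical.

```python
import math

def gauss(x, p, sigma):
    """Gaussian membership with peak p and width sigma (assumed form)."""
    return math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2))

def r1_firing(x1, x2):
    """Firing strength of rule R1 in (38); AND is modeled as a product (assumption)."""
    return gauss(x1, -0.9053, 2.106) * gauss(x2, 0.2191, 0.4545)

def r1_consequent(x1, x2):
    """Consequent polynomial of rule R1 in (38): y = -1.991 * x1^2."""
    return -1.991 * x1 ** 2

x1, x2 = 0.5, 0.25  # arbitrary sample point inside the input domain
print(r1_firing(x1, x2), r1_consequent(x1, x2))
```

The model output would combine all nine rules. Since only three of them are listed in (38), this sketch does not reproduce the estimator ${\hat{y}}$.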

Table 6 Performance comparison for Experiment 2; $\overline{\mathrm {RMSE}}$ is the mean of the validation error, std is the standard deviation, min is the minimum value, max is the maximum value, p is the p-value of the Wilcoxon test, ${\overline{S}}$ is the mean of the model sparsity, ${\overline{Q}}$ is the mean of the quality index

It is worth noting that in the PSO-ENET model, three rules ($R_4$, $R_5$, $R_6$) have a zero polynomial in the consequent part. Figures 3, 4, and 5 show the goal function y, the estimator ${\hat{y}}$, and the approximation error $y-{\hat{y}}$ for the best model. The average time of training high-order T–S fuzzy systems using the PSO-ENET method for one MCCV data subset was about 40.28 s (Table 5). The methods with manually chosen fuzzy sets (OLS, RIDGE, FS, LAR, LASSO, ENET) have the shortest calculation times, while the longest times were obtained by the algorithms using SA.

7.4 Results of experiment 2

This experiment applies the nonlinear function (Yeh et al. 2011)

$$\begin{aligned} y = \sin (\pi x_1)\sin (\pi x_2), \end{aligned}$$

(39)

where $x_1\in [-1,1]$ and $x_2\in [0,1]$. The results are presented in Table 6. The statistical analysis of the $\mathrm {RMSE}$ showed that all calculated models generate significantly different results ($p < 0.05$) compared to the OLS model. The smallest value of the quality index ${\overline{Q}}$, equal to 0.1515, was obtained for the PSO-FS method. For this method, the validation error $\overline{\mathrm {RMSE}}$ is $3.403\mathrm {e-}02$, and the sparsity ${\overline{S}}$ is 0.7956, which means that the PSO-FS method zeroed out about 80% of the 45 coefficients. For comparison, the OLS method achieved a validation error $\overline{\mathrm {RMSE}}$ of $3.457\mathrm {e-}01$. Table 7 contains the parameters of the fuzzy systems obtained by the OLS and the PSO-FS methods. The fuzzy rules for the PSO-FS model can be written as

$$\begin{aligned} \begin{aligned} R_{1}:&\text { IF } x_1\in {\mathrm {gauss}}(x_1,-0.7318,0.4725)\\&\text { AND } x_2\in {\mathrm {gauss}}(x_2,0.6572,1.025) \\&\text { THEN } y = 0,\\ R_{2}:&\text { IF } x_1\in {\mathrm {gauss}}(x_1,-0.7318,0.4725)\\&\text { AND } x_2\in {\mathrm {gauss}}(x_2,0.6618,1.034) \\&\text { THEN } y = -7.586x_1,\\&\ldots \\ R_{9}:&\text { IF } x_1\in {\mathrm {gauss}}(x_1,1,0.3468)\\&\text { AND } x_2\in {\mathrm {gauss}}(x_2,0.5186,0.3502) \\&\text { THEN } y = -7.655x_1^2+7.450.\ \end{aligned} \end{aligned}$$

(40)

Table 7 Parameters of fuzzy systems in Experiment 2; p, q are the peaks of membership functions, $\sigma $, $\delta $ are the widths of membership functions, $w_1$, $w_2$, $v_1$, $v_2$, b are the polynomial coefficients in the consequent part

Two rules ($R_1$ and $R_7$) have a zero polynomial in the consequent part. Figures 6, 7, and 8 show the goal function y, the estimator ${\hat{y}}$, and the approximation error $y-{\hat{y}}$ for the best model. The average time of training high-order T–S fuzzy systems using the PSO-FS method for one MCCV data subset was about 48.48 s (Table 5). As in Experiment 1, the methods with manually chosen fuzzy sets (OLS, RIDGE, FS, LAR, LASSO, ENET) have the shortest calculation times, while the longest times were obtained by the algorithms using SA.

8 Conclusions

A method of training high-order Takagi–Sugeno systems for two-variable function approximation has been proposed. The method is based on sparse regressions and metaheuristic optimization. The antecedent parameters of the fuzzy rules are set manually or by metaheuristic optimization methods such as particle swarm optimization, genetic algorithm, or simulated annealing. The consequent parameters are determined by ordinary least squares, ridge regression, or sparse regressions such as forward selection, least angle regression, least absolute shrinkage and selection operator, or elastic net. Ordinary least squares regression is used as the reference method. A quality criterion based on a sparsity measure has been proposed to assess the quality of the fuzzy models. Compared with the reference method, the conducted experiments showed that: (a) the use of sparse regressions and/or metaheuristic optimization methods can reduce the validation error; (b) the use of sparse regressions may simplify the fuzzy model by setting some of the coefficients to zero.

References

Almaraashi M, John R, Hopgood A, Ahmadi S (2016) Learning of interval and general type-2 fuzzy logic systems using simulated annealing: Theory and practice. Inform Sci 360:21–42. https://doi.org/10.1016/J.INS.2016.03.047
Bishop CM (2006) Pattern recognition and machine learning. Information science and statistics. Springer, New York
Boulkaibet I, Belarbi K, Bououden S, Marwala T, Chadli M (2017) A new T–S fuzzy model predictive control for nonlinear processes. Expert Syst Appl 88:132–151. https://doi.org/10.1016/j.eswa.2017.06.039
Cheung NJ, Ding XM, Shen HB (2014) OptiFel: a convergent heterogeneous particle swarm optimization algorithm for Takagi–Sugeno fuzzy modeling. IEEE Trans Fuzzy Syst 22(4):919–933. https://doi.org/10.1109/TFUZZ.2013.2278972
Cordón O, Herrera F, Villar P (2000) Analysis and guidelines to obtain a good uniform fuzzy partition granularity for fuzzy rule-based systems using simulated annealing. Int J Approx Reason 25(3):187–215. https://doi.org/10.1016/S0888-613X(00)00052-9
Cordón O, Herrera F, Villar P (2001) Generating the knowledge base of a fuzzy rule-based system by the genetic learning of the data base. IEEE Trans Fuzzy Syst 9(4):667–674. https://doi.org/10.1109/91.940977
Eberhart RC, Shi Y (2000) Comparing inertia weights and constriction factors in particle swarm optimization. In: Proceedings of the 2000 congress on evolutionary computation, vol 1, pp 84–88. https://doi.org/10.1109/CEC.2000.870279
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499
Glover FW, Kochenberger GA (2003) Handbook of metaheuristics. Springer, Berlin
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67
Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press, Cambridge
Juang CFF, Lo C (2008) Zero-order TSK-type fuzzy system learning using a two-phase swarm intelligence algorithm. Fuzzy Sets Syst 159(21):2910–2926. https://doi.org/10.1016/j.fss.2008.02.003
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks, vol 4. IEEE Press, Piscataway, NJ, pp 1942–1948
Khayat O, Ebadzadeh MM, Shahdoosti HR, Rajaei R, Khajehnasiri I (2009) A novel hybrid algorithm for creating self-organizing fuzzy neural networks. Neurocomputing 73(1–3):517–524. https://doi.org/10.1016/j.neucom.2009.06.013
Khosla A, Kumar S, Aggarwal KK (2005) A framework for identification of fuzzy models through particle swarm optimization algorithm. In: 2005 Annual IEEE India Conference-Indicon, pp 388–391
Khosla A, Kumar S, Ghosh KR (2007) A comparison of computational efforts between particle swarm optimization and genetic algorithm for identification of fuzzy models. In: NAFIPS 2007–Annual meeting of the North American fuzzy information processing society, pp 245–250. https://doi.org/10.1109/NAFIPS.2007.383845
Kirkpatrick S, Gelatt CD Jr, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680. https://doi.org/10.1126/science.220.4598.671
Li C, Wu T (2011) Adaptive fuzzy approach to function approximation with PSO and RLSE. Expert Syst Appl 38(10):13266–13273. https://doi.org/10.1016/j.eswa.2011.04.145
Li C, Wu T, Chan FTT (2012) Self-learning complex neuro-fuzzy system with complex fuzzy sets and its application to adaptive image noise canceling. Neurocomputing 94:121–139. https://doi.org/10.1016/j.neucom.2012.04.011
Lin CJ (2008) An efficient immune-based symbiotic particle swarm optimization learning algorithm for TSK-type neuro-fuzzy networks design. Fuzzy Sets Syst 159(21):2890–2909
Lin G, Zhao K, Wan Q (2016) Takagi–Sugeno fuzzy model identification using coevolution particle swarm optimization with multi-strategy. Appl Intell 45(1):187–197. https://doi.org/10.1007/s10489-015-0752-0
Di Martino F, Loia V, Sessa S (2014) Multi-species PSO and fuzzy systems of Takagi–Sugeno–Kang type. Inform Sci 267(Supplement C):240–251. https://doi.org/10.1016/j.ins.2014.01.017
MathWorks (2019a) Global Optimization Toolbox: User’s Guide
MathWorks (2019b) Statistics and Machine Learning Toolbox: User’s Guide
Niu B, Zhu Y, He X, Shen H (2008) A multi-swarm optimizer based fuzzy modeling approach for dynamic systems processing. Neurocomputing 71(7–9):1436–1448. https://doi.org/10.1016/j.neucom.2007.05.010
Picard RR, Cook RD (1984) Cross-validation of regression models. J Am Stat Assoc 79(387):575–583. https://doi.org/10.1080/01621459.1984.10478083
Prado RP, García-Galán S, Munoz Exposito JE, Yuste AJ (2010) Knowledge acquisition in fuzzy-rule-based systems with particle-swarm optimization. IEEE Trans Fuzzy Syst 18(6):1083–1097. https://doi.org/10.1109/TFUZZ.2010.2062525
Rastegar S, Araujo R, Mendes J (2017) Online identification of Takagi–Sugeno fuzzy models based on self-adaptive hierarchical particle swarm optimization algorithm. Appl Math Modell 45(Supplement C):606–620. https://doi.org/10.1016/j.apm.2017.01.019
Setnes M, Roubos H (2000) GA-fuzzy modeling and classification: complexity and performance. IEEE Trans Fuzzy Syst 8(5):509–522. https://doi.org/10.1109/91.873575
Shihabudheen KV, Mahesh M, Pillai GN (2018) Particle swarm optimization based extreme learning neuro-fuzzy system for regression and classification. Expert Syst Appl 92:474–484. https://doi.org/10.1016/j.eswa.2017.09.037
Sjöstrand K, Clemmensen L, Larsen R, Einarsson G, Ersbøll B (2018) SpaSM: a MATLAB toolbox for sparse statistical modeling. J Stat Softw Articles 84(10):1–37. https://doi.org/10.18637/jss.v084.i10
Soltani M, Chaari A, Ben Hmida F (2012) A novel fuzzy c-regression model algorithm using a new error measure and particle swarm optimization. Int J Appl Math Comput Sci 22(3):617–628. https://doi.org/10.2478/v10006-012-0047-0
Taieb A, Soltani M, Chaari A (2018) A fuzzy C-regression model algorithm using a new PSO algorithm. Int J Adapt Control Signal Process 32(1):115–133. https://doi.org/10.1002/acs.2829
Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern SMC–15(1):116–132. https://doi.org/10.1109/TSMC.1985.6313399
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Stat Soc Ser B (Methodological) 58(1):267–288
Tsai SHH, Chen YWW (2018) A novel identification method for Takagi–Sugeno fuzzy model. Fuzzy Sets Syst 338:117–135. https://doi.org/10.1016/j.fss.2017.10.012
Tu CH, Li C (2018) Multiple function approximation—a new approach using complex fuzzy inference system. In: Nguyen NT, Hoang DH, Hong TP, Pham H, Trawiński B (eds) Intelligent information and database systems. Springer, Cham, pp 243–254
Wang H, Kwong S, Jin Y, Wei W, Man KF (2005) Multi-objective hierarchical genetic algorithm for interpretable fuzzy rule-based knowledge extraction. Fuzzy Sets Syst 149(1):149–186. https://doi.org/10.1016/j.fss.2004.07.013
Wang L, Mendel JM (1992) Fuzzy basis functions, universal approximation, and orthogonal least-squares learning. IEEE Trans Neural Netw 3(5):807–814
Whitley D (1994) A genetic algorithm tutorial. Stat Comput 4:65–85. https://doi.org/10.1007/BF00175354
Wiktorowicz K, Krzeszowski T (2020) Training high-order Takagi–Sugeno fuzzy systems using batch least squares and particle swarm optimization. Int J Fuzzy Syst 22(1):22–34. https://doi.org/10.1007/s40815-019-00747-2
Yanar TA, Akyürek Z (2011) Fuzzy model tuning using simulated annealing. Expert Syst Appl 38(7):8159–8169. https://doi.org/10.1016/J.ESWA.2010.12.159
Yang YKK, Sun TYY, Huo CLL, Yu YHH, Liu CCC, Tsai CHH (2013) A novel self-constructing radial basis function neural-fuzzy system. Appl Soft Comput 13(5):2390–2404. https://doi.org/10.1016/j.asoc.2013.01.023
Yeh CY, Jeng WHR, Lee SJ (2011) Data-based system modeling using a type-2 fuzzy neural network with a hybrid learning algorithm. IEEE Trans Neural Netw 22(12):2296–2309. https://doi.org/10.1109/TNN.2011.2170095
Ying KCC, Lin SWW, Lee ZJJ, Lee ILL (2011) A novel function approximation based on robust fuzzy regression algorithm model and particle swarm optimization. Appl Soft Comput 11(2):1820–1826. https://doi.org/10.1016/j.asoc.2010.05.028
Yusof R, Abdul Rahman RZ, Khalid M, Ibrahim MF (2011) Optimization of fuzzy model using genetic algorithm for process control application. J Franklin Inst 348(7):1717–1737. https://doi.org/10.1016/j.jfranklin.2010.10.004
Zhao L, Qian F, Yang Y, Zeng Y, Su H (2010) Automatically extracting T–S fuzzy models using cooperative random learning particle swarm optimization. Appl Soft Comput 10(3):938–944. https://doi.org/10.1016/j.asoc.2009.10.012
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J Royal Stat Soc Ser B (Stat Methodol) 67(2):301–320

Author information

Authors and Affiliations

Faculty of Electrical and Computer Engineering, Rzeszow University of Technology, al. Powstancow Warszawy 12, 35-959, Rzeszow, Poland
Krzysztof Wiktorowicz & Tomasz Krzeszowski

Corresponding author

Correspondence to Krzysztof Wiktorowicz.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interests regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by A. Di Nola.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wiktorowicz, K., Krzeszowski, T. Approximation of two-variable functions using high-order Takagi–Sugeno fuzzy systems, sparse regressions, and metaheuristic optimization. Soft Comput 24, 15113–15127 (2020). https://doi.org/10.1007/s00500-020-05238-3

Published: 05 September 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00500-020-05238-3
