Abstract
Industrial data contain a large amount of noise that deep learning models cannot suppress well. Existing industrial data classification models suffer from incomplete features, inadequate self-adaptability, insufficient approximation capability of the classifier, and weak robustness. To address these issues, this paper proposes an intelligent classification method based on self-attention learning features and stochastic configuration networks (SCNs), which imitates the feedback regulation of human cognition to achieve ensemble learning. First, at the feature extraction stage, a fused deep neural network model based on self-attention is constructed: a self-attention long short-term memory (LSTM) network and a self-attention residual network with adaptive hierarchies extract, after noise suppression, the global temporal features and local spatial features of faults in the industrial time-series dataset, respectively. Second, at the classifier design stage, the fused complete feature vectors are fed to SCNs, which have universal approximation capability, to establish general classification criteria. Then, based on generalized error and entropy theory, performance indexes are established for real-time evaluation of the credibility of uncertain classification results, and an adaptive adjustment mechanism for the network hierarchy of the self-attention fusion networks is built to realize self-optimization of the multi-hierarchy complete features and their classification criteria. Finally, a fuzzy integral is used to integrate the classification results of self-attention fusion network models with different hierarchies, improving the robustness of the classification model. Compared with other classification models, the proposed model performs better on a rolling bearing fault dataset.
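For intuition, the following is a minimal PyTorch sketch of the kind of self-attention gating described above for the LSTM branch: a global contextual feature is pooled from the hidden states, squeezed through a sigmoid to form a threshold, and the hidden states are soft-thresholded to suppress noisy components before being pooled into a fixed-length feature vector. The module name, layer sizes, pooling choices, and the exact thresholding rule are illustrative assumptions rather than the paper's implementation; the residual branch applies an analogous channel-wise gating to its convolutional feature maps.

```python
# A minimal sketch of self-attention gating for noise suppression on the LSTM
# branch. All names and the thresholding rule are illustrative assumptions.
import torch
import torch.nn as nn


class SelfAttentionLSTMBranch(nn.Module):
    """Global temporal features with attention-based noise suppression (sketch)."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)             # attention scores over time
        self.squeeze = nn.Linear(hidden_dim, hidden_dim)  # produces the sigmoid gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim) -> h: (batch, time, hidden_dim)
        h, _ = self.lstm(x)
        # global contextual feature: attention-weighted pooling over time
        alpha = torch.softmax(self.score(h), dim=1)
        context = (alpha * h).sum(dim=1)                  # (batch, hidden_dim)
        # sigmoid gate turns the context into a per-channel threshold (assumed form)
        gate = torch.sigmoid(self.squeeze(context))
        tau = (gate * context.abs()).unsqueeze(1)         # (batch, 1, hidden_dim)
        # soft-threshold the hidden states, then pool to a fixed-length feature
        h_filter = torch.sign(h) * torch.relu(h.abs() - tau)
        return h_filter.mean(dim=1)                       # (batch, hidden_dim)


if __name__ == "__main__":
    branch = SelfAttentionLSTMBranch(in_dim=1)
    feats = branch(torch.randn(8, 128, 1))                # 8 windows of length 128
    print(feats.shape)                                    # torch.Size([8, 64])
```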
Data availability
The datasets analysed during the current study are available in the following public domain resource: [https://github.com/yyxyz/CaseWesternReserveUniversityData]
Abbreviations
- \({\varvec{x}}_t\): Input of the time-series sample at moment \(t\)
- \({\varvec{c}}_t\): Cell memory state at moment \(t\) in the LSTM cell structure
- \({\varvec{h}}_t\): Hidden state output at moment \(t\) in the LSTM cell structure
- \({\varvec{i}}_t\): Input gate at moment \(t\) in the LSTM cell structure
- \({\varvec{f}}_t\): Forget gate at moment \(t\) in the LSTM cell structure
- \({\varvec{o}}_t\): Output gate at moment \(t\) in the LSTM cell structure
- \(\tilde{{\varvec{c}}}_t\): Intermediate candidate vector of the tanh layer in the LSTM cell structure
- \({\varvec{x}}^{q=1}\): Time-series dataset input to the self-attention LSTM network (\(q=1\))
- \({\varvec{h}}^{q=1}\): Hidden layer output of the self-attention LSTM network (\(q=1\))
- \(\varvec{\varphi }_1^{q=1}\): Global contextual feature of the self-attention LSTM network (\(q=1\))
- \(\varvec{\varphi }_2^{q=1}\): Intermediate feature produced by the sigmoid function of the self-attention LSTM network (\(q=1\))
- \(\varvec{\tau }^{q=1}\): Time-scale threshold for \({\varvec{h}}^{q=1}\) of the self-attention LSTM network (\(q=1\))
- \({\varvec{h}}_{\text{filter}}^{q=1}\): Filtered feature vector output for \({\varvec{h}}^{q=1}\)
- \({\varvec{H}}^{q=1}\): Output of the self-attention LSTM network (\(q=1\))
- \({\varvec{f}}^{q=1}\): Feature map output after two convolutions of the self-attention residual network (\(q=1\))
- \({\varvec{f}}_1^{q=1}\): Global contextual feature of the self-attention residual network (\(q=1\))
- \({\varvec{f}}_2^{q=1}\): Intermediate feature produced by the sigmoid function of the self-attention residual network (\(q=1\))
- \(\varvec{\varepsilon }^{q=1}\): Channel threshold for \({\varvec{f}}^{q=1}\) of the self-attention residual network (\(q=1\))
- \({\varvec{f}}_{\text{filter}}^{q=1}\): Filtered feature vector output for \({\varvec{f}}^{q=1}\)
- \({\varvec{Y}}^{q=1}\): Output of the self-attention residual network (\(q=1\))
- \(\varvec{\lambda }_{L-1}\): Output of the \((L-1){\text{th}}\) hidden node in SCNs
- \({\varvec{Z}}\): Fused feature vector of the self-attention fusion deep network for \(N\) samples
- \(k\): Dimension of the fused feature vector \({\varvec{Z}}_j,\ j\in [1,N]\)
- \(L_{\text{max}}\): Maximum number of hidden nodes of SCNs
- \({\varvec{w}}_j\): Input weight of the \(j{\text{th}}\) hidden node of SCNs
- \({\varvec{b}}_j\): Bias of the \(j{\text{th}}\) hidden node of SCNs
- \(p\): Number of fault categories
- \(\varvec{\beta }_j\): Output weight matrix of the \(j{\text{th}}\) hidden node of SCNs
- \(g_j(\cdot )\): Activation function of the \(j{\text{th}}\) hidden node of SCNs
- \({\varvec{e}}_{L-1}({\varvec{Z}})\): Residual error of SCNs with \(L-1\) hidden nodes for \({\varvec{Z}}\)
- \({\varvec{g}}_L({\varvec{Z}})\): Activation output of the \(L{\text{th}}\) hidden node of SCNs for \({\varvec{Z}}\)
- \({\varvec{G}}_L\): Hidden-layer output matrix of SCNs with \(L\) hidden nodes
- \(\varvec{\xi }_{L,a}\): Inequality constraint variables for the hidden parameters of SCNs
- \(\varvec{\beta }^*\): Output weight matrix for \(L\) hidden nodes obtained by the least-squares method (see the SCN sketch after this list)
- \({\varvec{G}}_L^{\dag }\): Moore–Penrose generalized inverse of the matrix \({\varvec{G}}_L\)
- \(U\): Training time-series dataset of the rolling bearing
- \(M_q\): Self-attention fusion network model with hierarchy \(q\)
- \({\varvec{Z}}_j^i\): Fusion feature of the \(j{\text{th}}\) sample via \(M_q\)
- \(\tilde{{\varvec{Z}}}_j^i\): Fusion latent semantic feature for \({\varvec{Z}}_j^i\)
- \(X\): Any sample in \(U\)
- \(\tilde{{\varvec{C}}}\): Fusion latent semantic feature of \(X\)
- \(E_i\): Fusion latent semantic error entropy of \(X\) and \(U_i\)
- \({\varvec{S}}\): Covariance matrix of \(\left[ \tilde{{\varvec{C}}}; \tilde{{\varvec{Z}}}^{i}\right] ^{\mathrm {T}}\)
- \(E\): Fusion latent semantic error entropy of \(X\) and \(U\)
- \(m\): Feedback number
- \(q_0\): Initial network hierarchy of the self-attention fusion deep network
- \(q_{\text{max}}\): Maximum network hierarchy in the adaptive adjustment
- thres: Error threshold of SCNs
- \(\mu _{\text{max}}\): Maximum number of iterations of network training
- num: Number of samples in \(U\)
- \(\gamma \): Sample credibility threshold
- \(V\): Trusted sample dataset
- \(A\): Fusion network model set
- \(T\): Intermediate training dataset
- \(v\): Fuzzy measure of fusion deep network model \(A_i\)
- \(X'\): Testing time-series dataset of the rolling bearing
- \(\sigma \): Fuzzy integral of a testing sample
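To illustrate how the SCN symbols above fit together, here is a minimal NumPy sketch of incremental SCN construction on the fused feature matrix \({\varvec{Z}}\): candidate hidden nodes \(({\varvec{w}}_j, {\varvec{b}}_j)\) are generated at random, the candidate that best satisfies the supervisory inequality constraint \(\varvec{\xi }_{L,a}\ge 0\) is kept, and the output weights \(\varvec{\beta }^*\) are recomputed by least squares using \({\varvec{G}}_L^{\dag }\). This is a simplified sketch of the standard SCN construction algorithm (fixed random scope, no adaptive contraction sequence) under assumed hyper-parameters, not the authors' implementation.

```python
# Minimal NumPy sketch of SCN construction: randomly configured hidden nodes are
# accepted only when they satisfy a supervisory inequality constraint, and the
# output weights beta* are solved by least squares via the Moore-Penrose
# pseudo-inverse. Hyper-parameters (scope, pool size, r) are illustrative assumptions.
import numpy as np


def build_scn(Z, targets, L_max=50, thres=1e-2, pool=30, scope=1.0, r=0.999):
    """Z: (N, k) fused feature matrix; targets: (N, p) one-hot fault labels."""
    rng = np.random.default_rng(0)
    N, k = Z.shape
    W, b = [], []                     # accepted input weights w_j and biases b_j
    e = targets.copy()                # residual error e_{L-1}(Z), starts at the targets
    G = np.empty((N, 0))              # hidden-layer output matrix G_L
    beta = np.zeros((0, targets.shape[1]))

    for L in range(1, L_max + 1):
        best_xi, best = -np.inf, None
        for _ in range(pool):         # pool of randomly configured candidate nodes
            w = rng.uniform(-scope, scope, size=k)
            bj = rng.uniform(-scope, scope)
            g = np.tanh(Z @ w + bj)   # g_L(Z): candidate activation output
            # supervisory inequality constraint xi_{L,a}, summed over the p outputs
            xi = sum((e[:, a] @ g) ** 2 / (g @ g) - (1 - r) * (e[:, a] @ e[:, a])
                     for a in range(targets.shape[1]))
            if xi > best_xi:
                best_xi, best = xi, (w, bj, g)
        if best_xi < 0:               # no admissible candidate: stop adding nodes
            break
        w, bj, g = best
        W.append(w)
        b.append(bj)
        G = np.column_stack([G, g])
        beta = np.linalg.pinv(G) @ targets    # beta* via the pseudo-inverse of G_L
        e = targets - G @ beta                # updated residual error
        if np.sqrt((e ** 2).mean()) < thres:  # stop once the error threshold is met
            break
    return np.array(W), np.array(b), beta


if __name__ == "__main__":
    Z = np.random.randn(200, 8)                   # stand-in for fused features Z
    labels = np.random.randint(0, 3, size=200)    # p = 3 fault categories
    W, b, beta = build_scn(Z, np.eye(3)[labels])
    pred = np.argmax(np.tanh(Z @ W.T + b) @ beta, axis=1)
    print("training accuracy:", (pred == labels).mean())
```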
Acknowledgements
This work was supported in part by grants from the National Natural Science Foundation of China (62173120, 52077049, 51877060), the National Key R&D Program of China (Grant No. 2018AAA0100304), the Anhui Provincial Natural Science Foundation (2008085UD04, 2108085UD07, 2108085UD11), and the 111 Project (No. BP0719039).
Ethics declarations
Conflict of interest statement
The authors declare that they have no conflict of interest related to this work, and no commercial or associative interest that represents a conflict of interest in connection with the submitted work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, W., Deng, Y., Ding, M. et al. Industrial data classification using stochastic configuration networks with self-attention learning features. Neural Comput & Applic 34, 22047–22069 (2022). https://doi.org/10.1007/s00521-022-07657-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-022-07657-9