1 Introduction

In the realm of practical engineering applications, uncertainties are omnipresent due to myriad factors including geometric tolerances, material properties, and operational loading conditions [1]. These uncertainties can generally be categorized into two distinct types: aleatory uncertainty and epistemic uncertainty [2]. Aleatory uncertainty, also recognized as statistical uncertainty, pertains to the intrinsic variability inherent in probabilistic processes. This form of uncertainty is immutable and cannot be diminished through the acquisition of additional knowledge or information. Conversely, epistemic uncertainty arises from a deficiency in understanding, such as incomplete knowledge of the underlying mechanisms, partial understanding of the phenomena, or imprecise evaluation of associated characteristics. Also referred to as systematic uncertainty, epistemic uncertainty can be mitigated by gathering more comprehensive knowledge or information. The interplay of these uncertainties significantly impacts the response of engineering products, occasionally leading to product failures [3]. Therefore, the accurate quantification of these uncertainties is crucial for the precise evaluation of product performance and behavior.

Uncertainty quantification (UQ) aims to quantify and understand the impact of pervasive uncertainties in engineering products, and it has become a prominent topic in both theoretical research and practical engineering applications. The primary tasks of UQ encompass uncertainty identification, characterization, forward and inverse uncertainty propagation, model calibration, and validation, among others [4]. Quantitative characterization of uncertainty through mathematical models forms the foundation of these UQ tasks. Currently, numerous models have been developed to characterize uncertainty in various contexts, which can be categorized into three types: probabilistic models, non-probabilistic models, and imprecise probabilistic models. Probabilistic models, which mainly include random models and stochastic process models, are the most widely used tools for quantifying aleatory uncertainty. These models are firmly rooted in classical probability theory, a mathematical discipline concerned with probability. Non-probabilistic models, on the other hand, are primarily employed to address epistemic uncertainty using set-theoretic approaches. Notable examples of such models include interval models [5], convex models [6], and fuzzy models [7]. Imprecise probabilistic models represent a class of uncertainty models that extend and generalize classical precise probabilistic models by allowing partial probability specifications. These models are based on the theory of imprecise probability, which, unlike classical probability theory that relies on a single probability measure, uses a set of probability measures to represent uncertainty [8]. This approach enables imprecise probabilistic models to accommodate both aleatory and epistemic uncertainties. Representative imprecise probabilistic models include the probability box (p-box) models [9], evidence theory-based models [10], and fuzzy probability models [11]. It is noteworthy that the afore-mentioned uncertainty characterization models can be used independently or in combination, depending on the data or uncertainty information available in the engineering structures or systems.

The process of UQ in complex engineering structures or systems necessitates numerous invocations of physical models for analysis, which typically imposes a substantial computational burden. Consequently, traditional numerical and simulation methods, such as the Monte Carlo simulation (MCS), often become impractical for engineering applications due to their computational demands [12]. In recent times, machine learning (ML) techniques have garnered significant attention for their ability to strike a remarkable balance between computational accuracy and efficiency in addressing engineering challenges. Fundamentally, ML is understood as the science of employing computational algorithms to execute specific tasks through data analysis without explicit programming [13]. Depending on the nature and availability of sample data, ML methodologies can be segmented into various algorithm categories: supervised, unsupervised, semi-supervised, and reinforcement learning algorithms [14]. Supervised learning algorithms are employed when sample data contain input–output pairs, known as labeled data, derived from the targeted function or distribution. Unsupervised learning algorithms are predominantly used when data only include input values, whereas semi-supervised algorithms are applied in scenarios where some sample data lack output values. In reinforcement learning, outputs are presented as reward feedback for actions based on selected inputs. A plethora of ML models are currently available, including the Kriging model [15], Gaussian process regression (GPR) [16], polynomial chaos expansion (PCE) [17], support vector machine (SVM) [18], artificial neural network (ANN) [19], Bayesian neural network (BNN) [20], physics-informed neural network (PINN) [21], among others [22,23,24]. Kriging and GPR both utilize covariance functions to model dependencies and offer predictions with associated uncertainty, a feature advantageous for calculating prediction confidence levels and constructing adaptive learning frameworks to enhance ML models’ computational precision. PCE is a mathematical approach for UQ that expresses random variables as a series of orthogonal polynomials based on the probability distributions of input uncertainties, facilitating efficient uncertainty propagation and analysis in complex models. SVM, known for its robustness in classification and regression tasks, excels in identifying the optimal hyperplane for class separation and offers flexibility through kernel functions, making it widely used across various applications. The ANN, inspired by the neural architecture of the human brain and comprising interconnected node layers, excels in pattern recognition, classification, and regression by learning from examples. Notably, the ANN has become a predominant subject in both academic research and engineering applications. Differing from the ANN, the BNN treats model parameters as random variables rather than fixed quantities, thereby directly providing predictions with uncertainty, similar to Kriging and GPR, and enhancing the BNN’s adaptability in UQ challenges. Contrasting with these traditional ML models, the PINN integrates physical information into the ANN’s training regimen, offering more precise predictions while mitigating the need for labeled data to some extent. Numerous studies and engineering applications have confirmed the PINN’s efficacy in approximating physical model calculations, especially for models represented by ordinary differential equations (ODEs)/partial differential equations (PDEs).
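
As a simple illustration of the prediction-with-uncertainty property shared by Kriging and GPR, the following minimal sketch uses scikit-learn’s Gaussian process regressor on a hypothetical one-dimensional function; the test function, kernel, and sample size are assumptions chosen purely for illustration.

```python
# Minimal GPR sketch: the surrogate returns a predictive mean and standard deviation,
# the feature that enables confidence estimates and adaptive learning frameworks.
# The test function, kernel, and sample size are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def model(x):                                   # placeholder for an expensive physical model
    return np.sin(3.0 * x) + 0.5 * x

X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)    # small labeled data set
y_train = model(X_train).ravel()

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(X_train, y_train)

X_test = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
mean, std = gpr.predict(X_test, return_std=True)     # prediction plus uncertainty
print("maximum predictive standard deviation:", std.max())
```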

This paper presents a comprehensive survey of ML approaches for addressing UQ tasks of engineering systems. The first focus is a review of ML methods for forward UQ analysis, including response and reliability assessments, as detailed in Sect. 2. For response analysis, we examine various ML approaches addressing aleatory, epistemic, and hybrid uncertainties. In reliability analysis, we review two prominent categories of ML techniques: data-driven neural networks and PINNs. Section 3 then shifts to ML strategies for inverse UQ analysis, covering key areas such as probabilistic model updating (PMU) and design optimization under uncertainty. For PMU, topics like Markov chain Monte Carlo (MCMC) sampling, variational Bayesian inference (VBI), Bayesian updating with structural reliability analysis (BUS), transport maps, and Bayesian filtering are discussed. In terms of design optimization under uncertainty, we review two major objectives: reliability-based design optimization (RBDO) and robust design optimization (RDO). Following this, Sect. 4 addresses ML-based fault diagnosis, with a focus on cross-domain recognition techniques leveraging transfer learning. Section 5 evaluates the application of ML in resilience assessment and management planning, highlighting domain-embedded ML methods such as physics-guided, physics-enhanced, physics-corrected, and physics-constrained approaches. Section 6 presents an in-depth discussion of current challenges, while Sect. 7 offers conclusions and potential avenues for future research.

2 Forward uncertainty quantification analysis

The forward analysis investigates uncertainty propagation across structures, characterizing their responses under uncertainties. This section divides forward analysis into two components: response analysis and reliability analysis. Response analysis calculates the structural response's values, statistical moments, and probability density function (PDF) [25]. Reliability analysis focuses on estimating a structure's capability to fulfill a particular function under defined conditions [3]. Recent advancements in response and reliability analyses, leveraging ML techniques, are outlined below.

2.1 Response analysis

2.1.1 Response analysis with aleatory uncertainty

The fundamental framework for leveraging ML techniques for analyzing structural response involves initially developing a surrogate model. This model serves to approximate the input–output mapping relationship, thereby enabling the estimation of structural response characteristics. The surrogate model offers a time-efficient alternative to solving the computationally intensive ODEs/PDEs or intricate finite element models. Traditionally, ML-assisted methodologies for delineating response characteristics have relied on the direct construction of a surrogate model from training samples, with no subsequent updates using new data. This approach is known as the one-step sampling strategy for constructing ML models. Within this context, Bhattacharyya [26] introduced an ML technique that integrates proper orthogonal decomposition with the Kriging model (POD–Kriging) to analyze the mean and standard deviation of dynamic systems. This approach utilizes proper orthogonal decomposition to project the response quantity using a minimal number of bases. However, the POD–Kriging model may produce erroneous results for dynamical systems involving random fields or white noise. Lu et al. [27] formulated a multi-output GPR method to calculate the mean, standard deviation, and PDF of frequency response functions. Their method initially applies modal decomposition and low-dimensional representation to mitigate the curse of dimensionality in frequency response functions, followed by employing a multi-output GPR model for response reconstruction at any frequency level. Gibanica and Abrahamsson [28] crafted a comprehensive ML strategy for predicting response values and the PDF of frequency response functions, wherein individual local surrogate models for eigenfrequencies and residue matrix elements are trained to construct the overarching ML model. The results showed that this method generally achieves good accuracy, although antiresonances are occasionally mispredicted. Hong et al. [29] devised a framework that converts PDEs with stochastic parameters into their deterministic counterparts, further enhancing it with a Bayesian augmented space learning (BASL) method to deduce the statistical attributes of model responses. Liu et al. [30] introduced a PDF-based sampling technique for generating premium training samples, facilitating the training of a BNN to compute the mean, variance, and PDF of structural response. This approach was also adapted for dynamic systems through integration with an arbitrary stochastic process simulation technique [31]. The solutions demonstrated the method’s good performance in dealing with high-dimensional problems. Tripathy and Bilionis [32] parameterized the structure of deep neural networks, enabling these networks to interpret and recover low-dimensional nonlinear manifolds, thereby efficiently solving the statistical characteristics and PDF of high-dimensional PDEs. To address the first four moments of structural response, Shi et al. [33] derived analytical expressions based on the Bayesian Monte Carlo (BMC) model, which avoids the post-processing errors of traditional ML methods when numerical simulation techniques are combined for uncertainty propagation. Yao et al. [34] developed a semi-supervised deep adaptive arbitrary PCE method, training a deep neural network model with a limited amount of labeled data and a wealth of unlabeled data to efficiently solve the first four moments of structural response. Shi et al. [35] proposed a novel gradient-enhanced Bayesian neural network (GEBNN) framework by incorporating the gradient information into the model training process for calculating the first four moments and PDF of structural response. By integrating gradient information, the GEBNN typically outperforms the traditional ANN and BNN in terms of prediction accuracy. Figure 1 illustrates a comparative analysis of various neural networks in approximating the performance function \(Y={\mathrm{cos}}^{3}(5{X}_{1})+\mathrm{exp}({\mathrm{sin}}^{2}(2{X}_{2}))\), utilizing the same 50 training samples. The approaches employing a one-step sampling strategy are typically easy to implement and can be directly applied to available data, making them widely applicable in engineering practices.

Fig. 1

Response and absolute error results based on different neural networks: (a) response results with ANN; (b) response results with BNN; (c) response results with GEBNN; (d) absolute errors with ANN; (e) absolute errors with BNN; (f) absolute errors with GEBNN
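
As a simple illustration of the one-step sampling strategy, the sketch below constructs a surrogate once from 50 samples of the performance function in Fig. 1 and then estimates the response mean and standard deviation by Monte Carlo simulation on the surrogate; the GPR surrogate (standing in for the neural networks of Fig. 1), the assumed input range, and the sample sizes are illustrative choices only.

```python
# One-step sampling sketch: build a surrogate from a fixed design, then estimate
# response statistics by cheap Monte Carlo simulation on the surrogate.
# Surrogate type, input range, and sample sizes are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def performance(x):                              # Y = cos^3(5*X1) + exp(sin^2(2*X2))
    return np.cos(5 * x[:, 0]) ** 3 + np.exp(np.sin(2 * x[:, 1]) ** 2)

rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(50, 2))   # 50 training samples, as in Fig. 1
y_train = performance(X_train)

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
surrogate.fit(X_train, y_train)

X_mc = rng.uniform(-1.0, 1.0, size=(10**5, 2))   # Monte Carlo on the cheap surrogate
y_mc = surrogate.predict(X_mc)
print("mean ~", y_mc.mean(), " standard deviation ~", y_mc.std())
```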

The principal limitation of the afore-mentioned one-step sampling strategy in ML applications is the uncertainty in the computational accuracy of predictions, primarily due to the absence of prior knowledge regarding the optimum number of training samples required for constructing ML models that yield accurate predictions without escalating computational demands. In response to this challenge, the adaptive learning strategy has gained significant attention for its ability to iteratively refine ML models. This process involves the selection of new training samples in each iteration to update the ML models until a predefined termination criterion is met. The general framework of adaptive learning in ML is depicted in Fig. 2. When contrasted with the one-step sampling strategy, ML models developed through adaptive learning usually achieve superior predictive accuracy. Furthermore, the adaptive ML approaches are more efficient than the one-step sampling methods in dealing with nonlinear problems. In this vein, Liu et al. [36] devised a comprehensive ML methodology, integrating principal component analysis with the Kriging model (PCA-Kriging) to predict the time-dependent responses of dynamic systems. Herein, the k-fold cross-validation technique is recommended for selecting new training samples to bolster the ML model. Wei et al. [37] introduced an adaptive algorithm leveraging GPR, which employs a learning function derived from the posterior variance of the mean to calculate the mean structural response under stochastic inputs. This learning function proves instrumental in pinpointing the design point that, when included in the training dataset, maximally reduces the posterior variance of the mean. Chen et al. [38] unveiled an innovative adaptive framework employing the Kriging model to compute the mean and variance of structural responses using closed-form expressions, thereby circumventing additional post-processing computational expenses and errors. Zhou and Peng [39] formulated a comprehensive adaptive Bayesian quadrature method for estimating the first four moments of the equivalent extreme value of structural responses. Additionally, they derived the PDF of the equivalent extreme value from the shifted generalized lognormal distribution, informed by the statistical moment analysis. Nonetheless, the authors acknowledged that the method may be incapable of dealing with multimodal problems. It should be noted that implementing adaptive ML approaches is usually more complex than implementing one-step sampling methods, and their effectiveness largely depends on the design of the adaptive framework.

Fig. 2

The general framework of adaptive learning in ML
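
To make the adaptive loop of Fig. 2 concrete, a minimal sketch is given below; it uses the surrogate's predictive standard deviation as the learning function and a simple variance threshold as the termination criterion, which are only illustrative choices, since the reviewed methods employ a variety of learning functions and stopping rules.

```python
# Adaptive learning sketch (cf. Fig. 2): start from a small design and repeatedly add
# the candidate with the largest predictive standard deviation until the remaining
# uncertainty falls below a threshold. Model, learning function, and threshold are
# illustrative and do not reproduce any specific reviewed method.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(x):                          # placeholder for an FE model or ODE/PDE solver
    return np.sin(4 * x[:, 0]) * np.cos(3 * x[:, 1])

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(10, 2))          # initial design
y = expensive_model(X)
candidates = rng.uniform(0.0, 1.0, size=(2000, 2))

for iteration in range(50):
    gpr = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True).fit(X, y)
    _, std = gpr.predict(candidates, return_std=True)
    if std.max() < 1e-2:                         # termination criterion
        break
    x_new = candidates[np.argmax(std)].reshape(1, -1)
    X = np.vstack([X, x_new])                    # enrich the training set
    y = np.append(y, expensive_model(x_new))
print("final number of training samples:", len(X))
```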

However, the above-mentioned methods primarily utilize training samples to develop ML models, yet they overlook the structural physical information. Integrating this physical information into ML models to create physics-informed machine learning (PIML) models typically yields more precise and robust predictions. In PIML models, physical principles are often harnessed as prior knowledge to augment the efficacy of ML models. Among these, PINNs are increasingly recognized for their potential to lower computational expenses and enhance modeling adaptability. Broadly, PINN methodologies are categorized based on how physics-based prior knowledge is integrated into the models. The first category, known as the physics-informed loss function approach, incorporates physical information as constraints within the neural network model’s loss function [40]. The second category, termed physics-informed architecture, embeds physical information directly into the neural network model’s architecture, including elements such as neurons [41], hidden layers [42], and training samples [43]. Figure 3 illustrates the schematic of various PINN model types. These PINN models can integrate diverse physics-based knowledge, including PDEs, boundary conditions, failure mode physics, and statistical properties. Among these, the physics-informed loss function approach is most commonly applied in engineering practices. To address the uncertainty in physical system responses due to input randomness or observational noise, Yang and Perdikaris [44] developed a PINN framework utilizing latent variable models for probabilistic system state representations and introduced an adversarial inference method for training with data and physical laws represented by PDEs. Although the proposed adversarial inference framework demonstrates significant flexibility in probabilistic computations, its performance is contingent on meticulous tuning of the interaction between the generator and discriminator networks. Chen and Liu [45] introduced a probabilistic PINN for estimating the uncertainty of probabilistic fatigue S–N curves, employing both failure and runout data in training and incorporating various physics constraints in the loss function. Zhou et al. [46] developed a probabilistic PINN method to estimate the uncertainty of probabilistic fatigue S–N curves, leveraging the neural network’s inherent backpropagation for automatic physical knowledge derivative computation. Alberts and Bilionis [47] extended information field theory to physics-informed information field theory by incorporating functional priors encoded with information about physical laws. They developed a variant of stochastic gradient Langevin dynamics to sample from both the field posterior and the posterior distribution of model parameters. Their results demonstrated that the method remains robust even when the representation of the underlying physics is incorrect, provided sufficient data are available. However, the precise mechanisms underlying this capability remain poorly understood. Recognizing the limitations of traditional PINN methods, particularly in simulating dynamical systems with multi-scale, chaotic, or turbulent behavior, Wang et al. [48] proposed a reformulation of the PINN loss function to explicitly incorporate physical causality during model training. Their solutions achieved a remarkable 10- to 100-fold improvement in accuracy over competing methods, thereby enabling the application of PINNs to previously intractable problems. Nonetheless, the authors acknowledged that treating PINNs as forward PDE solvers remains significantly more time-consuming than traditional numerical approaches. Recently, the integration of Bayesian theory with PINNs has given rise to a promising class of UQ methods known as Bayesian PINNs. Sun and Wang [49] introduced a Bayesian PINN framework to reconstruct flow fields from sparse, noisy velocity data, incorporating equation-based constraints through the likelihood function and estimating uncertainties in the flow. Their results demonstrated the method’s ability to accurately reconstruct flow fields while effectively handling noisy measurements. Similarly, Yang et al. [50] developed a Bayesian PINN that leverages both physical laws and scattered noisy measurements to make predictions and quantify uncertainty arising from measurement noise. They emphasized that the choice of prior distributions for Bayesian PINNs has a significant impact on posterior predictions, particularly in small-data scenarios. Further advancements were made by Linka et al. [51], who proposed a Bayesian PINN framework that seamlessly integrates data and physical laws. Their approach performed well in both interpolation and extrapolation tasks, even with limited and noisy datasets. However, the method’s performance was found to be sensitive to appropriate scaling, preconditioning, and the size of the training dataset. Additionally, Stock et al. [52] applied Bayesian PINNs to robustly identify power system inertia and damping, and discussed this framework’s strengths and limitations in comparison to existing methods.

Fig. 3

Schematic diagram of different types of PINN models
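
To illustrate the physics-informed loss function approach sketched in Fig. 3, the following minimal PyTorch example trains a network to satisfy the residual of a simple ODE, u′(x) = −u(x) with u(0) = 1, which stands in for the PDEs discussed above; the architecture, collocation points, and equal weighting of the loss terms are assumptions made only for illustration.

```python
# Physics-informed loss sketch: the network is trained on collocation points to satisfy
# the residual of u'(x) = -u(x) and the initial condition u(0) = 1, without labeled
# response data. The ODE, architecture, and loss weights are illustrative assumptions.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x_col = torch.linspace(0.0, 2.0, 100).reshape(-1, 1).requires_grad_(True)  # collocation points
x0 = torch.zeros(1, 1)                                                     # initial-condition point

for epoch in range(5000):
    opt.zero_grad()
    u = net(x_col)
    du = torch.autograd.grad(u, x_col, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    loss_pde = torch.mean((du + u) ** 2)          # residual of u' + u = 0
    loss_ic = (net(x0) - 1.0).pow(2).mean()       # initial condition u(0) = 1
    loss = loss_pde + loss_ic                     # equal weighting assumed
    loss.backward()
    opt.step()
```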

The afore-mentioned ML approaches primarily focus on learning mappings between finite-dimensional Euclidean spaces for specific instances of PDEs [53]. Recently, a novel class of methods, termed neural operators, has emerged, enabling the direct learning of mappings from functional parametric dependencies to solutions. Unlike traditional ML approaches, neural operators generalize across entire families of PDEs [54, 55]. Neural operators offer several notable advantages. First, they require only a single training phase, after which obtaining a solution for a new instance of parameters involves merely a forward pass through the network. This significantly mitigates the computational overhead associated with traditional ML approaches, which typically require retraining for each new instance. Second, neural operators are purely data-driven and do not necessitate prior knowledge of the underlying PDE. Owing to these strengths, neural operators have been widely explored for both forward and inverse UQ tasks. Yang et al. [56] proposed an effective method for posterior UQ in the deep operator network (DeepONet) [54]. Their approach employed a frequentist framework based on randomized prior ensembles and introduced a vectorized implementation to enable fast parallel inference on accelerated hardware. Results demonstrated reliable uncertainty estimations for multi-scale function pairs, even with limited data. Ma et al. [57] developed a risk-controlling quantile neural operator method, providing a theoretical calibration guarantee for coverage rates on the function domain, where the true values lie within the predicted uncertainty bounds. Empirical studies showed that this approach outperformed baseline methods in terms of both calibrated coverage and the efficiency of uncertainty bands. Moya et al. [58] introduced a conformal prediction framework, named Conformalized-DeepONet, for obtaining confidence intervals with coverage guarantees in DeepONet regression. Their work established the Quantile-DeepONet, which enables a more natural integration of split conformal prediction. Guo et al. [59] presented an information bottleneck framework for UQ (IB-UQ) in both deep neural networks and DeepONets. Their method identified a confidence-aware encoder as the bottleneck and used a Gaussian decoder to predict the mean and variance of outputs. To address uncertainties caused by noisy input–output pairs in PINNs and neural operators, Zou et al. [60] proposed a Bayesian framework. Additionally, Zou et al. [61] developed an open-source Python library, NeuralUQ, specifically designed for handling UQ tasks. Despite these advances, neural operators still face several challenges. Training neural operators demands substantial computational resources due to the high-dimensional nature of function spaces and the large datasets required. Moreover, while neural operators are designed to generalize across function spaces, they often struggle to extrapolate beyond the range of their training data, limiting their robustness in unseen scenarios.
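
The branch–trunk structure underlying DeepONet-type neural operators can be sketched as follows; the layer sizes, sensor count, and activation are illustrative assumptions, and the snippet is not the reference implementation of any cited work.

```python
# Neural operator sketch: a DeepONet-style model maps a discretized input function u
# (sampled at m sensor locations) and a query coordinate y to the solution value
# G(u)(y) via the dot product of branch and trunk features. Sizes are illustrative.
import torch

class SimpleDeepONet(torch.nn.Module):
    def __init__(self, m_sensors=100, p=64):
        super().__init__()
        self.branch = torch.nn.Sequential(       # encodes the sampled input function
            torch.nn.Linear(m_sensors, 128), torch.nn.Tanh(), torch.nn.Linear(128, p))
        self.trunk = torch.nn.Sequential(        # encodes the query coordinate
            torch.nn.Linear(1, 128), torch.nn.Tanh(), torch.nn.Linear(128, p))

    def forward(self, u_samples, y_query):
        b = self.branch(u_samples)               # shape (batch, p)
        t = self.trunk(y_query)                  # shape (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) # G(u)(y), shape (batch, 1)

model = SimpleDeepONet()
u = torch.randn(8, 100)                          # 8 input functions sampled at 100 sensors
y = torch.rand(8, 1)                             # one query coordinate per function
print(model(u, y).shape)                         # torch.Size([8, 1])
```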

In summary, four categories of approaches, namely one-step sampling methods, adaptive learning methods, PINNs, and neural operators, are reviewed and discussed in this subsection for response analysis under aleatory uncertainty. It is important to note that these classifications are not rigid, as some methods may overlap across categories. For instance, certain PINNs can also be considered one-step sampling methods. The classification adopted in this article is intended to provide a clearer framework for presenting the latest advancements, particularly PINNs and neural operators. A general comparison of the reviewed methods is provided in Table 1. Given the diversity of specific implementations within each category, it is challenging to comprehensively enumerate the merits and limitations of every individual approach. Therefore, only their common features are summarized.

Table 1 The comparison of approaches for response analysis with aleatory uncertainty

2.1.2 Response analysis with epistemic uncertainty

In engineering practice, the PDFs of structural parameters are often not available due to the limited information on a structural system. Under such conditions, the uncertainty of structural parameters can be represented through epistemic uncertainty, characterized by models such as interval, convex, and fuzzy models. These models offer a substantial advantage in conducting response analysis under data scarcity. Interval models, being the most basic type of mathematical models for delineating uncertainty, define an interval scalar as a bounded set of real numbers. Convex models, on the other hand, are delineated by a convex set allowing for the consideration of dependencies among multiple uncertain variables. Generally, interval models are considered a specific instance of convex models. Figure 4 presents a straightforward comparison between interval and convex models, where \({X}_{i}^{L}(i=\mathrm{1,2})\) and \({X}_{i}^{U}(i=\mathrm{1,2})\) denote the lower and upper bounds of variable \({X}_{i}(i=\mathrm{1,2})\), respectively. Fuzzy models constitute a category of uncertainty models grounded in fuzzy set theory, wherein a fuzzy set assigns each element a membership grade ranging from zero to one.

Fig. 4

Comparison of interval models and convex models
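
A tiny numerical illustration of the two set-based descriptions compared in Fig. 4 is given below; the bounds, centre, and characteristic matrix are arbitrary values chosen only for the example.

```python
# Set membership sketch: an interval (box) model versus an ellipsoid convex model.
# Bounds, centre, and characteristic matrix are arbitrary illustrative values.
import numpy as np

lower, upper = np.array([1.0, 2.0]), np.array([3.0, 6.0])   # interval bounds X^L, X^U
centre = (lower + upper) / 2
W = np.diag(1.0 / ((upper - lower) / 2) ** 2)               # ellipsoid inscribed in the box

def in_interval(x):
    return bool(np.all(x >= lower) and np.all(x <= upper))

def in_ellipsoid(x):
    d = x - centre
    return bool(d @ W @ d <= 1.0)                            # (x - c)^T W (x - c) <= 1

x = np.array([2.8, 5.5])
print(in_interval(x), in_ellipsoid(x))   # the point lies in the box but outside the ellipsoid
```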

To explore the response characteristics of non-deterministic harmonic and transient dynamic structural systems under fuzzy or interval variables, De Munck et al. [62] developed an innovative optimization procedure that integrates the Kriging model with deterministic numerical analysis codes. This approach allows for conducting interval analysis under specific membership levels for fuzzy variables, as illustrated in Fig. 5, where \(\mu (\cdot )\) denotes the membership function. For uncertain variables characterized by ellipsoid convex models, Wang et al. [63] established a comprehensive multi-ellipsoid modeling framework using statistical data from existing samples to derive explicit ellipsoid model expressions. Concurrently, a backpropagation ANN was developed as a surrogate for the original, computationally intensive simulation model, enhancing the efficiency of uncertain response analysis. Wang and Matthies [64] introduced an efficient, sample-driven method to formulate explicit mathematical expressions for parallelepiped convex models of structural parameters, employing radial basis functions to approximate the original computational model for expedited response analysis. Notably, interval models are prevalently applied in engineering due to their simplicity, requiring only the lower and upper bounds of inputs and outputs. In this vein, Wang et al. [65, 66] proposed an ANN-based analysis to determine structural response bounds, deriving first- and second-order partial derivative formulas for the response relative to structural parameters using the backward chain rule of partial differentiation. Utilizing these partial derivatives with the parameter perturbation method enables efficient approximation of structural response extrema. Additionally, the subinterval method is introduced for more accurate and reliable results with significantly uncertain parameters. Wang et al. [67] developed an unbiased estimation method using statistical moments of experimental observations to quantify system uncertainty bounds, leveraging Legendre PCE for response approximation to boost computational efficiency. Moreover, an interval fitting degree metric was established to assess the concordance between computational and experimental response intervals. Utilizing a GPR model, Wan and Ni [68] presented a Bayesian optimization approach with an acquisition function balancing exploration and exploitation for dynamic response bound estimation in train-bridge systems. This optimization transforms a complex black-box function into a tractable maximization problem. Similarly, Dang et al. [69] developed an infill sample criterion within the Bayesian optimization framework to identify promising points for minimization or maximization based on previous observations. Cicirello and Giunta [70] evaluated various acquisition functions, including probability of improvement, expected improvement, and confidence bounds, for their accuracy and efficiency in calculating response bounds. Most recently, Shi and Beer [71] introduced a deep learning-driven method combining deep neural networks with intelligent optimization algorithms for interval analysis. An innovative extremal value-oriented weighting technique variably weights training samples within the loss function, enhancing the neural network’s accuracy in predicting structural output extremal values. An adaptive framework was also established to optimally balance global exploration and local exploitation, yielding a robust and accurate predictive model.

Fig. 5

Response analysis under fuzzy input variables
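
A minimal sketch of surrogate-assisted interval analysis is given below: the extrema of a surrogate response over the input box are located with a global optimizer, which corresponds to one membership level of the fuzzy analysis in Fig. 5; the surrogate type, optimizer, input bounds, and test function are illustrative assumptions.

```python
# Interval response analysis sketch: once a surrogate of the response is available,
# its minimum and maximum over the interval box of the inputs are found by global
# optimization. Surrogate, optimizer, bounds, and test function are illustrative.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def response(x):                                 # placeholder for the true model
    return x[:, 0] ** 2 + np.sin(3 * x[:, 1])

bounds = [(-1.0, 1.0), (0.0, 2.0)]               # interval model of the inputs
rng = np.random.default_rng(2)
X = rng.uniform([b[0] for b in bounds], [b[1] for b in bounds], size=(60, 2))
surrogate = GaussianProcessRegressor(kernel=RBF(0.5), normalize_y=True).fit(X, response(X))

f = lambda x: surrogate.predict(x.reshape(1, -1))[0]
y_low = differential_evolution(f, bounds).fun                    # lower response bound
y_up = -differential_evolution(lambda x: -f(x), bounds).fun      # upper response bound
print("approximate response interval: [", y_low, ",", y_up, "]")
```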

2.1.3 Response analysis with hybrid uncertainties

Aleatory and epistemic uncertainties can independently affect the input parameters of a structural system, necessitating response analysis methods tailored to each uncertainty type, as discussed in Sects. 2.1.1 and 2.1.2. However, these uncertainties can also coexist within the structural system's input parameters, presenting a scenario of hybrid uncertainties. Traditional response analysis for structural systems under hybrid uncertainties typically involves nested sampling, where each epistemic distribution sample in the outer loop triggers an inner loop of sampling across aleatory probability distributions, applied to the actual physical model. This approach imposes a substantial computational demand, particularly when dealing with time-intensive finite element models. Consequently, exploring response analysis methods that leverage ML models for structural systems facing hybrid uncertainties offers considerable engineering significance.

Leveraging PCE, Eldred et al. [72] developed a sophisticated response analysis methodology that merges stochastic expansions for statistical computation with interval optimization for determining bounds, interval-valued probability, and second-order probability. For challenges involving hybrid uncertainties delineated by parameterized probability-boxes, Liu et al. [73] introduced an uncertainty propagation technique to ascertain the probability bounds of system response. This method initially employs a univariate dimension reduction approach for estimating statistical moment bounds of the response, followed by adopting a family of Johnson distributions to model the response distribution function, and culminating with an optimization strategy to pinpoint the probability bounds of the response function. Addressing the analysis of periodic composite structural–acoustic problems under hybrid uncertainties, where aleatory uncertainties are represented by bounded random variables and epistemic uncertainties by interval and evidence variables, Chen et al. [74] proposed an evidence-theory-based polynomial expansion method. This method is predicated on transforming all such variables into evidence variables. Additionally, Chen et al. [75] devised a homogenization-based Gegenbauer polynomial expansion technique for calculating the mean and variance bounds of the response, building upon a homogenization-based hybrid stochastic interval perturbation method for forecasting periodic composite structural–acoustic systems under hybrid uncertainties. For structures with input hybrid uncertainties characterized by both random and interval variables, Wang and Matthies [76] introduced two distinct interval-random models alongside a universal numerical response analysis method. The first model treats uncertain parameters as either random or interval variables with deterministic distributions, whereas the second model quantifies uncertain parameters as random interval variables, with interval variable bounds represented as random variables. Moreover, a cross-validation-based adaptive radial basis function method is developed to enhance computational efficiency. Utilizing GPR, Wei et al. [77] crafted a non-intrusive imprecise probabilistic integration method for analyzing structures with hybrid uncertainties defined by distributional probability-boxes, considering numerical errors as a form of epistemic uncertainty to derive posterior variance predictions and establish an adaptive learning framework for improved computational precision. Hong et al. [78] introduced a Bayesian optimization technique to integrate three categories of uncertainty models, including probability models, imprecise probability models, and non-probabilistic models, and to bound arbitrary probabilistic response measures. This method employs Bayesian optimization in the epistemic uncertainty subspace and Bayesian cubature in the aleatory uncertainty subspace, adaptively generating training samples in the joint uncertainty space to bolster computational accuracy. For engineering structures with hybrid uncertainties encompassing random variables, interval variables, and fuzzy variables, Liu et al. [79] proposed an efficient hybrid uncertainty propagation method based on a multi-fidelity Kriging surrogate model.
This approach derives analytical solutions for response mean and variance under different epistemic uncertainty realizations based on the surrogate model, and establishes an adaptive learning framework based on the obtained analytical solutions to enhance the accuracy of hybrid uncertainty propagation estimations.
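
The nested (double-loop) propagation described at the beginning of this subsection can be sketched as follows for a parameterized probability-box-type input whose mean is only known to lie within an interval; the model, distributions, and sample sizes are illustrative, and in practice a surrogate model would replace the direct model calls.

```python
# Double-loop sketch for hybrid uncertainty: the outer loop scans the epistemic
# parameter (an interval-valued mean of a normal input), the inner loop performs MCS
# over the aleatory variables, yielding bounds on a response statistic.
# Model, distributions, and sample sizes are illustrative assumptions.
import numpy as np

def model(x1, x2):                               # placeholder performance model
    return x1 ** 2 + 2.0 * x2

rng = np.random.default_rng(3)
mu1_interval = (0.8, 1.2)                        # epistemic: mean of X1 known only in an interval
means = []
for mu1 in np.linspace(*mu1_interval, 21):       # outer loop over the epistemic parameter
    x1 = rng.normal(mu1, 0.1, size=10**4)        # inner loop: aleatory sampling
    x2 = rng.normal(0.0, 0.2, size=10**4)
    means.append(model(x1, x2).mean())
print("bounds on the response mean: [", min(means), ",", max(means), "]")
```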

2.2 Reliability analysis

Currently, reliability analysis methodologies leveraging conventional ML techniques, such as Kriging, GPR, PCE, and SVM, have been extensively explored, with numerous review articles on these topics published in Refs. [80,81,82]. Given the exceptional capability of neural networks to tackle intricate engineering challenges and considering the scarcity of review articles focusing on neural network-assisted reliability analysis approaches, this section is dedicated to reviewing and summarizing reliability analysis methodologies grounded in neural networks. These approaches are categorized based on the types of information utilized to train the neural networks for reliability analysis. They are divided into data-driven neural network approaches, which rely solely on data information for constructing the neural networks, and physics-informed neural network approaches, which incorporate physical information in the construction process. The recent progress in both categories is reviewed in the following subsections.

2.2.1 Data-driven neural networks for reliability analysis

Typically, the fundamental approach to utilizing neural networks for solving reliability analysis problems involves initially creating a surrogate model to emulate the actual performance function or finite element model. This surrogate model is then integrated with classical reliability analysis methodologies such as the first-order reliability method (FORM), second-order reliability method (SORM), or MCS, among others, for conducting reliability analysis [83]. By circumventing the time-intensive process of invoking the finite element models, neural networks significantly enhance the computational efficiency of reliability analysis. For example, Papadrakakis et al. [84] studied the integration of the ANN with MCS for analyzing the reliability of complex structural systems, specifically those prone to failure due to plastic collapse. At the turn of the century, Hurtado and Alvarez [85] further explored the potential of various neural network architectures, loss functions, sampling strategies, and application scenarios in the context of structural reliability analysis. Recently, Beheshti Nezhad et al. [86] devised a neural network-based response surface method by amalgamating it with FORM, SORM, and MCS for reliability analysis. This method employs all neurons from preceding layers to generate neurons in the subsequent layer, while simultaneously optimizing the neural network’s structure and model parameters during the training phase, thereby enhancing the precision of computational predictions. Clearly, the performance of this method is inherently influenced by the effectiveness of FORM, SORM, and MCS. Cheraghi et al. [87] integrated ANN, MCS, and the control variable technique to ascertain the reliability of various engineering systems, employing importance sampling and uniform sampling methods to create training samples. Xiang et al. [88] introduced a reliability analysis technique by merging an interpretable deep generative network with the importance sampling technique, where the network comprises a deep generative network and a monotonic network, facilitating the network’s ability to fit and sample from target distributions while remaining interpretable. The results indicate that the method may entail higher computational costs when addressing problems with small failure probabilities. Utilizing the BNN, Zhang et al. [89] determined the failure curve and its associated uncertainty in civil structures by combining the BNN with MCS, and Wang et al. [90] calculated seismic reliability by merging BNN with first passage probability theory. Li et al. [91] crafted two types of surrogate performance function models based on stochastic configuration networks, integrating these models with MCS for reliability analysis. To enhance the method’s efficacy for high-dimensional reliability analysis, an improved high-dimensional model representation is employed to decompose the multivariate function into several low-dimensional functions, with the stochastic configuration networks approximating all the low-dimensional component functions simultaneously. Beyond the afore-mentioned methods of computing reliability by merging neural networks with various classical reliability analysis techniques, another approach to determining reliability derives from the definition of reliability represented by integration. In this vein, Li et al. 
[92] developed a method using multiple correlation neural networks, where a set of ANNs is constructed to approximate the safety domain of structural reliability, which then connects with improved dual neural networks to form the multiple correlation neural networks method. Through the correlation of ANNs, the reliability solution, represented by multiple definite integrals with implicit integral domains, can be obtained.
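
A minimal sketch of the surrogate-plus-MCS workflow underlying many of the methods above is given below; an MLP regressor stands in for the ANN/BNN surrogates of the cited works, and the limit state function, input distributions, and sample sizes are illustrative assumptions.

```python
# Data-driven NN + MCS sketch: an MLP surrogate of the limit state function g(X) is
# trained once, then the failure probability P[g(X) <= 0] is estimated by Monte Carlo
# simulation on the surrogate. Limit state, inputs, and sizes are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

def g(x):                                        # illustrative limit state function
    return 5.0 - x[:, 0] ** 2 - x[:, 1]

rng = np.random.default_rng(4)
X_train = rng.normal(0.0, 1.0, size=(500, 2))
ann = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
ann.fit(X_train, g(X_train))

X_mc = rng.normal(0.0, 1.0, size=(10**5, 2))     # MCS on the cheap surrogate
pf = np.mean(ann.predict(X_mc) <= 0.0)
print("estimated failure probability ~", pf)
```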

The reliability analysis methods discussed previously adhere to a one-step sampling strategy. As outlined in Sect. 2.1.1, a significant drawback of this approach in ML applications is its inability to guarantee the computational accuracy of reliability analysis outcomes. To address this and ensure computational accuracy, the adaptive learning strategy has emerged as a prominent solution. This strategy enhances and refines ML models by iteratively incorporating newly identified samples as additional training data throughout the process of reliability analysis until a specified accuracy level is achieved. Generally, three key principles guide the identification of new training samples: (i) pinpointing points proximate to the limit state surface; (ii) selecting points characterized by significant prediction errors or variances; (iii) identifying points situated at a considerable distance from existing training samples. Employing any combination of these principles enables the formulation of adaptive learning ML models. Figure 6 presents a comparative analysis of employing these different principles for the identification of new training samples.

Fig. 6

A comparative analysis of different principles for the identification of new training samples
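
To make the three selection principles concrete, the sketch below scores a candidate pool using a U-function-style ratio covering principles (i) and (ii), assuming a surrogate that provides a predictive standard deviation, combined with a distance term for principle (iii); the specific combination and weighting are illustrative choices only.

```python
# New-training-sample selection sketch implementing the three principles:
# (i) proximity to the limit state surface, (ii) large prediction uncertainty,
# (iii) distance from existing training points. The combined score (a U-function-style
# ratio minus a weighted distance term) is one illustrative choice among many.
import numpy as np

def select_new_point(mu, sigma, candidates, X_train, w_dist=0.1):
    """mu, sigma: surrogate mean and standard deviation at the candidate points."""
    u_score = np.abs(mu) / np.maximum(sigma, 1e-12)   # small -> near limit state and uncertain
    dists = np.min(np.linalg.norm(candidates[:, None, :] - X_train[None, :, :], axis=-1), axis=1)
    score = u_score - w_dist * dists                  # reward points far from existing samples
    return candidates[np.argmin(score)]

# usage with made-up surrogate outputs
rng = np.random.default_rng(5)
candidates = rng.normal(size=(1000, 2))
mu, sigma = rng.normal(size=1000), rng.uniform(0.1, 1.0, size=1000)
X_train = rng.normal(size=(20, 2))
print(select_new_point(mu, sigma, candidates, X_train))
```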

Adhering to the principle of selecting points proximate to the limit state surface for new training samples, Shao and Murotsu [93] proposed an adaptive learning algorithm for reliability analysis. This method, which represents one of the earliest studies on adaptive learning in reliability analysis using the ANN, integrates the ANN with orthogonal factorial designs [94] to enhance computational efficiency and accuracy. Lieu et al. [95] crafted an adaptive surrogate model based on the ANN within the MCS framework for structural reliability analysis. This model unfolds across two principal stages: the global prediction phase and the local prediction phase. Initially, critical points along the boundary of the limit state surface and their immediate surroundings are pinpointed through the ANN's global predictions. Subsequently, a threshold is set to transition from a global predictive model to a localized one, aimed at refining the approximation of the limit state surface by eliminating previously considered non-critical and noisy points. The results demonstrate that the approach can significantly reduce computational time, particularly for problems involving complex and large-scale finite element models. Pursuing the strategy of selecting points distanced from current training samples, Li and Wang [96] engineered a high-dimensional data abstraction framework to navigate high-dimensional reliability analysis challenges. This framework introduces a failure-informed autoencoder network to diminish dimensionality and an ANN to bridge high-dimensional input parameters with low-dimensional latent variables. In this method, a distance-based sampling strategy is used to iteratively identify critical training samples for improving the reliability analysis accuracy. The results highlight the framework's strong performance in addressing high-dimensional reliability analysis problems. Targeting the selection of points both near the limit state surface and remote from existing training samples, Xiang et al. [97] unveiled an adaptive reliability analysis methodology leveraging the ANN and a weighted sampling technique. This method updates the ANN to pinpoint candidate experimental points near the limit state surface, utilizing a weighted sampling approach to select new training samples from these candidates. The applications demonstrate the method's strong potential for effectively addressing highly nonlinear problems. Expanding this adaptive learning paradigm, Bao et al. [98] introduced a subset searching-based neural network methodology for reliability analysis. Implementing a global-to-local strategy, this method iteratively identifies crucial subsets adjacent to the limit state surface, updating the ANN model to enhance reliability analysis. Furthermore, Liu et al. [99] developed an adaptive stochastic configuration network ensemble strategy for reliability analysis, amalgamating three stochastic configuration network base models and assigning weights for response prediction. This strategy mitigates outliers, bolstering prediction accuracy and stability. An adaptive learning function that incorporates data concerning points near the limit state surface and the distance between untrained and existing training points is utilized to discern new training samples.

It is important to understand that while the Kriging model or GPR inherently provides prediction variance for each predicted response, the traditional ANN does not offer a direct means to acquire such prediction variance. Consequently, establishing an adaptive learning framework that aims to select points with substantial prediction errors or variances as new training samples on the basis of the ANN is not straightforward. To navigate this challenge, Gomes [100, 101] suggested the development of multiple ANN models, each varying in the number of neurons within the hidden layer, to predict the response for each input sample. Thus, for each input sample, there would be multiple predicted responses, with each prediction corresponding to a specific state, i.e., safety or failure, of the structure. This structural state information can then be leveraged to determine the misclassification probability for each input sample, effectively quantifying the prediction error or variance in classification for each input. Subsequently, points exhibiting significant prediction errors or variances can be identified as new training samples to update the ensemble of ANN models. The process for this adaptive learning framework is depicted in Fig. 7. Additionally, Ren et al. [102] formulated another adaptive learning approach predicated on prediction errors or variances relative to input samples. This approach involves a surrogate ensemble method that synergizes the Kriging model with the ANN model. Through the use of cross-validation and Jackknife techniques to approximate prediction errors, points with considerable prediction inaccuracies and proximate to the limit state surface are earmarked as new training samples. It is important to note that, due to the need to construct multiple ML models simultaneously, the training cost of these approaches [100,101,102] is typically high.

Fig. 7

The adaptive framework of using multiple ANN models for reliability analysis
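
A sketch of the ensemble idea depicted in Fig. 7 is given below: several ANNs with different hidden-layer sizes predict the limit state value, and their disagreement on the safe/failure state acts as a misclassification probability for selecting the next training sample; the network sizes, data, and scoring rule are illustrative assumptions.

```python
# Ensemble-of-ANNs sketch (cf. Fig. 7): networks with different hidden-layer sizes each
# predict the limit state value; the disagreement of their safe/failure votes serves as
# a misclassification probability for choosing new training samples. All settings are
# illustrative and do not reproduce any specific cited implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

def g(x):                                        # illustrative limit state function
    return 3.0 - x[:, 0] - x[:, 1] ** 2

rng = np.random.default_rng(6)
X_train = rng.normal(size=(100, 2))
y_train = g(X_train)
ensemble = [MLPRegressor(hidden_layer_sizes=(h,), max_iter=5000, random_state=0).fit(X_train, y_train)
            for h in (8, 16, 32, 64)]

candidates = rng.normal(size=(5000, 2))
states = np.array([m.predict(candidates) <= 0.0 for m in ensemble])  # failure votes per network
p_fail_vote = states.mean(axis=0)
misclassification = np.minimum(p_fail_vote, 1.0 - p_fail_vote)       # disagreement measure
x_new = candidates[np.argmax(misclassification)]                     # most ambiguous point
print("next training sample:", x_new)
```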

2.2.2 Physics-informed neural networks for reliability analysis

In contrast to data-driven neural network methodologies, PINN techniques integrate physical information directly into the neural network's training process, often requiring minimal or no labeled data. This attribute renders PINNs exceptionally efficient for tackling reliability analysis challenges by bypassing the laborious task of acquiring extensive labeled datasets. In this domain, Meng et al. [103] synergized the PINN technique with the FORM to forge an effective solution for structural reliability analysis, eliminating the need for calculating actual structural responses. Their approach employs a loss function model that incorporates elements related to the PDE itself, the initial and boundary conditions, the limit state surface, and the most probable point, utilizing an adaptive weighting strategy to harmonize the contributions of different loss function components. The results indicate that the method's superiority becomes more evident as the complexity and nonlinearity of the PDEs increase. Bai and Song [104] introduced a first-passage reliability analysis method utilizing PINNs, where PDEs articulate the dynamic characteristics of the first-passage models and are harnessed to formulate the neural network's training loss functions. Das and Tesfamariam [105] merged the PINN technique with the probability density evolution method for resolving generalized density evolution equations and assessing the reliability of stochastic dynamic systems, thereby mitigating the need for interpolation or coordinate transformation typically encountered in numerical schemes and reducing computational demands. Zhou et al. [106] crafted a PINN-based method for system reliability analysis, enabling the embedding of the system's properties within the neural network configuration and its training process. The method provides a distinct ability to integrate measurement data into reliability models. Li et al. [107] proposed a physics-informed ensemble learning strategy for fatigue reliability analysis of aero-engine blade-disc systems. This method deconstructs the complex multi-component reliability analysis into manageable single-component analyses, determining system reliability by quantifying failure correlation using the Copula function. Shi and Beer [108] developed a physics-informed neural network classification framework by simultaneously considering the output values and states for reliability analysis. The framework offers a novel strategy for measuring prediction uncertainty and developing an adaptive learning approach for reliability analysis.

The most established and straightforward PINN-assisted reliability analysis method is the approach known as PINN for reliability analysis (PINN-RA), developed by Chakraborty [109]. In this method, not only spatial and temporal variables but also random variables that could influence the ingress process are considered as inputs to the neural network. The training framework follows the traditional PINN approach, constructing the loss function using PDE loss, initial condition loss, and boundary condition loss. Collocation points are utilized to estimate the loss for each component. Once the PINN model is properly constructed, it is combined with MCS for reliability analysis. As PINN-RA avoids direct computation of responses based on PDEs, it is often referred to as a simulation-free method and demonstrates enhanced efficiency in analysis. However, a primary drawback of PINN-RA is its reliance on traditional neural network training methods, which may result in insufficient accuracy if collocation points near the limit state surface are sparse, leading to poor performance in identifying the limit state surface accurately. To address this limitation, Zhang and Shafieezadeh [110] developed the adaptively trained PINN for reliability analysis (AT-PINN-RA) approach. AT-PINN-RA dynamically learns from the PINN model's responses to identify the limit state surface and adaptively refocuses training on regions critical for failure probability characterization. A key innovation of this method is the introduction of a small set of uniformly distributed anchor points in the design space of random variables. Collocation points are then grouped based on their proximity to these anchor points. During each training iteration, responses for these anchor points are used to estimate the probability of selecting new points in various groups. This adaptive sampling process ensures that training points closer to the limit state surface receive higher sampling priority, thereby concentrating the training on critical regions. It is important to note that in the AT-PINN-RA approach, once the PINN model is well-constructed, it is combined with MCS for reliability analysis. Consequently, this approach may lose efficiency in problems with small failure probabilities. To overcome this efficiency challenge, Song et al. [111] proposed the importance sampling-based adaptive training PINN for reliability analysis (IAT-PINN-RA) approach. The core idea of IAT-PINN-RA involves a pre-training phase to generate a proposal importance sampling distribution and corresponding samples. Initially, it generates a smaller set of collocation points in the design space to train a preliminary PINN model, capturing the general trend of the field variable of interest. Subsequently, a proposal importance sampling distribution is established to generate more samples near the domain adjacent to the limit state surface, enhancing the efficiency of failure probability calculation. However, the applicability of the method to high-dimensional problems remains to be thoroughly investigated. The comparison of PINN-RA, AT-PINN-RA, and IAT-PINN-RA is shown in Fig. 8 and Table 2.

Fig. 8

The comparison of PINN-RA, AT-PINN-RA, and IAT-PINN-RA

Table 2 The key innovations and demerits of PINN-RA, AT-PINN-RA, and IAT-PINN-RA

3 Inverse uncertainty quantification analysis

Conceptually, inverse analysis aims at identifying or inferring, from a set of (noisy or noise-free) observations, the values or other features of the causal factors that produced them. In this sense, several typical subtasks in the UQ framework, such as probabilistic model updating/calibration, global sensitivity analysis, and design optimization under uncertainty, can be regarded as inverse problems. Probabilistic model updating (PMU) is commonly formulated as inferring probability models for the parameters and/or bias of a computational model, so as to reduce the uncertainty of the model predictions. Global sensitivity analysis (GSA) methods, also called variable importance analysis (VIA) methods, are developed for studying the relative contributions of the input parameters to the uncertainty of the model outputs, so as to rank the importance of the input parameters, identify the parameters to be calibrated, and identify the input parameters that can be fixed without affecting the uncertainties of the model responses. Design optimization under uncertainty is the process of searching for the globally optimal values of the design parameters given targets and/or constraints formulated in terms of probabilities; this set of problems includes, e.g., reliability-based design optimization (RBDO), robust design optimization (RDO), expected life-cycle optimization, and optimal experiment design under uncertainty.

Many review articles have been published in the past decade focusing on one of the above three groups of inverse UQ subtasks, or even on only one branch of one subtask, since these problems arise across a broad range of disciplines and different scenarios may be of interest owing to the variety of uncertainty sources and types as well as the complexity of the problems. For example, for PMU, Bi et al. [112] presented an overview of the general Bayesian model updating (BMU) framework for stochastic model parameters by introducing the imprecise probability modelling framework; Lye et al. [113] reviewed the sampling-based methods for BMU of deterministic-but-unknown parameters; Wu et al. [114] presented an overview of the developments for probabilistic updating of both model parameters and bias, with a specific focus on applications to nuclear system thermal–hydraulics codes; and Winter et al. [115] provided an overview of broader categories of ML methods for probabilistic updating of model parameters, including normalizing flows, Bayesian coresets, distributed Bayesian inference, and variational Bayesian inference. For GSA, Wei et al. [116], Borgonovo and Plischke [117], Iooss and Lemaître [118], as well as Antoniadis et al. [119], presented overviews of general methodology frameworks and algorithms under the standard probabilistic model and/or data-driven frameworks. In terms of design optimization under uncertainty, Ling et al. [120] as well as Moustapha and Sudret [121] presented comprehensive reviews of surrogate model based methods for solving the nested probabilistic RBDO problem, and Jerez et al. [122] developed a comprehensive investigation of methods for RBDO of structures under stochastic and dynamic excitations; Ryan et al. [123] presented a survey of both methodology frameworks and algorithms for Bayesian optimal experiment design, the aim of which is to identify the most effective design of experiments or the optimal placement of sensors for reducing the uncertainties of the model parameters via BMU. Besides, Thelen et al. [124] recently presented an overview of methods for all the above-mentioned inverse UQ problems with a special focus on digital twins.

Despite the above-mentioned reviews on inverse UQ analysis, no publication has covered all of these inverse UQ subtasks. In the remainder of this section, the formulations and state-of-the-art developments for two types of inverse UQ problems, i.e., PMU and design optimization under uncertainty, under the standard probabilistic model framework are reviewed, with a special focus on ML-based developments.

3.1 Probabilistic model updating

As stated above, model updating aims at reducing the model prediction uncertainty by calibrating the model parameters and/or bias through inference from observations of the model response. Here, we consider only the Bayesian (probabilistic) model updating framework. Let \({\boldsymbol{x}}={({x}_{1},{x}_{2},\dots ,{x}_{d})}^{\rm T}\in {\mathbb{X}}\subset {\mathbb{R}}^{d}\) and \({\boldsymbol{\theta}}={({\theta }_{1},{\theta }_{2},\dots ,{\theta }_{n})}^{\rm T}\in {\mathbb{T}}\subset {\mathbb{R}}^{n}\) represent the d-dimensional input variables (e.g., time/location variables, controllable excitations, etc.) and the n-dimensional deterministic-but-unknown model parameters (e.g., material properties, boundary/initial conditions, etc.), respectively. Further, denote by \(\varepsilon ({\boldsymbol{x}})\) the experimental observation and by \(\mathcal{M}({\boldsymbol{x}},{\boldsymbol{\theta}})\) the model prediction, both at \({\boldsymbol{x}}\); the relationship between the observation \(\varepsilon ({\boldsymbol{x}})\) and the model prediction \(\mathcal{M}({\boldsymbol{x}},{\boldsymbol{\theta}})\) can then be formulated, quite generally, as:

$$\varepsilon ({\boldsymbol{x}})=\lambda ({\boldsymbol{x}})\mathcal{M}({\boldsymbol{x}},{\boldsymbol{\theta}})+\delta ({\boldsymbol{x}})+\epsilon ({\boldsymbol{x}})$$
(1)

where \(\lambda ({\boldsymbol{x}})\) and \(\delta ({\boldsymbol{x}})\) refer to the multiplicative and additive model bias, respectively, both of which represent the systematic model uncertainty caused by simplifications of the physical processes, and the stochastic process \(\epsilon ({\boldsymbol{x}})\) indicates the random measurement error. If, for simplicity, the multiplicative bias is omitted and the random measurement error is assumed to be independent and identically distributed across the support \({\mathbb{X}}\), then Eq. (1) simplifies to:

$$\varepsilon ({\boldsymbol{x}})=\mathcal{M}({\boldsymbol{x}},{\boldsymbol{\theta}})+\delta ({\boldsymbol{x}})+\epsilon$$
(2)

Given a set of observations \(\mathcal{Y}={({{\boldsymbol{y}}}^{(1)},{{\boldsymbol{y}}}^{(2)},\dots ,{{\boldsymbol{y}}}^{(N)})}^{\rm T}\) at a set of input sites \(\mathcal{X}={({{\boldsymbol{x}}}^{(1)},{{\boldsymbol{x}}}^{(2)},\dots ,{{\boldsymbol{x}}}^{(N)})}^{\rm T}\) with \({{\boldsymbol{y}}}^{(k)}=\varepsilon ({{\boldsymbol{x}}}^{(k)})\), as well as some prior assumptions on the model parameters \({\boldsymbol{\theta}}\), the model bias \(\delta ({\boldsymbol{x}})\), and the measurement noise \(\epsilon\), the process of updating the probability models of \({\boldsymbol{\theta}}\) and \(\delta ({\boldsymbol{x}})\) using a Bayesian inference scheme is known as the Kennedy and O'Hagan (KOH) framework. Methods for this problem have been reviewed by Wu et al. [114]. In this section, due to limited space, we consider only the updating of the model parameters \({\boldsymbol{\theta}}\). For this setting, the problems can still be grouped into two categories. In the first case, \({\boldsymbol{\theta}}\) is assumed to be deterministic-but-unknown, and thus only epistemic uncertainty appears and is characterized by probabilistic models. In the second case, as explained by Bi et al. [112], \({\boldsymbol{\theta}}\) is intrinsically random with an unknown probability distribution, and thus both aleatory and epistemic uncertainties exist; the aim of model updating is then to reduce the epistemic uncertainty concerning the probability distribution types and parameters. Below, only the first case is considered.

Given the observations \(\mathcal{D}=(\mathcal{X},\mathcal{Y})\) and the probability density \(f\left(\epsilon \right)\) of the measurement noise \(\epsilon\), the likelihood function can be formulated as \(p(\mathcal{D}|{\boldsymbol{\theta}})={\prod }_{k=1}^{N}f({{\boldsymbol{y}}}^{(k)}-\mathcal{M}({{\boldsymbol{x}}}^{(k)},{\boldsymbol{\theta}}))\), since the model bias \(\delta \left({{\boldsymbol{x}}}^{(k)}\right)\) at each site is omitted. Then, given the prior density of \({\boldsymbol{\theta}}\) as \(p\left({\boldsymbol{\theta}}\right)\), its posterior density follows from Bayes' formula as:

$$p({\boldsymbol{\theta}}|\mathcal{D})=\frac{p(\mathcal{D}|{\boldsymbol{\theta}})p\left({\boldsymbol{\theta}}\right)}{p\left(\mathcal{D}\right)}$$
(3)

where \(p(\mathcal{D})={\int }_{\mathbb{T}}p(\mathcal{D}|{\boldsymbol{\theta}})p\left({\boldsymbol{\theta}}\right)d{\boldsymbol{\theta}}\) is the normalizing constant, also called the marginal likelihood or evidence, which acts as a measure of how well the model fits the data \(\mathcal{D}\). The core of BMU is to estimate the posterior density \(p({\boldsymbol{\theta}}|\mathcal{D})\), which generally cannot be derived analytically unless \(p(\mathcal{D}|{\boldsymbol{\theta}})\) and \(p\left({\boldsymbol{\theta}}\right)\) are a pair of conjugate distributions. Developing efficient numerical algorithms for estimating \(p({\boldsymbol{\theta}}|\mathcal{D})\), or for generating samples that follow it, has therefore been one of the main directions of BMU, and ML has played an important role. The methods developed for this purpose can be grouped into five categories: 1) Markov chain Monte Carlo (MCMC) sampling, 2) variational Bayesian inference (VBI), 3) Bayesian updating with structural reliability analysis (BUS), 4) transport map, and 5) Bayesian filter.
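To make the conjugacy remark concrete, the following minimal sketch contrasts the rare analytical case with the general one; the linear model, Gaussian noise, and Gaussian prior are illustrative assumptions, not taken from the reviewed works.

```python
# Conjugate Gaussian example: with a linear model M(x, theta) = theta * x, Gaussian
# noise, and a Gaussian prior, the posterior of theta is available in closed form,
# which is exactly the situation where numerical BMU schemes are unnecessary.
import numpy as np

rng = np.random.default_rng(0)
x_obs = np.linspace(0.0, 1.0, 10)
sigma_eps, sigma_prior = 0.1, 1.0
y_obs = 1.5 * x_obs + rng.normal(0.0, sigma_eps, x_obs.size)   # synthetic observations

# closed-form Gaussian posterior p(theta | D) for this conjugate pair
post_var = 1.0 / (np.sum(x_obs**2) / sigma_eps**2 + 1.0 / sigma_prior**2)
post_mean = post_var * np.sum(x_obs * y_obs) / sigma_eps**2
print("posterior mean:", post_mean, "posterior std:", np.sqrt(post_var))
# For a nonlinear M or a non-conjugate prior, no such formula exists and the
# five groups of numerical methods listed above come into play.
```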

3.1.1 Markov chain Monte Carlo sampling

The MCMC sampling scheme, as its name reveals, combines Markov chains with Monte Carlo simulation, and uses a discrete-time Markov chain to sequentially produce random samples \({{\boldsymbol{\theta}}}^{\left(k\right)}\), \(k=1,2,\dots ,N\), following the target density \(p({\boldsymbol{\theta}}|\mathcal{D})\). It relies on the Markov property, meaning that the next state \({{\boldsymbol{\theta}}}^{\left(k+1\right)}\) depends only on the current state \({{\boldsymbol{\theta}}}^{\left(k\right)}\) and is independent of all states prior to \({{\boldsymbol{\theta}}}^{\left(k\right)}\). Practically, given \({{\boldsymbol{\theta}}}^{\left(k\right)}\), a candidate sample \({{\boldsymbol{\theta}}}^{cand}\) is generated from a proposal density \(q({\boldsymbol{\theta}}|{{\boldsymbol{\theta}}}^{(k)})\), which is often taken to be symmetric and can be set as, e.g., a Gaussian or uniform distribution centered at \({{\boldsymbol{\theta}}}^{\left(k\right)}\); \({{\boldsymbol{\theta}}}^{cand}\) is then accepted as the next state \({{\boldsymbol{\theta}}}^{\left(k+1\right)}\) with probability computed by:

$$\alpha =\mathrm{min}(1,\frac{p({{\boldsymbol{\theta}}}^{cand}|\mathcal{D})}{p({{\boldsymbol{\theta}}}^{(k)}|\mathcal{D})}\cdot \frac{q({{\boldsymbol{\theta}}}^{(k)}|{{\boldsymbol{\theta}}}^{cand})}{q({{\boldsymbol{\theta}}}^{cand}|{{\boldsymbol{\theta}}}^{(k)})})=\mathrm{min}(1,\frac{p({{\boldsymbol{\theta}}}^{cand}|\mathcal{D})}{p({{\boldsymbol{\theta}}}^{(k)}|\mathcal{D})})$$
(4)

where the second equality results from the symmetry of the proposal density. Intuitively, the above two steps tend to produce samples that concentrate in the high-probability-mass regions of the posterior density. Furthermore, it has been proved theoretically that, whether or not the initial state follows the target distribution, the Markov chain asymptotically converges to a stationary distribution that is exactly \(p({\boldsymbol{\theta}}|\mathcal{D})\) [125]. The phase before the chain reaches its stationary distribution is called the "burn-in period"; states in this period need to be discarded, but identifying its end point is commonly non-trivial. The above two-step sampling scheme is called the Metropolis–Hastings (MH) sampler [126]. Notably, computation of the evidence is avoided in both steps, which makes the scheme one of the most attractive methods for numerical Bayesian inverse inference tasks.
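A minimal sketch of the MH sampler of Eq. (4) is given below for a hypothetical one-dimensional model \(\mathcal{M}(x,\theta)=\theta x\) with Gaussian noise and a standard normal prior; all names and settings are illustrative assumptions rather than material from the cited works.

```python
# Metropolis-Hastings sampling of the unnormalized posterior of Eq. (3)-(4).
import numpy as np

rng = np.random.default_rng(0)

# hypothetical synthetic observations y^(k) = M(x^(k), theta_true) + noise
x_obs = np.linspace(0.0, 1.0, 10)
theta_true, sigma_eps = 1.5, 0.1
y_obs = theta_true * x_obs + rng.normal(0.0, sigma_eps, x_obs.size)

def log_likelihood(theta):
    resid = y_obs - theta * x_obs                      # y^(k) - M(x^(k), theta)
    return -0.5 * np.sum(resid**2) / sigma_eps**2

def log_prior(theta):
    return -0.5 * theta**2                             # standard normal prior (up to a constant)

def log_posterior(theta):                              # unnormalized: the evidence p(D) is not needed
    return log_likelihood(theta) + log_prior(theta)

def metropolis_hastings(n_steps=20000, step=0.2, theta0=0.0):
    chain = np.empty(n_steps)
    theta, logp = theta0, log_posterior(theta0)
    for k in range(n_steps):
        cand = theta + rng.normal(0.0, step)           # symmetric Gaussian proposal
        logp_cand = log_posterior(cand)
        if np.log(rng.uniform()) < logp_cand - logp:   # accept with the probability of Eq. (4)
            theta, logp = cand, logp_cand
        chain[k] = theta
    return chain

chain = metropolis_hastings()
burn_in = 2000                                         # discard burn-in states
print("posterior mean ~", chain[burn_in:].mean())
```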

The main issues with the MH sampler are the difficulty of determining the burn-in period and the low acceptance rate, especially when the target density departs strongly from the prior density, the dimension of \({\boldsymbol{\theta}}\) is high, and/or the target density is intrinsically multi-modal. These issues are mainly due to the correlation between states. Some improvements have been proposed to alleviate these limitations; typical methods include the transitional MCMC (TMCMC) [127, 128], which improves the acceptance rate by adaptively approaching the target density with a set of bridge densities, and the Hamiltonian Monte Carlo (HMC) [129,130,131], which uses Hamiltonian dynamics to produce candidate samples in the augmented space of position (i.e., \({\boldsymbol{\theta}}\)) and momentum, and is superior in terms of efficiency owing to its high acceptance rate and short burn-in period, even for distant target densities, but is limited by the need to compute the gradient of the likelihood function. Some other improvements include the covariance-based MCMC [132], the randomize-then-optimize (RTO) approach [133], the pseudo-marginal MCMC [134], the orthogonal parallel MCMC [135], and the multiple-try Metropolis [136], just to name a few. Despite these improvements, the high computational cost resulting from the repeated calls to the expensive-to-evaluate simulator at each state prevents the application of these MCMC algorithms to large-scale engineering problems, and many ML schemes have been combined with them to accelerate the computation. For example, Drovandi et al. [137] presented a strategy for accelerating the pseudo-marginal MCMC algorithm by approximating the log-likelihood function with a GPR model trained on a short pilot run of the Markov chain with the Metropolis algorithm. Based on the emulation of the log unnormalized posterior distribution with a GPR model, Paun and Husmeier [138] proposed a delayed acceptance learning strategy for accelerating HMC algorithms. Conversely, the BMU methods driven by these MCMC algorithms have also been extended to deterministic ML models, e.g., deep neural networks, deep operator networks, and PINNs, to enable them to quantify the prediction uncertainty; see, e.g., Refs. [139, 140] for overviews.
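In the spirit of the GPR-accelerated schemes above, the following hedged sketch emulates an expensive log-likelihood with a GPR model trained on a few pilot evaluations; the stand-in likelihood, the pilot design, and all settings are hypothetical, and no claim is made that this reproduces the exact algorithms of Refs. [137, 138].

```python
# Surrogate log-likelihood for accelerating MCMC: fit a GPR model to pilot
# evaluations of an expensive log-likelihood, then use its cheap prediction
# inside an MH-type sampler (e.g., the one sketched above).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_log_likelihood(theta):
    # placeholder for a log-likelihood that requires a costly simulator call
    return -0.5 * np.sum((theta - 1.5) ** 2) / 0.1 ** 2

# pilot design (e.g., from a short pilot MCMC run or a space-filling design)
theta_pilot = np.linspace(-1.0, 4.0, 25).reshape(-1, 1)
ll_pilot = np.array([expensive_log_likelihood(t) for t in theta_pilot])

gpr = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(theta_pilot, ll_pilot)

def surrogate_log_posterior(theta):
    # GPR mean prediction replaces the expensive likelihood; add the log-prior
    ll_hat = gpr.predict(np.atleast_2d(theta))[0]
    return ll_hat - 0.5 * float(theta) ** 2
```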

3.1.2 Variational Bayesian inference

VBI comprises a group of methods that approximate the posterior density with a parameterized density (called the variational density), whose parameters are estimated by minimizing a distance between the variational and target distributions. Denote by \({q}_{\vartheta }({\boldsymbol{\theta}})\) a variational density parameterized by \(\boldsymbol{\vartheta }\), which can be assumed to be, e.g., a Gaussian mixture model. Then the Kullback–Leibler (KL) divergence between \({q}_{\vartheta }\) and \(p\) is formulated as:

$$KL({q}_{\vartheta },p)=\int {q}_{\vartheta }({\boldsymbol{\theta}})\mathrm{ln}\frac{{q}_{\vartheta }({\boldsymbol{\theta}})}{p({\boldsymbol{\theta}}|\mathcal{D})}d{\boldsymbol{\theta}}$$
(5)

which equals zero if and only if \({q}_{\vartheta }({\boldsymbol{\theta}})\equiv p({\boldsymbol{\theta}}|\mathcal{D})\), and increases as \({q}_{\vartheta }\) and \(p\) diverge. The problem is then formulated as finding the global minimum of \(KL({q}_{\vartheta },p)\), which is equivalent to maximizing the evidence lower bound (ELBO) [141, 142]:

$$ELBO({q}_{\vartheta })=\int {q}_{\vartheta }({\boldsymbol{\theta}})\mathrm{ln}\frac{p(\mathcal{D}|{\boldsymbol{\theta}})p({\boldsymbol{\theta}})}{{q}_{\vartheta }({\boldsymbol{\theta}})}d{\boldsymbol{\theta}}$$
(6)

since \(\mathrm{log}\,p(\mathcal{D})=ELBO({q}_{\vartheta })+KL({q}_{\vartheta },p)\) is constant with respect to \(\boldsymbol{\vartheta }\). Under the above VBI framework, many gradient-based and gradient-free algorithms have been developed. For example, Acerbi [142] combined VBI with the Bayesian Monte Carlo method for active learning of the evidence \(p(\mathcal{D})\); Chen et al. [143] proposed a gradient-free VBI method, called Gaussian mixture Kalman inversion (GMKI), based on the Kalman filter and the Fisher-Rao gradient flow in probability space, for efficiently estimating multi-modal posterior densities; Ni et al. [144] proposed a stochastic gradient descent method that approximates the log-likelihood function, the ELBO, and its gradient with an adaptive GPR model for searching for the global optimum of \(\boldsymbol{\vartheta }\). Hong et al. [145] proposed a gradient-free VBI scheme that sequentially combines Bayesian optimization and Bayesian quadrature for efficiently inferring the posterior density and the model evidence with the desired accuracy. Besides the above-mentioned VBI solvers using ML methods, VBI has also been introduced for inferring the parameters of ML models, for example, the Bayesian deep operator network [146], to improve the interpretability of these models. This commonly requires estimating the posterior distribution of high-dimensional model parameters. Related developments in this direction include automatic differentiation variational inference (ADVI), a user-friendly package for automatically deriving an efficient algorithm for inferring the posterior probability models [147], and the black box variational inference scheme, which is based on a stochastic optimization algorithm with the noisy gradient derived from Monte Carlo samples of the variational distribution [148]. One can refer to Ref. [149] for a comprehensive review of related approximate inference methods. Besides, some recent developments, see, for example, Refs. [150,151,152], have also been devoted to real-time prediction of the posterior by learning inverse maps based on neural networks.
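A minimal sketch of ELBO maximization in Eq. (6) with a mean-field Gaussian variational density and the reparameterization trick is given below; the unnormalized log-posterior is a hypothetical stand-in and all settings are illustrative.

```python
# Gradient-based VBI: maximize a Monte Carlo estimate of the ELBO over the
# parameters (mean, std) of a Gaussian variational density q_vartheta.
import torch

def log_unnormalized_posterior(theta):
    # log p(D|theta) + log p(theta), up to the constant log p(D)
    return -0.5 * ((theta - 1.5) ** 2).sum(-1) / 0.1**2 - 0.5 * (theta**2).sum(-1)

d = 1                                     # dimension of theta
mu = torch.zeros(d, requires_grad=True)   # variational mean
rho = torch.zeros(d, requires_grad=True)  # unconstrained parameter for the std
opt = torch.optim.Adam([mu, rho], lr=0.05)

for it in range(2000):
    std = torch.nn.functional.softplus(rho)
    eps = torch.randn(64, d)                      # Monte Carlo base samples
    theta = mu + std * eps                        # reparameterized samples from q
    log_q = torch.distributions.Normal(mu, std).log_prob(theta).sum(-1)
    elbo = (log_unnormalized_posterior(theta) - log_q).mean()
    loss = -elbo                                  # maximizing the ELBO
    opt.zero_grad()
    loss.backward()
    opt.step()

print("variational mean:", mu.detach(),
      "variational std:", torch.nn.functional.softplus(rho).detach())
```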

3.1.3 Bayesian updating with structural reliability analysis

BUS reformulates the BMU problem as a structural reliability problem and then solves it with sampling-based or ML-based structural reliability analysis methods [153]. By introducing an auxiliary input parameter \(\vartheta\) following a uniform distribution on \(\left[0,1\right]\), the BMU problem is reformulated as estimating the failure probability, denoted as \({p}_{f}\), that the artificial limit state function (LSF) \(g({\boldsymbol{\theta}},\vartheta )=\vartheta -c\cdot p(\mathcal{D}|{\boldsymbol{\theta}})\) takes a value less than zero, given that \({\boldsymbol{\theta}}\) is distributed according to the prior distribution \(p({\boldsymbol{\theta}})\); here \(c\) is a constant satisfying \(c\cdot p(\mathcal{D}|{\boldsymbol{\theta}})\le 1\) for any \({\boldsymbol{\theta}}\in {\mathbb{T}}\), and can thus be specified as \({c}^{-1}=\underset{{\boldsymbol{\theta}}\in {\mathbb{T}}}{\mathrm{max}}\,p(\mathcal{D}|{\boldsymbol{\theta}})\). Given the above reformulation, the posterior density is formulated as:

$$p({\boldsymbol{\theta}}|\mathcal{D})=\frac{1}{c\cdot p(\mathcal{D})}{\int }_{0}^{c\cdot p(\mathcal{D}|{\boldsymbol{\theta}})}p({\boldsymbol{\theta}})d\vartheta$$
(7)

where the evidence \(p(\mathcal{D})\) can be reformulated as:

$$p(\mathcal{D})=\frac{1}{c}\underset{\mathbb{T}}{\int }{\int }_{0}^{c\cdot p(\mathcal{D}|{\boldsymbol{\theta}})}p({\boldsymbol{\theta}})d\vartheta d{\boldsymbol{\theta}}=\frac{1}{c}{p}_{f}$$
(8)

The above reformulation enables solving the BUS problem by addressing two key issues, i.e., estimation of the constant \(c\), and generation of joint samples \(({{\boldsymbol{\theta}}}^{(j)},{\vartheta }^{(j)})\) distributed according to \(p({\boldsymbol{\theta}})q(\vartheta |{\boldsymbol{\theta}})\), with \(q(\vartheta |{\boldsymbol{\theta}})\) being the uniform distribution on the support \([0,c\cdot p(\mathcal{D}|{\boldsymbol{\theta}})]\). Motivated by this rationale, some advanced MCS methods, such as subset simulation [154], have been extended to solve the BMU problem. Further, ML methods, especially those equipped with active learning and originally developed for estimating small failure probabilities, have also been devised for this purpose; typical examples include the combination of subset simulation and neural networks [155], the combination of importance sampling, the GPR model, and active learning [156], as well as a two-step method [157], which first applies Bayesian optimization to estimate \(c\), and then a parallel Bayesian active learning version of subset simulation to estimate \({p}_{f}\) and generate the joint posterior samples. It should be noted that, in case the posterior and prior densities diverge strongly, which happens when there are many observations, the resulting failure probability \({p}_{f}\) can be extremely small (e.g., \(10^{-9}\)), and the resulting LSF can behave poorly. Developing advanced ML methods that address these two challenges with better performance is still a promising direction.
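A hedged sketch of the basic BUS rejection idea is given below for a hypothetical one-dimensional likelihood with a known maximum; in practice \(c\) is unknown and \({p}_{f}\) may be tiny, which is exactly why subset simulation and active-learning GPR schemes are employed.

```python
# BUS by plain rejection: draw theta from the prior and an auxiliary uniform
# variable v, and accept whenever g(theta, v) = v - c * p(D|theta) < 0.
import numpy as np

rng = np.random.default_rng(1)

def likelihood(theta):                      # hypothetical p(D | theta)
    return np.exp(-0.5 * (theta - 1.5) ** 2 / 0.1 ** 2)

c = 1.0 / likelihood(1.5)                   # c^{-1} = max_theta p(D|theta), known here

n = 200_000
theta = rng.normal(0.0, 1.0, n)             # samples from the prior p(theta)
v = rng.uniform(0.0, 1.0, n)                # auxiliary uniform variable
accepted = theta[v - c * likelihood(theta) < 0.0]   # artificial failure event g < 0

p_f = accepted.size / n                     # failure probability; evidence p(D) = p_f / c
print("p_f =", p_f, " posterior mean ~", accepted.mean())
```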

3.1.4 Transport map

The basic idea behind the transport map is to construct a deterministic, invertible transformation, called the transport map, between a target probability distribution (for BMU, the posterior distribution) and a simple reference probability distribution (for BMU, the prior distribution), thereby enabling the generation of arbitrarily many independent samples following the target distribution by pushing reference samples through the map [158, 159]. For convenience, the map, denoted as \({\boldsymbol{T}}({\boldsymbol{\theta}})\), usually takes a lower triangular form:

$${\boldsymbol{T}}({\boldsymbol{\theta}})=\left[\begin{array}{c}{T}_{1}({\theta }_{1})\\ {T}_{2}({{\boldsymbol{\theta}}}_{1:2})\\ \vdots \\ {T}_{n}({{\boldsymbol{\theta}}}_{1:n})\end{array}\right]\forall \theta \in {\mathbb{T}}\subseteq {\mathbb{R}}^{n}$$
(9)

with \({T}_{i}\) being the \(i\)th map component, which creates the \(i\)th dimension of the target samples, and \({{\boldsymbol{\theta}}}_{1:i}={({\theta }_{1},{\theta }_{2},\cdots ,{\theta }_{i})}^{T}\). For ease of illustration, denote by \({p}_{tar}({\boldsymbol{\theta}})=p({\boldsymbol{\theta}}|\mathcal{D})\) and \({p}_{ref}({\boldsymbol{\theta}})=p({\boldsymbol{\theta}})\) the target and reference densities, respectively, in the context of BMU, and by \({P}_{tar}({\boldsymbol{\theta}})\) and \({P}_{ref}({\boldsymbol{\theta}})\) the corresponding cumulative distribution functions. By definition, \({\boldsymbol{T}}(\cdot )\) maps a sample of \({P}_{ref}\) to one of \({P}_{tar}\); it thus holds that \({P}_{tar}\left({\boldsymbol{\theta}}\right)={P}_{ref}({{\boldsymbol{T}}}^{-1}\left({\boldsymbol{\theta}}\right))\) and \({P}_{tar}\left({\boldsymbol{T}}\left({\boldsymbol{\theta}}\right)\right)={P}_{ref}({\boldsymbol{\theta}})\). Differentiating these relations yields the pushforward density [159]:

$${p}_{tar}\left({\boldsymbol{\theta}}\right)={p}_{ref}({{\boldsymbol{T}}}^{-1}\left({\boldsymbol{\theta}}\right))\left|\mathrm{det}\,\nabla {{\boldsymbol{T}}}^{-1}\left({\boldsymbol{\theta}}\right)\right|\triangleq {{\boldsymbol{T}}}_{\#}{p}_{ref}$$
(10)

and the pull-back density:

$${p}_{ref}\left({\boldsymbol{\theta}}\right)={p}_{tar}\left[{\boldsymbol{T}}\left({\boldsymbol{\theta}}\right)\right]\left|\mathrm{det}\,\nabla {\boldsymbol{T}}\left({\boldsymbol{\theta}}\right)\right|\triangleq {{\boldsymbol{T}}}^{\#}{p}_{tar}$$
(11)

where \(\nabla {{\boldsymbol{T}}}^{-1}\) is the Jacobian of the inverse of the map, which is also lower triangular. Therefore, once the map \({\boldsymbol{T}}\) is built, given a set of independent samples \({({{\boldsymbol{\theta}}}^{(k)})}_{k=1}^{M}\) following the reference distribution \({P}_{ref}\), a set of corresponding independent samples following \({P}_{tar}\) can be generated as \({({\boldsymbol{T}}({{\boldsymbol{\theta}}}^{(k)}))}_{k=1}^{M}\). The map \({\boldsymbol{T}}\) is required to be a smooth function with a smooth inverse, and to build such a map from any sufficiently smooth function \({f}_{\xi }({\boldsymbol{\theta}}):{\mathbb{R}}^{n}\to {\mathbb{R}}\) parameterized by \({\boldsymbol{\xi}}\), Baptista et al. [159] and Marzouk et al. [160] developed the so-called "rectifier operator" \({\mathcal{R}}_{i}[\cdot ]\) as:

$${\mathcal{R}}_{i}[{f}_{{\boldsymbol{\xi}}}]({{\boldsymbol{\theta}}}_{1:i})={f}_{{\boldsymbol{\xi}}}({{\boldsymbol{\theta}}}_{1:i-1},0)+{\int }_{0}^{{\theta }_{i}}s({\partial }_{i}{f}_{{\boldsymbol{\xi}}}({{\boldsymbol{\theta}}}_{1:i-1},x))dx\triangleq {T}_{i}({{\boldsymbol{\theta}}}_{1:i})$$
(12)

where \(s(\cdot ):{\mathbb{R}}\to {\mathbb{R}}^{+}\) is a rectifier, which can be, e.g., the exponential function. The resulting operator in Eq. (12) is exactly the \(i\)th component of the map \({\boldsymbol{T}}\). Hence, the problem reduces to estimating the smooth parameterized function \({f}_{{\boldsymbol{\xi}}}({\boldsymbol{\theta}})\), or, given the form of \({f}_{{\boldsymbol{\xi}}}({\boldsymbol{\theta}})\) (e.g., polynomial or radial basis functions), to estimating the parameters \({\boldsymbol{\xi}}\) under some specific principle. Practically, the problem is further transformed into estimating \({\boldsymbol{\xi}}\) by minimizing the KL divergence between \({{\boldsymbol{T}}}_{\#}{p}_{ref}\) and \({p}_{tar}\), which usually turns out to be a non-convex optimization problem. In Ref. [159], polynomial and wavelet bases have both been investigated for adaptively estimating the transport map using a greedy method, and it has been proved theoretically that the global minimizer of this approach corresponds to the well-known Knothe–Rosenblatt (KR) map between the prior and posterior densities. The theoretical approximation accuracy was investigated by Baptista et al. [161] using regularity theory and approximation theory for function approximation. Further, Maurais and Marzouk proposed a new mean-field ODE-based method for sampling from the posterior in unit time with a particular kernel Fisher-Rao gradient flow (KFRFlow) [162]. The transport map is thus realized within a reproducing kernel Hilbert space, which is usually considerably richer than traditional parametric surrogates. The performance of the transport map has been compared with the TMCMC scheme by Grashorn et al. [163], who concluded that the transport map offers a significant increase in accuracy and efficiency over TMCMC. This suggests a promising future for transport map theory. Besides, the transport map has also been extended to represent non-Gaussianity, see for example Ref. [164], which builds an invertible map between Gaussian and non-Gaussian random fields. This has provided new possibilities for modeling and sampling non-Gaussian processes. One can refer to Ref. [164] for details.
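To make the idea tangible, the sketch below uses the fact that in one dimension the KR map reduces to \(T={P}_{tar}^{-1}\circ {P}_{ref}\); the standard normal reference and Gamma target are illustrative stand-ins, and in realistic BMU problems the map has to be learned, e.g., via Eq. (12).

```python
# 1D transport map: push independent reference samples through T = P_tar^{-1} o P_ref
# to obtain independent samples of the target distribution.
import numpy as np
from scipy import stats

p_ref = stats.norm(0.0, 1.0)           # reference distribution (e.g., the prior)
p_tar = stats.gamma(a=3.0, scale=0.5)  # target distribution (stand-in for the posterior)

def transport_map(theta):
    # in 1D the KR map is T(theta) = P_tar^{-1}(P_ref(theta)): monotone and invertible
    return p_tar.ppf(p_ref.cdf(theta))

theta_ref = p_ref.rvs(size=100_000, random_state=2)
theta_tar = transport_map(theta_ref)   # independent samples following the target

print("sample mean/var:", theta_tar.mean(), theta_tar.var(),
      " analytical:", p_tar.mean(), p_tar.var())
```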

3.1.5 Bayesian filter

Bayesian filter algorithms were originally developed for online probabilistic prediction of system states based on response observations, and thus have been widely used in automatic control, object tracking, structural health monitoring, system identification, etc., but they have also been extended to BMU. Commonly used Bayesian filter algorithms include the Kalman filter (KF) and its advanced versions (e.g., the unscented KF, ensemble KF, and cubature KF), the particle filter and its many regularized improvements, the functional Bayesian filter, etc. These algorithms are usually based on a state-space representation consisting of a state transition equation and an observation equation; for BMU, these two equations are formulated as:

$$\begin{array}{l}\mathrm{State\;transition\;equation:}\;{{\boldsymbol{\theta}}}^{(k+1)}={{\boldsymbol{\theta}}}^{(k)}+{{\boldsymbol{w}}}^{(k)}\\ \mathrm{Observation\;equation:}\;{y}^{(k+1)}=\mathcal{M}({{\boldsymbol{x}}}^{(k+1)},{{\boldsymbol{\theta}}}^{(k+1)})+{{\boldsymbol{v}}}^{(k+1)}\end{array}$$
(13)

where \({{\boldsymbol{\theta}}}^{(k)}\) is the state of the model parameters after fusing the observations up to instant \(k\); \({{\boldsymbol{w}}}^{(k)}\) is the process noise, commonly assumed to be Gaussian with zero mean; and \({{\boldsymbol{v}}}^{(k+1)}\) is the observation noise, which is also assumed to be Gaussian with zero mean. A Bayesian filter algorithm usually consists of two key steps: the prediction step, which predicts a prior probability distribution for \({{\boldsymbol{\theta}}}^{(k+1)}\) by fusing the state transition equation with the posterior distribution of \({{\boldsymbol{\theta}}}^{(k)}\); and the updating step, which infers the posterior density of \({{\boldsymbol{\theta}}}^{(k+1)}\) based on the observation equation by fusing the observation \({y}^{(k+1)}\) with the prior probability distribution of \({{\boldsymbol{\theta}}}^{(k+1)}\). The above procedure can be implemented sequentially by fusing one observation at a time, or by fusing all observations at once. For example, Song et al. [165] developed two adaptive unscented KF strategies, based on a forgetting factor and a moving-window covariance-matching scheme, for sequentially calibrating the model parameters and estimating the covariance matrix of the measurement noise. Yoshida et al. [166] proposed an active learning method combining the GPR model with the particle filter algorithm, denoted as PR-GPR, for updating the model parameters using all the observations as a whole.
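A hedged sketch of a particle-filter-style sequential update of \({\boldsymbol{\theta}}\) under the state-space form of Eq. (13) is given below; the scalar model, the noise levels, and the particle counts are illustrative assumptions.

```python
# Particle filter for BMU: propagate particles through the state transition equation,
# weight them by the likelihood of each new observation, and resample.
import numpy as np

rng = np.random.default_rng(3)

def model(x, theta):
    return theta * x                      # hypothetical forward model M(x, theta)

sigma_w, sigma_v = 0.05, 0.1              # process and observation noise std
x_obs = np.linspace(0.1, 1.0, 10)
theta_true = 1.5
y_obs = model(x_obs, theta_true) + rng.normal(0.0, sigma_v, x_obs.size)

n_particles = 5000
particles = rng.normal(0.0, 1.0, n_particles)          # prior samples of theta

for x_k, y_k in zip(x_obs, y_obs):
    # prediction step: state transition with process noise
    particles = particles + rng.normal(0.0, sigma_w, n_particles)
    # updating step: weight particles by the likelihood of the new observation
    w = np.exp(-0.5 * (y_k - model(x_k, particles)) ** 2 / sigma_v ** 2)
    w /= w.sum()
    # resample to fuse the observation into the posterior particle set
    particles = rng.choice(particles, size=n_particles, p=w)

print("posterior mean ~", particles.mean(), " posterior std ~", particles.std())
```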

All the above-mentioned methods for BMU are based on numerical simulators such as finite element models. Under the framework of model verification, validation, and UQ for multi-physics simulation [167], it is implicitly assumed that the numerical simulator of interest has been verified to approximate the solutions of the underlying PDEs with sufficient precision. Recent developments in machine learning have made it possible to solve the forward and inverse UQ problems simultaneously, that is, to solve the PDEs and infer the physical parameters in one step, or even to infer the PDEs from field data. Typical developments in this direction include the PINN [21], the physics-informed Gaussian process (PIGP) [29, 168, 169], and deep operator networks [53], where the former two groups of methods solve PDEs and their inverse problems, while operator networks learn mappings between function spaces, allowing an entire family of PDEs to be learned, in contrast to PINN and PIGP, which solve only one instance of the PDE. For both forward and inverse learning, the PIGP methods are naturally capable of capturing the prediction uncertainty of both the forward and inverse solutions due to their probabilistic nature, but PINN and operator networks are not, as they are originally deterministic methods. Some efforts in this direction have been made in the past years, such as those using randomized priors [56], the information bottleneck scheme [59], conformal prediction [57, 58], velocity continuation [170], and Bayesian schemes [50,51,52, 60, 171], which are numerically realized with alternative schemes. Due to limited space, we do not go into the details of these developments; one can refer to Ref. [139] for a review of available methods and to Ref. [61] for a review and a practical Python package for implementing these schemes.

So far, we have reviewed five groups of methods for BMU. Certainly, other methods besides these, such as Bayesian cubature [172], have also been developed, and new ideas are likely on the way. Moreover, the updating of stochastic parameters and model bias is not considered in this section due to limited space. Before closing this BMU topic, we present a general comparison of the reviewed methods in Table 3. As each group contains many concrete approaches, it is hard to give exact merits and demerits for each one; only their common features are summarized.

Table 3 The comparison of methods for BMU problems

3.2 Design optimization under uncertainty

Two types of design optimization problems under uncertainty, i.e., RBDO and RDO, are treated here. Intuitively, the RBDO problem is defined as searching for the realization of the design parameters \(\mathcal{z}\in {\mathbb{Z}}\subseteq {\mathbb{R}}^{m}\) that minimizes the (expected) target cost function \(c\) (e.g., the weight of a structure), subject to the constraints that the probabilities of some unexpected failure events, indicated by \({g}_{i}\left(\mathcal{z},{\boldsymbol{\theta}}\right)<0\) with \(i=1,2,\cdots ,l\), be less than pre-specified tolerance thresholds, i.e.,

$$\begin{array}{cc}\mathrm{find}& \mathcal{z}\in {\mathbb{Z}}\subseteq {\mathbb{R}}^{m}\\ \mathrm{min}& c\left(\mathcal{z}\right)\ \mathrm{or}\ {\mathbb{E}}_{{\boldsymbol{\theta}}}[c(\mathcal{z},{\boldsymbol{\theta}})]\\ s.t.& {p}_{f,i}\left(\mathcal{z}\right)={\mathbb{P}}_{{\boldsymbol{\theta}}}\left[{g}_{i}\left(\mathcal{z},{\boldsymbol{\theta}}\right)<0\right]<{p}_{f,i}^{*}\ \mathrm{or}\ {p}_{f}(\mathcal{z})={\mathbb{P}}_{{\boldsymbol{\theta}}}[{\cap }_{i=1}^{l}\{{g}_{i}(\mathcal{z},{\boldsymbol{\theta}})<0\}]<{p}_{f}^{*}\end{array}$$
(14)

where \({\boldsymbol{\theta}}\in {\mathbb{T}}\subseteq {\mathbb{R}}^{n}\) refers to the \(n\)-dimensional random input parameters, and \({\mathbb{E}}_{{\boldsymbol{\theta}}}[\cdot ]\) and \({\mathbb{P}}_{{\boldsymbol{\theta}}}[\cdot ]\) are the expectation operator and the probability measure taken w.r.t. \({\boldsymbol{\theta}}\), respectively.

The RDO problem, on the other hand, seeks the realization of the design parameters \(\mathcal{z}\) that best balances the expectation \({\mu }_{f}\left(\mathcal{z}\right)\) and the variance \({\sigma }_{f}^{2}\left(\mathcal{z}\right)\) of a performance function \(f(\mathcal{z},{\boldsymbol{\theta}})\), both taken w.r.t. \({\boldsymbol{\theta}}\), given deterministic or probabilistic constraints. Mathematically, it is formulated as:

$$\begin{array}{cc}\mathrm{find}& \mathcal{z}\in {\mathbb{Z}}\subset {\mathbb{R}}^{m}\\ \mathrm{min}& H\left({\mu }_{f}\left(\mathcal{z}\right),{\sigma }_{f}\left(\mathcal{z}\right)\right)\\ s.t.& {g}_{i}\left(\mathcal{z}\right)<0\ \mathrm{or}\ {\mathcal{P}}_{{\boldsymbol{\theta}}}[{g}_{i}(\mathcal{z},{\boldsymbol{\theta}})<0]<{p}_{f}^{*}\end{array}$$
(15)

where \({\mathcal{P}}_{{\boldsymbol{\theta}}}[\cdot ]\) can be any probabilistic operator taken w.r.t. \({\boldsymbol{\theta}}\); if the constraints are set as those of the RBDO problem in Eq. (14), the problem is called reliability-based robust design optimization (RRDO). \(H(\cdot ,\cdot )\) is a composite function of \({\mu }_{f}\) and \({\sigma }_{f}\), and in the simplest case it can be taken as a linear combination, i.e., \(\alpha {\mu }_{f}\left(\mathcal{z}\right)+\left(1-\alpha \right){\sigma }_{f}\left(\mathcal{z}\right)\), where \(\alpha \in [0,1]\) is a weighting factor. The performance function \(f(\mathcal{z},{\boldsymbol{\theta}})\) can be, e.g., the maximum stress/displacement response. Thus, instead of minimizing the stress/displacement response as commonly formulated in deterministic design optimization, RDO seeks a realization of \(\mathcal{z}\) that balances the performance and its variation, such that the performance does not change sharply under fluctuations of \({\boldsymbol{\theta}}\) caused by, e.g., the randomness of manufacturing errors and service environments. It should be noted that the RDO problem can also be formulated in a multi-objective setting. One can refer to Ref. [173] for the definitions and evolution of the above typical design optimization problems.
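A minimal sketch of the weighted-sum RDO objective \(H=\alpha {\mu }_{f}+(1-\alpha ){\sigma }_{f}\) with the moments estimated by plain MCS is given below; the performance function and all settings are hypothetical.

```python
# Weighted-sum RDO of Eq. (15): estimate mean and std of f(z, theta) by MCS
# and minimize their linear combination over the design variable z.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
theta_samples = rng.normal(0.0, 0.2, 10_000)     # fixed MCS sample of theta

def f(z, theta):
    # hypothetical performance function depending on design z and random theta
    return (z - 2.0) ** 2 + z * theta

def rdo_objective(z, alpha=0.5):
    vals = f(z, theta_samples)
    return alpha * vals.mean() + (1.0 - alpha) * vals.std()

res = minimize_scalar(rdo_objective, bounds=(0.0, 4.0), method="bounded")
print("robust optimum z* ~", res.x)
```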

A common feature of the RBDO problem in Eq. (14) and the RDO problem in Eq. (15) is that they are both formulated with a double-loop nested structure: to solve the problem, one must iteratively search for \(\mathcal{z}\) in the design parameter space in the outer loop, and then, for each realization of \(\mathcal{z}\), compute the probabilistic features, e.g., the expectation and variance of the target/constraint functions or the probabilities of events, each of which can be computationally challenging as they are commonly governed by the expensive-to-evaluate functions \(f\) or \({g}_{i}\). This becomes even tougher for rare failure events, e.g., when the probability tolerances \({p}_{f,i}^{*}\) and \({p}_{f}^{*}\) take extremely small values (typically less than \(10^{-4}\)). Suppose that the outer-loop optimization requires \(N\) iterations to reach the global optimum, and that for each iteration the probabilistic analysis in the inner loop consumes \(M\) function calls; the total number of required expensive-to-evaluate function calls is then at least \(N\times M\). This is unquestionably unacceptable for real-world engineering structures, as each function call requires a time-consuming finite element analysis. Even if the cost were affordable, achieving global convergence for both the outer-loop optimization and the inner-loop probabilistic analysis can be an issue. Devising efficient and robust numerical algorithms with as few function calls as possible, together with theoretical or empirical guarantees of global convergence for both loops, has thus been an enduring topic in this direction. Other related research topics include developing algorithms for time-variant cases, situations with mixed uncertainties, cases with high-dimensional design and/or random parameters, parallelization of algorithms, etc.
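To illustrate the nested structure, the following hedged sketch solves a toy RBDO instance with a penalty formulation, an outer one-dimensional search over \(\mathcal{z}\), and an inner plain-MCS estimate of the failure probability; the cost, limit state function, and thresholds are hypothetical, and the overall cost is roughly the product of the outer iterations and the inner sample size.

```python
# Double-loop RBDO: outer optimization over z, inner MCS estimate of p_f(z).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
theta_samples = rng.normal(0.0, 1.0, 100_000)    # MCS samples of the random input theta
p_f_star = 1e-3                                  # failure probability tolerance

def cost(z):
    return z                                     # hypothetical cost, e.g., structural weight

def g(z, theta):
    # hypothetical limit state function: failure when g < 0
    return z - 3.0 + theta

def failure_probability(z):
    return np.mean(g(z, theta_samples) < 0.0)    # inner-loop MCS estimate of p_f(z)

def penalized_objective(z, rho=1e3):
    # penalty formulation of the probabilistic constraint p_f(z) < p_f_star
    return cost(z) + rho * max(0.0, failure_probability(z) - p_f_star)

res = minimize_scalar(penalized_objective, bounds=(0.0, 10.0), method="bounded")
print("design z* ~", res.x, " p_f(z*) ~", failure_probability(res.x))
```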

Over the past half-century, there have been continuing efforts towards solving the above problems. For RBDO, typical non-ML developments include: the sequential optimization and reliability assessment (SORA) method [174], which is intrinsically a single-loop scheme that sequentially and alternately performs deterministic optimization and reliability assessment, and has repeatedly proved effective for convex problems with low to moderate nonlinearity; the augmented reliability analysis strategies [175, 176], which are based on estimating the failure probability function (with respect to the design variables) and its sensitivities by reusing the samples generated by MCS or advanced MCS methods such as subset simulation and line sampling; the operator norm theory [177, 178], which is also a decoupling strategy, based on a monotonic map between the operator norm and the reliability index, and is found to be effective for cases where randomness is present only in the excitations; and the non-intrusive imprecise stochastic simulation (NISS) [179, 180], which is also a decoupling method, based on estimating the failure probability functions and their sensitivities by combining advanced MCS with high-dimensional model representation and global sensitivity analysis. It can be seen from these historical non-ML developments that devising strategies to break the double loop has been a common trend in addressing the RBDO problem.

The above-mentioned RBDO algorithms are based either on approximation methods (such as FORM) or on advanced MCS methods, which are extensions of the classical methods originally developed for reliability analysis. The past decade has also witnessed a rapid development of ML methods for reliability analysis, with the pioneering work of Echard et al. [181], who combined the active learning Kriging model (i.e., the GPR model) with MCS and presented the so-called "AK-MCS" method. Since then, active learning methods have developed rapidly in the area of reliability engineering and UQ. More informative acquisition functions [182, 183] and improvements of AK-MCS through combination with advanced MCS schemes [184, 185] are both typical developments in this direction. As expected, the active learning scheme has also been introduced for RBDO, and a literature survey of these developments is provided in Ref. [121]. Some of these developments are combined with classical RBDO algorithms to substantially reduce the required number of function calls, such as the combination with SORA in Ref. [186] and with NISS in Ref. [187]. Others present novel methodology frameworks that aim to leverage the active learning procedure. One typical example is the collaborative and adaptive Bayesian optimization (CABO), which is a decoupling scheme that sequentially trains a GPR model in the augmented space \({\mathbb{Z}}\times {\mathbb{T}}\) and then alternately implements Bayesian optimization in \({\mathbb{Z}}\) and Bayesian cubature in \({\mathbb{T}}\), both equipped with active learning. This scheme was originally developed for propagating mixed uncertainties [78, 188], and was then extended to RBDO [189].

For simplicity of illustration of the CABO scheme for RBDO, it is assumed that the target function is a cheap-to-evaluate deterministic function \(c\left(\mathcal{z}\right)\), and that only one probabilistic constraint \({p}_{f}\left(\mathcal{z}\right)={\mathbb{P}}_{{\boldsymbol{\theta}}}[g(\mathcal{z},{\boldsymbol{\theta}})<0]<{p}_{f}^{*}\) is considered. The CABO scheme first transforms the constrained RBDO problem of Eq. (14) into an unconstrained optimization problem with an objective function formulated as \(F(c(\mathcal{z}),{p}_{f}(\mathcal{z}),\lambda ,\rho )\) using, e.g., an augmented Lagrangian, where \(\lambda \in {\mathbb{R}}^{+}\) and \(\rho \in {\mathbb{R}}^{+}\) are the Lagrange multiplier and the penalty parameter, respectively. In Ref. [189], the penalty method is used, and thus only a penalty parameter is introduced. Then, given an initial training data set \(\mathcal{D}={\{({\mathcal{z}}^{(k)},{{\boldsymbol{\theta}}}^{(k)},{y}^{(k)})\}}_{k=1}^{N}\), with \(\left({\mathcal{z}}^{(k)},{{\boldsymbol{\theta}}}^{(k)}\right)\) designed by, e.g., Latin hypercube sampling in the augmented space \({\mathbb{Z}}\times {\mathbb{T}}\) and \({y}^{(k)}=g\left({\mathcal{z}}^{(k)},{{\boldsymbol{\theta}}}^{(k)}\right)\) computed by calling the expensive-to-evaluate g-function, the CABO scheme consists of the following five sequential steps.

a) First, train a GPR model \({\widehat{g}}_{N}(\mathcal{z},{\boldsymbol{\theta}})\sim \mathcal{G}\mathcal{P}({\mu }_{g,N}(\mathcal{z},{\boldsymbol{\theta}}),{\sigma }_{g,N}^{2}(\mathcal{z},{\boldsymbol{\theta}}))\) using \(\mathcal{D}\), with \({\mu }_{g,N}\) and \({\sigma }_{g,N}^{2}\) being the posterior mean and variance of the GPR prediction;

b) Second, generate a set of random sample paths \({\{{g}_{N}^{(j)}(\mathcal{z},{\boldsymbol{\theta}})\}}_{j=1}^{M}\) of \({\widehat{g}}_{N}\) using the GPR conditional sampling scheme (see Ref. [189] for details);

c) Third, compute the induced sample paths of the failure probability function \({p}_{f,N}^{(j)}(\mathcal{z})={\mathbb{P}}_{{\boldsymbol{\theta}}}[{g}_{N}^{(j)}(\mathcal{z},{\boldsymbol{\theta}})<0]\) for each \(j=1,2,\cdots ,M\), using, e.g., MCS or advanced MCS performed in the space \({\mathbb{T}}\), and with these sample paths derive an estimator for one of the commonly used acquisition functions of Bayesian optimization, e.g., the expected improvement (EI) function or the lower confidence bound (LCB). By optimizing the acquisition function (e.g., by maximizing EI or minimizing LCB), a design point of \(\mathcal{z}\), denoted as \({\mathcal{z}}^{+}\), is obtained in the space \({\mathbb{Z}}\) (a simplified numerical sketch of this step is given below). Judge whether the stopping criterion for the optimization, which can be set as, e.g., the normalized EI being less than a small value (error tolerance), is satisfied;

d) Fourth, substitute \(\mathcal{z}={\mathcal{z}}^{+}\) into the posterior mean and variance of the GPR model to produce the marginal GPR model \({\widehat{g}}_{N}({\mathcal{z}}^{+},{\boldsymbol{\theta}})\sim \mathcal{G}\mathcal{P}({\mu }_{g,N}({\mathcal{z}}^{+},{\boldsymbol{\theta}}),{\sigma }_{g,N}^{2}({\mathcal{z}}^{+},{\boldsymbol{\theta}}))\). With this GPR model, one of the acquisition functions developed for Bayesian reliability analysis, e.g., the U-function, the expected feasibility function [190], or the expected integrated error reduction, can be used to generate a design point \({{\boldsymbol{\theta}}}^{+}\) of \({\boldsymbol{\theta}}\) from the sample pool generated by the MCS or advanced MCS in the last step. Judge whether the stopping criterion for the Bayesian reliability analysis, e.g., the minimal probability of misclassifying the points in the sample pool being less than a small value (error tolerance), is reached;

e) Finally, if both stopping criteria are met, the algorithm terminates and is considered converged; otherwise, set \(N=N+1\), add \(\left({\mathcal{z}}^{+},{{\boldsymbol{\theta}}}^{+},{y}^{+}\right)\) to \(\mathcal{D}\), and return to the first step.

Benefiting from the global convergence property of the EI function [191] and the high accuracy of Bayesian reliability analysis, CABO has been empirically shown to strike a reasonable balance between exploration and exploitation and to suit a wide range of problems. However, when very small failure probability constraints (e.g., \({p}_{f}^{*}\) less than \(10^{-4}\)) or high-dimensional \({\mathbb{Z}}\) or \({\mathbb{T}}\) spaces are involved, CABO needs further extension.
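As referenced in step c), the following much-simplified sketch illustrates steps b) and c): sample paths are drawn from a GPR model trained in the augmented space, each path induces a sample path of \({p}_{f}(\mathcal{z})\) by MCS over \({\boldsymbol{\theta}}\), and an LCB-type acquisition is evaluated on a penalized objective. The g-function, cost, penalty, and tolerance are hypothetical stand-ins (the tolerance is kept large so that plain MCS with few samples can resolve it), and no claim is made that this reproduces the exact acquisition functions of Ref. [189].

```python
# Simplified CABO steps b)-c): GPR sample paths in the augmented (z, theta) space,
# induced sample paths of p_f(z), and an LCB acquisition over the design space.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(6)
p_f_star = 0.05                               # tolerance kept large for this toy example

def g(z, theta):                              # hypothetical expensive limit state function
    return z - 3.0 + theta

# initial training data in the augmented space Z x T
z_tr = rng.uniform(0.0, 8.0, 30)
th_tr = rng.normal(0.0, 1.0, 30)
gpr = GaussianProcessRegressor(RBF(length_scale=[2.0, 1.0]), normalize_y=True)
gpr.fit(np.column_stack([z_tr, th_tr]), g(z_tr, th_tr))

z_grid = np.linspace(0.0, 8.0, 25)            # candidate designs in Z
theta_mcs = rng.normal(0.0, 1.0, 120)         # MCS samples in T
ZZ, TT = np.meshgrid(z_grid, theta_mcs, indexing="ij")
X = np.column_stack([ZZ.ravel(), TT.ravel()])

# step b): conditional sample paths of the GPR model over the augmented grid
paths = gpr.sample_y(X, n_samples=50, random_state=0)        # shape (n_points, 50)
paths = paths.reshape(z_grid.size, theta_mcs.size, 50)

# step c): induced sample paths of p_f(z), a penalized objective (cost c(z) = z
# assumed), and an LCB-type acquisition minimized over the design space
p_f_paths = (paths < 0.0).mean(axis=1)                        # shape (n_z, 50)
penalized = z_grid[:, None] + 100.0 * np.maximum(0.0, p_f_paths - p_f_star)
lcb = penalized.mean(axis=1) - 2.0 * penalized.std(axis=1)
z_plus = z_grid[np.argmin(lcb)]
print("next design point z+ ~", z_plus)
```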

ML models have also been widely utilized for devising RDO algorithms, and a critical review of these models, as well as a strategy for extending them to RDO, has been presented in Ref. [123]. However, this general scheme is not equipped with active learning. Following the state-of-the-art developments for RBDO, a good balance between exploration and exploitation, and hence a further improvement in numerical efficiency, can be expected if active learning schemes are properly combined with these ML models. Following a scheme similar to CABO, Shi et al. [192] presented a decoupled algorithm that trains a GPR model in the augmented space, with the design point \(\left({\mathcal{z}}^{+},{{\boldsymbol{\theta}}}^{+}\right)\) sequentially generated in the augmented space, rather than separately in the marginal spaces, using the posterior covariance contribution (PVC) acquisition function developed by Wei et al. [37], as it admits a closed-form expression. Besides, by formulating RDO in a multi-objective setting, Wauters [193] developed a Bayesian optimization algorithm for solving the RDO problem and, to obtain closed-form expressions of the EI acquisition function, introduced a second-order second-moment scheme for approximating the predictive variance of the objectives. The CABO scheme can also be directly applied to the RDO problem owing to its generality. Unfortunately, to date there has been neither a theoretical nor a numerical comparison of the above state-of-the-art RDO developments, and, as with RBDO, specific challenges such as the curse of dimensionality still require further treatment.

From the above overview, it can be seen that recent years have witnessed rapid development in applying ML to the three types of inverse UQ problems, i.e., PMU, RBDO, and RDO. The introduction of ML methods such as regression models, active learning schemes, and Bayesian numerics has substantially reshaped the research landscape and brought benefits in terms of numerical efficiency, convergence guarantees, applicability, etc., which is of vital importance for promoting inverse UQ techniques in real-world engineering applications. For other inverse UQ subtasks, such as global sensitivity analysis and optimal experimental design, ML has also received wide attention, but we do not provide an overview of these specific topics due to space limitations.

4 Fault diagnosis

Intelligent diagnostic methods based on transfer learning relax the requirement in traditional ML and deep learning that test data and training data be independent and identically distributed, thereby addressing problems present in deep learning-based intelligent fault diagnosis methods. Transfer learning effectively overcomes issues such as insufficient training data and mismatched feature distributions between training and test data in new tasks by learning knowledge from previous tasks and applying it to new ones. In 2010, Pan et al. [194] defined transfer learning in a review paper: transfer learning is a method that transfers knowledge learned from one domain to a similar domain based on the similarity of data, tasks, and models. Traditional deep learning methods train models to solve tasks from scratch, with the model's performance heavily relying on the quality and quantity of labeled data. Transfer learning, on the other hand, identifies the commonalities of tasks from an accumulated-knowledge perspective, allowing knowledge learned from one task to be transferred to other similar tasks and enabling models to generalize better [195,196,197]. In recent years, cross-domain recognition techniques based on transfer learning, particularly deep domain adaptation techniques, have been increasingly studied in various fields [198,199,200,201].

There are two basic concepts in domain adaptation theory: the domain and the task. A domain consists of a data set and the probability distribution of these data, typically denoted by \(\mathcal{D}\), with P representing the probability distribution. The task is the learning objective, including the label space \(\mathcal{Y}\) and the decision function \(f\left(\cdot \right)\). The basic setting of domain adaptation is as follows: given a source domain \({\mathcal{D}}^{s}={\{({x}_{i}^{s},{y}_{i}^{s})\}}_{i=1}^{{n}_{s}}\) with labeled data and a target domain \({\mathcal{D}}^{t}={\{{x}_{j}^{t}\}}_{j=1}^{{n}_{t}}\) with unlabeled data, where the data distributions of the two domains differ \(({P}^{s}\ne {P}^{t})\), domain adaptation uses the labeled source domain \({\mathcal{D}}^{s}\) to achieve recognition in the unlabeled target domain \({\mathcal{D}}^{t}\), thereby realizing cross-domain knowledge transfer and recognition.

A schematic diagram of fault diagnosis based on domain adaptation is shown in Fig. 9. Due to the difference in feature distributions between source domain data and target domain data, directly applying the source domain classifier to the target domain classification will result in many misclassified samples and poor recognition performance. Through domain adaptation learning between the source domain and the target domain, the difference in feature distributions between the source domain and target data can be effectively reduced, allowing the target domain samples to be accurately identified by the source domain classifier, thus realizing unsupervised fault diagnosis in the target domain. Based on the different approaches to aligning source domain and target domain feature distributions in the existing literature, the commonly used cross-domain fault diagnosis methods based on deep domain adaptation can be categorized into three types: cross-domain intelligent diagnosis methods based on statistical moment matching, cross-domain intelligent diagnosis methods based on adversarial training, and cross-domain intelligent diagnosis methods based on joint moment matching and adversarial training.

Fig. 9

The schematic diagram of fault diagnosis based on domain adaptation

4.1 Cross-domain fault diagnosis methods based on statistical moment matching

Cross-domain fault diagnosis methods based on statistical moment matching utilize statistical moments to measure the feature differences between source domain data and target domain data in a high-dimensional space. By updating the internal parameters of the model, these methods continuously reduce the statistical moment differences between the source and target domains, aiming to map the features of both domains as closely as possible into a shared high-dimensional feature space. Commonly used statistical discrepancy measures in transfer diagnosis include the maximum mean discrepancy (MMD) [202], KL divergence [203], and correlation alignment (CORAL) [204].
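As a concrete illustration, the hedged sketch below computes two of the discrepancy losses named above between source and target feature batches, a biased RBF-kernel MMD estimate and the CORAL loss; in a transfer diagnosis network such a loss would typically be added to the source-domain classification loss and minimized. The tiny feature batches are illustrative only.

```python
# MMD and CORAL losses between source and target feature batches.
import torch

def rbf_mmd(xs, xt, bandwidth=1.0):
    # biased MMD^2 estimate with a Gaussian kernel k(a, b) = exp(-||a-b||^2 / (2*bw^2))
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))
    return kernel(xs, xs).mean() + kernel(xt, xt).mean() - 2.0 * kernel(xs, xt).mean()

def coral(xs, xt):
    # CORAL: match second-order statistics (covariances) of source and target features
    def cov(x):
        xc = x - x.mean(dim=0, keepdim=True)
        return xc.t() @ xc / (x.shape[0] - 1)
    d = xs.shape[1]
    return ((cov(xs) - cov(xt)) ** 2).sum() / (4.0 * d * d)

# illustrative feature batches, e.g., outputs of a shared feature extractor
xs = torch.randn(64, 128)            # source-domain features
xt = torch.randn(64, 128) + 0.5      # shifted target-domain features
print("MMD^2 ~", rbf_mmd(xs, xt).item(), " CORAL ~", coral(xs, xt).item())
```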

Lu et al. [205] used deep neural networks as feature extractors and minimized the MMD loss between source domain features and target domain features to achieve cross-domain fault diagnosis of bearings. This was one of the earlier methods to apply domain adaptation techniques based on deep learning models to intelligent fault diagnosis. Qian et al. [206] proposed a cross-domain fault diagnosis method for rolling bearings based on higher-order KL divergence. This method establishes first-order and higher-order statistical moments constrained by self-balanced higher-order KL divergence and uses higher-order divergence to reduce distribution differences in the shared feature layer between source domain data and target domain data. Wen et al. [207] combined sparse autoencoders with MMD to achieve domain-invariant features, enabling bearing fault diagnosis under different working conditions. Li et al. [208] and Zhu et al. [209] introduced multi-kernel MMD into the multi-layer structure of convolutional neural networks to reduce differences between source domain and target domain across different feature dimensions, thereby better achieving domain-invariant features for accurate cross-domain fault diagnosis of bearings under different working conditions. Wang et al. [210] developed a layered deep domain adaptation network based on stacked autoencoders and correlation alignment, achieving alignment of marginal and conditional distributions between source domain and target domain under varying conditions for fault diagnosis of thermal systems. Ma et al. [211] used state MMD to reduce marginal and conditional distribution differences between source domain data and target domain data, improving cross-domain fault diagnosis performance for rolling bearings under different operating conditions. An et al. [212] employed the CORAL method to minimize distribution gaps between source and target domains by adjusting second-order statistical moments, enabling cross-domain intelligent diagnosis. Han et al. [213] proposed a deep joint framework that considers both marginal distribution adaptation and conditional distribution adaptation, constructing a joint distribution adaptation network for cross-domain diagnosis. Cao et al. [214] introduced a multi-task learning model based on deep domain adaptation, first constructing a soft-connected MMD to connect two channels, and then implementing cross-domain intelligent diagnosis of planetary gearboxes under different conditions through the Y-Net network. Yang et al. [215] incorporated polynomial kernel functions into MMD, utilizing the improved MMD to effectively estimate higher-order moment distances between source domain data and target domain data, significantly enhancing computational efficiency. Feng et al. [216] proposed a global–local multi-source domain adaptation fault diagnosis method based on class inconsistency, achieving state and condition alignment between multiple source domains and the target domain by optimizing the classifier with local Wasserstein distance and globally accumulating higher-order multi-source moments. Pan et al. [217] proposed a joint distribution domain adaptation strategy based on MMD to accommodate feature distribution differences between multiple sensor data, introducing a new standard to evaluate and measure the feature distribution differences between training data and test data. Jia et al. [218] introduced a weighted subdomain adaptation network for partial cross-domain fault diagnosis, aligning global distributions while simultaneously focusing on local data distributions, further improving model transfer performance by capturing fine-grained transferable information. Wang et al. [219] proposed a subdomain adaptation transfer learning method that combines subdomain adaptation with global domain adaptation to reduce marginal and conditional distribution differences between source and target domains, enhancing the generalization performance of transfer learning models. Li et al. [220] integrated parameter transfer learning, domain adaptation, and ensemble learning, proposing a bearing intelligent diagnosis method based on an optimal integrated deep transfer network. This method enhances the diversity of different kernel MMDs through multiple transfer networks, improving the performance of ensemble learning. Su et al. [221] developed a hierarchical fault diagnosis method for rolling bearings based on multi-layer MMD, simultaneously achieving recognition of fault state, location, and severity. Addressing the issue that existing MMD methods pay little attention to the structural information of target domain samples, Zhang et al. [222] proposed a transferable SAE network based on local MMD. The network first aligns the source and target domains in a shared feature space using local MMD, then captures the feature information of both domains using the k-means method to obtain the final recognition results.

Zhang et al. [223] proposed a novel multi-modal data cross-domain fusion network based on MMD that integrates vibration signals and thermal images to comprehensively assess gearbox health conditions. Li et al. [224] proposed a deep transfer network model utilizing multi-domain information fusion and multi-kernel maximum mean discrepancy (MK-MMD) for cross-device fault diagnosis of rolling bearings. Huang et al. [225] proposed a cross-attentional subdomain adaptation method incorporating a correlated local maximum mean discrepancy loss to enable fine-grained and fault-related subdomain adaptation for motor fault diagnosis under variable working conditions. Chang et al. [226] proposed a multi-order moment matching-based domain adaptation method to enhance cross-working condition fault diagnosis of rolling bearings by combining hybrid domain alignment and discriminative clustering. Zhang et al. [227] proposed the DPC-MGCN method, combining dual-path CNNs, multi-parallel GCNs, and the MK-MMD domain adaptation to enhance bearing fault diagnosis under variable working conditions. Shao et al. [228] proposed the adaptive multi-scale attention convolution neural network, which utilizes CORAL to minimize domain shift and enhance cross-domain fault detection with limited labeled data. Wang et al. [229] proposed a cross-domain diagnostic algorithm based on joint conditional Wasserstein distance matching, enhancing distribution alignment and fault diagnosis robustness across different domains. Zhang et al. [230] proposed a supervised contrastive learning-based domain adaptation network that combines MMD loss and supervised contrastive learning to enhance cross-domain fault diagnosis of rolling bearings. Qin et al. [231] proposed a discriminative manifold domain adaptation framework to align manifold structures and domain distributions.

In summary, statistical moment matching methods are effective for cross-domain fault diagnosis because they align the statistical properties of the source and target domains using measures such as MMD and CORAL. These methods are computationally efficient and do not require large amounts of labeled data from the target domain, making them highly practical in many applications. However, their effectiveness is limited when domain shifts involve higher-order moments or nonlinear relationships between features. While innovations like conditional moment matching have improved their robustness, these methods still face challenges when higher-order moments exhibit significant differences. Additionally, they may require regularization to avoid overfitting when large domain shifts occur.

4.2 Cross-domain fault diagnosis methods based on adversarial training

These methods achieve domain-invariant learning of source and target features by introducing adversarial training with a domain discriminator, enabling the model to accurately classify samples from the target domain. The main components of such models are the feature extractor, the domain discriminator, and the classifier. During adversarial training, the domain discriminator's role is to accurately determine the domain to which the features in the high-dimensional space belong. By introducing a gradient reversal layer between the feature extractor and the domain discriminator, the feature extractor is driven to produce features that the domain discriminator cannot reliably assign to the source or target domain, thus reducing the feature distribution differences between the source and target domain data in the high-dimensional feature space. Alternating updates of the internal parameters of the domain discriminator and the feature extractor confuse the source and target domain features in the shared high-dimensional feature layer, thereby achieving domain invariance of the features.
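A hedged sketch of the gradient reversal layer (GRL) mechanism described above is given below: the layer is the identity in the forward pass and multiplies the gradient by a negative factor in the backward pass, so a single backward pass trains the discriminator normally while pushing the feature extractor to confuse it. The tiny extractor/discriminator networks and batch construction are illustrative assumptions only.

```python
# Gradient reversal layer for domain-adversarial training.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)                  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # reversed (scaled) gradient for the features

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

feature_extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
domain_discriminator = nn.Sequential(nn.Linear(64, 2))      # source vs. target
optimizer = torch.optim.Adam(list(feature_extractor.parameters())
                             + list(domain_discriminator.parameters()), lr=1e-3)

# illustrative mini-batches of source and target features
x_s, x_t = torch.randn(16, 32), torch.randn(16, 32) + 0.5
x = torch.cat([x_s, x_t])
domain_labels = torch.cat([torch.zeros(16, dtype=torch.long),
                           torch.ones(16, dtype=torch.long)])

features = feature_extractor(x)
domain_logits = domain_discriminator(grad_reverse(features, lam=1.0))
domain_loss = nn.functional.cross_entropy(domain_logits, domain_labels)
# (a classification loss on labeled source samples would be added here)

optimizer.zero_grad()
domain_loss.backward()     # gradients reaching the extractor are reversed by the GRL
optimizer.step()
```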

In recent years, cross-domain diagnosis methods based on adversarial training have also been extensively studied. Liu et al. [232] proposed a deep domain adaptation learning model for rolling bearing fault diagnosis, using deep stacked autoencoders as feature extractors and aligning features between the source and target domains through adversarial training with a domain discriminator. Li et al. [233] introduced a cross-domain fault diagnosis method based on deep adversarial networks, transferring diagnostic knowledge learned from supervised data collected from multiple machines to the target domain data through adversarial training with the domain discriminator, using a small amount of labeled target domain data for training. Li et al. [234] applied adversarial training to align the marginal distribution of diagnostic knowledge between different sensor data, achieving cross-domain fault diagnosis between different sensor data and validating the effectiveness of the proposed method on two rotating machinery datasets. Jiao et al. [235] constructed a domain adaptation network based on classifier inconsistency, utilizing adversarial training with two classifiers to achieve domain-invariant feature representation between the source and target domains, enabling cross-domain fault diagnosis. To address the issue of inconsistent label spaces between the source and target domains, Li et al. [236] proposed an intelligent cross-domain fault diagnosis method based on class-weighted adversarial neural networks. This method encourages positive transfer of shared class samples and ignores outlier class samples from the source domain through class-weighted adversarial learning. Li et al. [237] developed a weighted adversarial domain adaptation network that balances the contribution of samples to the source classifier and domain discriminator through a weighted learning strategy, adaptively identifying and filtering out source samples irrelevant to the target class, achieving local domain adaptation. For the issue of multiple source domain adaptation, Zhang et al. [238] first used adversarial learning to reduce the feature distribution differences between multiple source domains and the target domain, and then utilized classifier alignment loss to reduce the prediction differences of multiple source classifiers on the target domain, validating the effectiveness of the proposed method in scenarios with two and three source domains. Liu et al. [239] proposed a local adversarial domain adaptation network based on stacked autoencoders, using a Softmax-based weighted classifier to weight the features of the source and target domains, achieving cross-domain fault diagnosis under partial domain adaptation conditions. Zhu et al. [240] established an open-set domain adaptation learning model based on adversarial learning, using multiple weighted auxiliary classifier modules to identify unknown and known fault categories in the target domain. Zhang et al. [241] proposed an adversarial learning mechanism based on source class weighting and target instance weighting to achieve class-level alignment of shared categories and identify unknown fault categories. Deng et al. [242] proposed a dual adversarial network based on an attention mechanism, using two attention matrices to guide the network in identifying which samples should be trained or ignored, enabling cross-domain intelligent fault diagnosis of rotating machinery. Yu et al. [243] proposed a simulation data-driven cross-domain fault diagnosis method for rolling bearings, utilizing a small amount of labeled target domain data and achieving marginal distribution alignment through domain adversarial learning with a feature transformer, and conditional distribution alignment through contrastive loss. Han et al. [244] proposed a deep domain generalization network for monitoring unseen operating conditions, improving the generalization performance of the model through regularization of the discriminator structure in the deep network, and promoting intra-class compactness and inter-class separability through triplet loss, enabling intelligent diagnosis of unseen target domains. To address the poor robustness of domain adaptation methods to noisy signals, Yu et al. [245] introduced label prediction to convey discriminative information in the domain discriminator, constructing a conditional adversarial domain adaptation network with discriminative embedding, and validated the performance of the proposed method on three rotating machinery datasets. Chen et al. [246] proposed an adversarial domain-invariant generalization network that achieves feature domain invariance across multiple domains through adversarial learning between the feature extractor and domain discriminator using data from multiple source domains, enabling fault diagnosis of unseen target domains. Fan et al. [247] first proposed a weighted quantile error measure, which was then integrated into a deep adversarial domain adaptation framework to achieve precise matching of complete distribution information, reducing the computational complexity of the model and improving cross-domain diagnosis performance of rotating machinery. Tang et al. [248] proposed a cross-domain fault diagnosis method for rotating machinery based on multiple target domains, where the proposed adversarial training process reduces the distribution differences between the source domain and multiple target domains, and ensures sample-level classification consistency through transferable hyperbolic prototypes. Wang et al. [249] introduced a balanced adversarial partial domain adaptation method, using a balancing strategy to reinforce target samples and filtering out outlier samples through a class-weighted adversarial strategy to avoid negative transfer, and validated the effectiveness of the method on two rotating machines.

Li et al. [250] proposed a multi-domain adversarial transfer framework to enhance fault diagnosis performance under imbalanced data conditions. Wang et al. [251] proposed an attention-guided multi-wavelet adversarial network that combines multi-wavelet convolution, attention mechanisms, and a gradient reversal layer to enhance cross-domain fault diagnosis. Chen et al. [252] proposed a deep conditional adversarial subdomain adaptation network that integrates CNN-GCN feature fusion and a conditional adversarial strategy to enhance unsupervised mechanical fault diagnosis under unknown working conditions. Zhang et al. [253] proposed a multi-sensor open-set cross-domain fault diagnosis method that combines CNN-based feature extraction, Transformer-based feature fusion, and weighted adversarial learning to achieve domain-invariant learning. Bi et al. [254] proposed a multipseudo domains augmented adversarial domain-invariant learning method, combining multipseudo domain generation and adversarial learning to enhance cross-domain fault diagnosis by reducing domain shift and increasing feature diversity. Zhang et al. [255] proposed a digital twin-driven fault diagnosis framework that uses labeled simulated data and a selective adversarial strategy to achieve cross-domain feature alignment, enabling high diagnostic performance for rolling bearings with insufficient and unlabeled real-world data. Li et al. [256] proposed an adversarial domain-augmented generalization method that uses domain augmentation and adversarial training to learn generalized domain-invariant features and enhance fault diagnosis in unseen domains. Wang et al. [257] proposed an inter-domain attention mechanism and an attention-based encoder that dynamically learn source-target relationships and integrate seamlessly into domain adversarial neural networks for improved cross-domain learning. Wang et al. [258] proposed an adversarial domain adaptation method with double auxiliary classifiers to enhance cross-domain open-set fault diagnosis by aligning shared classes and identifying private fault classes. Jin et al. [259] proposed an improved domain adversarial neural network using domain adversarial training and a multi-channel multi-scale module to enhance fault diagnosis of rolling bearings under diverse conditions.

In summary, adversarial training-based methods are powerful tools for cross-domain fault diagnosis because they can effectively align features between the source and target domains, especially under complex domain shifts. By learning domain-invariant features, these methods are highly adaptable. However, adversarial training is often computationally expensive and prone to instability, requiring careful tuning of model parameters. In addition, these methods may struggle when the target domain data differ significantly from the source domain, necessitating hybrid models to address such challenges.

4.3 Cross-domain fault diagnosis methods based on joint moment matching and adversarial training

Owing to the demonstrated effectiveness of moment matching and adversarial training in cross-domain fault diagnosis, domain adaptation methods that combine the two approaches have also been studied extensively.

Guo et al. [260] used MMD loss and adversarial training to simultaneously minimize the feature distribution differences between the source and target domains, achieving unsupervised cross-domain fault diagnosis. Jiao et al. [261] developed a cross-domain intelligent diagnosis method based on a one-dimensional residual network, which reduces both joint distribution and marginal distribution differences through the combination of MMD and adversarial learning. Zhang et al. [262] proposed a cross-operating condition fault diagnosis method for rolling bearings based on deep CNN. This method first reduces the distribution differences between the source and target domains in the shared feature space using multi-kernel MMD loss, and then further confuses the two domains in the shared feature layer through a domain discriminator, achieving better domain-invariant features. Wu et al. [263] introduced a domain adaptation method for rotating machinery, which achieves domain adaptation between the source and target domains in two different spaces through MMD loss and adversarial training. Li et al. [264] proposed a cross-domain fault diagnosis method based on ensemble learning. This method uses MMD loss and domain adversarial learning to train deep neural networks separately to enhance feature representation capability, and then utilizes ensemble learning to obtain the final diagnosis result. Addressing the high training cost and low classification accuracy of adversarial training, Qin et al. [265] developed a cross-operating condition fault diagnosis method for planetary gearboxes based on a parameter-sharing adversarial domain adaptation network. This method reduces network complexity by constructing a shared classifier and introduces CORAL loss and an imbalanced adversarial learning strategy to enhance the domain confusion ability of the feature extractor. Wan et al. [266] constructed a cross-domain fault diagnosis method for rolling bearings based on an improved deep residual network. The method aligns the marginal distribution and state distribution of the source and target domains using multi-kernel MMD loss and multiple domain discriminators and designs an adaptive factor to dynamically measure the relative importance of these two distributions. For the problem of multi-source domain adaptation, Ye et al. [267] proposed a cross-domain intelligent diagnosis method for rotating machinery based on multi-source feature alignment and adversarial training. The method first constructs an ensemble classifier for negatively correlated learning, then reduces noise through a discrete cosine convolution block, and finally achieves domain-invariant learning between the source and target domains using MMD loss and adversarial training. Ghorvei et al. [268] introduced a deep subdomain adaptation graph convolutional neural network. The method first utilizes a graph convolutional neural network for feature representation, and then simultaneously reduces structural differences between subdomains and the global domain through adversarial training and local MMD loss, achieving intelligent cross-operating condition diagnosis of rotating machinery.
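
As a rough illustration of how moment matching can be combined with adversarial alignment, the snippet below computes a single-kernel Gaussian MMD between source and target feature batches, assuming PyTorch; in the joint methods above, such a term would be added to the classification and domain-adversarial losses with problem-dependent weights, and the names and weights shown here are placeholders.

```python
import torch

def gaussian_mmd(f_s, f_t, sigma=1.0):
    """Squared MMD between source and target feature batches with a single Gaussian kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)          # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(f_s, f_s).mean() + k(f_t, f_t).mean() - 2 * k(f_s, f_t).mean()

# Hypothetical combined objective (alpha, beta are tuning choices):
#   total = cls_loss + alpha * gaussian_mmd(feat_src, feat_tgt) + beta * domain_adv_loss
feat_src, feat_tgt = torch.randn(16, 128), torch.randn(16, 128)
print(gaussian_mmd(feat_src, feat_tgt).item())
```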

Jiang et al. [269] proposed a domain adversarial fault diagnosis method using MMD and Wasserstein distance to improve gear fault diagnosis under fluctuating conditions. Wang et al. [270] proposed a domain-adversarial approach with feature alignment to effectively align the source and target domains for fault diagnosis. Xu et al. [271] initially developed a multi-agent diverse generative adversarial network to facilitate the transfer between simulated and measured data, and subsequently employed conditional MMD and adversarial training to perform cross-domain fault diagnosis.

The above research demonstrates that intelligent diagnosis methods based on deep domain adaptation have attracted widespread attention and consistently achieve strong performance across a variety of cross-domain diagnosis scenarios.

5 Resilience assessment and management planning

Developing sound resilience assessments and gaining real-time insight into structural resilience are crucial for the safety of structures throughout their lifecycle. These tasks require comprehensive structural state analysis that accounts for various uncertainties. UQ, however, typically demands extensive, repeated numerical simulations, which substantially increases computational cost. To alleviate this burden, many studies have explored a promising solution: constructing efficient surrogate models to replace finite element models in the repetitive calculation of structural responses, and hence in resilience assessment and management planning.

According to the characteristics of different data types, the surrogate models of response prediction can be categorized into two main applications: extreme value prediction based on tabular data and time history prediction based on time-series data. Tabular data has significant advantages in predicting structural response under loads, as it provides a structured format that effectively captures the input–output relationships, facilitating precise mapping between them. Prediction methods based on tabular data primarily focus on forecasting the maximum response of structures during disaster events. For instance, Morfidis and Kostinakis [272] and Sun et al. [273] used ANN and SVM to predict the maximum inter-story drift ratio of reinforced concrete frame (RCF) structures subjected to seismic loads, noting that the input features considered in these two studies are substantially different. Subsequently, Guan et al. [274] innovatively developed a hybrid (mechanics + data-driven) model that incorporated mechanics-based analysis results as input features to enhance the performance of ML models. Esteghamati and Flint [275] employed ML surrogate models to evaluate the mechanical performance of multi-story RCFs, linking predictive results with economic and environmental factors to advance performance-based structural design. Furthermore, Hwang et al. [276] utilized ML techniques with enhanced algorithms (i.e., adaptive boosting and extreme gradient boosting) to reliably predict seismic responses and collapse patterns of RCF buildings by considering uncertainties at different levels (component and system).
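
As a simple, hypothetical illustration of this tabular workflow, the sketch below fits a gradient-boosting surrogate to synthetic intensity-measure features and a stand-in maximum-response quantity using scikit-learn; the feature set, data, and hyperparameters are placeholders rather than those of the cited studies.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical tabular features (e.g., spectral acceleration, PGA, period, storey count)
X = rng.uniform(size=(500, 4))
y = 0.02 * X[:, 0] * X[:, 1] + 0.01 * X[:, 2] + 0.005 * rng.normal(size=500)  # stand-in for max drift

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
surrogate = GradientBoostingRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
surrogate.fit(X_tr, y_tr)
print("R^2 on held-out data:", r2_score(y_te, surrogate.predict(X_te)))
```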

Correspondingly, time-series data collected from sensor networks or numerical simulations, such as stress, strain, displacement, and temperature, are of paramount importance for analyzing structural behavior under diverse conditions, deformation patterns, and damage evolution processes. The primary objective of prediction methods based on time-series data is to predict the nonlinear dynamic responses of structures induced by earthquakes or wind. For example, Xu et al. [277] proposed a unidirectional multilayer stacked long short-term memory (LSTM) architecture with a sliding time window to predict nonlinear seismic responses over varying time lengths and sampling rates. Zhang et al. [278] introduced two LSTM network architectures, i.e., full sequence and stacked sequence, for modeling both linear and nonlinear building responses under seismic loads. Chen et al. [279] devised an output-recursive LSTM that combines previous time-step predictions with current external excitations to forecast the seismic response of nonlinear dynamic systems. Oh et al. [280] developed a CNN model to capture changes in the dynamic characteristics of buildings under wind loads, with the model’s performance validated through wind tunnel tests. Li et al. [281], working with structural health monitoring systems, exploited the memory effects carried by the unit (cell) states of LSTM modules to predict bridge vibration responses. In view of the substantial differences in seismic response characteristics between surface and underground structures, Huang and Chen [282] selected the free-field deformation where the underground structure is situated, rather than ground motions, as the input of surrogate models, using one-dimensional (1D) CNN and LSTM networks for seismic response modeling of a two-layer, three-span metro station. Torky and Ohno [283] employed a Conv-LSTM neural network that considers multiple features to predict nonlinear structural responses; to improve model accuracy, the data are preprocessed using the fast Fourier transform (FFT) and discrete wavelet transform (DWT). Kundu et al. [284] leveraged the layer-sharing capability of the Keras application programming interface (API) to treat random dynamic excitations and uncertain structural parameters as multiple input sources, facilitating the development of sophisticated models. The resulting LSTM-based deep learning (DL) model for uncertainty quantification in nonlinear time-history response can thus account for dual uncertainties in structural parameters and external excitations.
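
A minimal sliding-window LSTM of the kind described above can be sketched as follows, assuming PyTorch; the window length, network size, and the synthetic excitation/response pair are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class ResponseLSTM(nn.Module):
    """Maps a window of excitation samples to the structural response at the next step."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window_len, 1)
        h, _ = self.lstm(x)
        return self.head(h[:, -1, :])      # response at the end of the window

# Build sliding windows from a synthetic excitation record (stand-in for a recorded ground motion).
record = torch.sin(torch.linspace(0, 50, 2000)).unsqueeze(-1)
window = 100
X = torch.stack([record[i:i + window] for i in range(len(record) - window)])
y = record[window:]                         # placeholder target; in practice, the computed response
model = ResponseLSTM()
pred = model(X[:32])
print(pred.shape)                           # torch.Size([32, 1])
```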

The afore-mentioned studies demonstrate the vast application potential of ML for resilience assessment and management planning. However, the construction of surrogate models still faces several limitations, including the scarcity of high-quality data, opacity of the learning process, low interpretability, lack of physical consistency, and insufficient reasoning capability under unknown conditions. These factors collectively prevent ML techniques from reaching their full potential. The root cause is that the compatibility between ML and civil engineering domain knowledge is insufficiently considered in application. It is therefore imperative to integrate abundant prior knowledge, fundamental physics, and domain-specific expertise (e.g., differential equations, conservation laws, and boundary conditions) into data-driven models.

Inspired by the work of Tronci et al. [285], research on integrating civil engineering domain knowledge into ML methods can be divided into four levels according to the depth of knowledge embedding, from shallow to deep: physics-guided, physics-enhanced, physics-corrected, and physics-constrained, as illustrated in Fig. 10. Physics-guided methods leverage specific domain knowledge to inform the training of data-driven models without imposing strict constraints, with the objective of enhancing the model’s generalization ability, interpretability, and robustness, particularly in scenarios with limited or noisy data. Physics-enhanced methods artificially increase the amount of available data by generating data that incorporate physical information, thereby addressing the performance and generalization issues caused by insufficient original data; the data augmentation approaches involve modifying existing data or using physics-based models to create new data. Physics-corrected methods use domain-specific knowledge and data to adjust predictive results during or after training, enabling the model to continuously reduce the discrepancy between predicted and actual values. Finally, physics-constrained methods impose restrictions on outputs or parameters so that the data-driven model remains strictly anchored in physical laws, guaranteeing that its predictions are consistently aligned with them. Table 4 lists these four domain knowledge embedded ML methods, their essence, and the relevant references. A detailed review is presented as follows.

Fig. 10 Domain knowledge embedded ML methods

Table 4 Summary of domain knowledge embedded ML methods

5.1 Physics-guided methods

Physics-guided methods make use of domain-specific knowledge to inform the training of ML models. Although these models draw on information derived from physical principles, their prediction processes remain data-driven; the predictions are consistent with common physical phenomena without adhering strictly to physical constraints. The distinction between physics-guided and physics-constrained methods is somewhat ambiguous in theory, but their implementations differ considerably. In short, any approach that does not rigidly impose physical knowledge on the model can be categorized as physics-guided, which results in a wide variety of implementation strategies. Some representative cases are given below.

During the construction of ML models, excessively high input dimensionality is often a critical factor that hinders training. It is therefore crucial to preprocess the input features using domain knowledge before training, with the aim of eliminating irrelevant or redundant features while retaining those most critical to the predictive task. By effectively reducing the number of input features, the training process is guided toward shorter training times and higher prediction accuracy. For instance, in the seismic fragility analysis of structures, although it is theoretically feasible to use all potential features as inputs when training an ML surrogate model, practical applications prioritize reducing the input dimension and enhancing training efficiency. Ding and Feng [286] employed the elastic net algorithm to select optimal features based on determination coefficients and regression coefficients, thereby reducing the feature dimensionality. These optimal features were then used as input variables to construct an efficient surrogate model for evaluating structural seismic damage.
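
This feature-screening step can be illustrated with a short scikit-learn sketch, in which an elastic net is fitted to synthetic candidate features and its coefficients are thresholded to screen the inputs; the data, threshold, and hyperparameters are hypothetical and only indicate the general workflow.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))                               # many candidate intensity/structural measures
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=300)     # damage index depends on only a few of them

# Fit an elastic net with cross-validated regularization on standardized features.
X_std = StandardScaler().fit_transform(X)
enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X_std, y)

# Keep only features whose coefficients survive a small threshold (illustrative value).
selected = np.flatnonzero(np.abs(enet.coef_) > 1e-3)
print("Selected feature indices:", selected)
```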

In the context of prediction tasks for groups of structures, the training data for ML models often originate from only a portion of the structures. Because each structure possesses unique physical properties, directly applying such models may not meet the accuracy requirements of the predictions. To address this issue, Zhou et al. [287] proposed a physics-informed DL method for real-time prediction of structural responses, which reflects the individual properties of each structure by incorporating physical information into the data-driven model. The approach extracts structural dynamic characteristics from finite element models and integrates them, together with other inherent structural data, into the DL model to reduce training complexity. The model takes ground motion and structural dynamic information as inputs, and outputs the acceleration, velocity, and displacement responses of various nodes on different structural floors.

5.2 Physics-enhanced methods

Physics-enhanced methods serve as a direct solution to the issue of insufficient training data by artificially expanding the dataset to enhance the performance and generalization capability of ML models. When measured data are scarce or difficult to obtain, a prevalent strategy is to generate large amounts of data, or imbalanced samples (i.e., samples that rarely occur in actual conditions), using physical models to support the training of ML models. For example, Wang et al. [288] proposed data augmentation approaches for DL-based intelligent computation of structural full-process responses. These approaches encompass segmental down-sampling based on amplitude points, interpolation techniques, and refined finite element generation, effectively addressing the challenge of data scarcity in structural computation.

The studies above indicate that the scarcity of training data can be mitigated with data generated from physical models. However, the extent to which these physically generated datasets reflect real-world conditions is crucial for the prediction accuracy of the trained models. Depending on the fidelity of the physical models, the generated data can be classified as high-fidelity or low-fidelity: high-fidelity data are precise but costly to obtain, whereas low-fidelity data are easy to acquire but less accurate. Consequently, many researchers have explored multi-fidelity, data-driven physics-enhanced methods. Chen and Feng [289] proposed a multi-fidelity data hybrid enhancement approach based on ML, in which the mapping between high-fidelity and low-fidelity data is established with ML algorithms to reduce the inherent biases of the low-fidelity datasets; the hybrid data thus combine low acquisition cost with high precision. Similarly, Feng et al. [290] devised a DL-enhanced multi-fidelity modeling method to improve computational efficiency, decomposing the mapping between high-fidelity and low-fidelity data into linear and nonlinear components that are fitted separately with DL models, thereby achieving the conversion from low-fidelity inputs to high-fidelity outputs. These studies emphasize that effective use of limited yet accurate high-fidelity data can address the inherent biases of low-fidelity data.
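
A simple way to realize such a low-to-high-fidelity mapping, loosely following the linear-plus-nonlinear decomposition mentioned above, is sketched below with scikit-learn on synthetic data; the analytical low- and high-fidelity functions and the model sizes are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=(200, 1))
y_lf = np.sin(3 * x).ravel()                        # cheap, biased low-fidelity model
y_hf = 1.2 * y_lf + 0.3 * np.cos(5 * x).ravel()     # scarce, accurate high-fidelity model

features = np.hstack([x, y_lf[:, None]])            # condition on the input and the low-fidelity output
linear = LinearRegression().fit(features, y_hf)     # linear part of the LF -> HF mapping
residual = y_hf - linear.predict(features)          # remaining nonlinear discrepancy
nonlinear = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=0).fit(features, residual)

y_hf_pred = linear.predict(features) + nonlinear.predict(features)
print("Max correction error:", np.abs(y_hf_pred - y_hf).max())
```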

5.3 Physics-corrected methods

Physics-corrected methods continuously adjust the predictions of ML models using knowledge or data from physics or specific domains. Depending on how the correction is applied, these methods can be divided into the following two types.

The first approach uses incremental learning to update model parameters through incremental training, progressively refining the model’s predictive capability. For instance, Chen et al. [291] introduced a prior knowledge-enhanced neural network based on incremental learning. The model is first pre-trained on data generated from simplified calculation formulas that encode physical prior knowledge, and is then fine-tuned with measured or high-fidelity simulation samples using incremental learning techniques. This method not only substantially reduces the amount of measured or high-fidelity simulation data required, but also leverages physical knowledge to mitigate overfitting.
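
The pre-train-then-fine-tune idea can be sketched as follows, assuming PyTorch; the simplified "physics" formula, network size, learning rates, and sample counts are placeholders intended only to show the two-stage workflow.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
mse = nn.MSELoss()

# Stage 1: pre-train on abundant data generated from a simplified physics-based formula.
x_phys = torch.rand(2000, 3)
y_phys = x_phys[:, :1] * 2.0 + x_phys[:, 1:2] ** 2         # stand-in analytical formula
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad(); mse(net(x_phys), y_phys).backward(); opt.step()

# Stage 2: fine-tune on a small set of measured / high-fidelity samples with a smaller learning rate,
# so that the physics-informed pre-training acts as a prior and limits overfitting.
x_meas = torch.rand(50, 3)
y_meas = x_meas[:, :1] * 2.1 + x_meas[:, 1:2] ** 2 + 0.05 * torch.randn(50, 1)
opt_ft = torch.optim.Adam(net.parameters(), lr=1e-4)
for _ in range(100):
    opt_ft.zero_grad(); mse(net(x_meas), y_meas).backward(); opt_ft.step()
```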

In the second approach, an additional ML model is used to correct the predictions of the initial ML model. For example, Tsai and Alipour [292] overcame the difficulty of obtaining aerodynamic parameters when simulating wind-induced structural responses by training two LSTM models. In the first stage, a numerical model of the structure is established with the available aerodynamic parameters to generate data for training the first LSTM model; this model uses physically meaningful simulated data to make preliminary predictions that are less precise but roughly capture the trends of the wind-induced response. The second LSTM model is then trained on monitored wind-induced response data together with the initial predictions of the first model, enabling it to correct those predictions and yield more accurate estimates of the wind-induced responses.

5.4 Physics-constrained methods

Physics-constrained methods represent a mainstream strategy for integrating physics or domain knowledge into ML models. They incorporate the physical information inherent in the data to ensure that the model’s outputs consistently obey common physical laws, which is particularly beneficial when dealing with incomplete or uncertain data. A majority of studies [293,294,295,296,297,298,299,300,301] have been dedicated to explicitly embedding the residuals of equations that represent physical laws into the loss function of the models; by minimizing this loss function, the models are driven to conform to the underlying physical principles as closely as possible.
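
As a minimal sketch of this idea, the example below trains a network on a single-degree-of-freedom free-vibration problem, adding the residual of the dynamic equation at collocation points to a data-fitting loss; the system parameters, stand-in measurements, and equal loss weighting are illustrative assumptions, not the formulation of any specific cited work.

```python
import torch
import torch.nn as nn

# Single-degree-of-freedom system: m*u'' + c*u' + k*u = f(t); the equation residual enters the loss.
m, c, k = 1.0, 0.1, 4.0
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

t_data = torch.linspace(0, 5, 20).reshape(-1, 1)                        # sparse "measured" points
u_data = torch.sin(2 * t_data) * torch.exp(-0.05 * t_data)              # stand-in measurements
t_col = torch.linspace(0, 5, 200).reshape(-1, 1).requires_grad_(True)   # collocation points
f_col = torch.zeros_like(t_col)                                         # free vibration, zero forcing

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    u = net(t_col)
    du = torch.autograd.grad(u, t_col, torch.ones_like(u), create_graph=True)[0]
    ddu = torch.autograd.grad(du, t_col, torch.ones_like(du), create_graph=True)[0]
    residual = m * ddu + c * du + k * u - f_col                          # dynamic-equation residual
    loss = nn.functional.mse_loss(net(t_data), u_data) + residual.pow(2).mean()
    loss.backward()
    opt.step()
```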

Currently, physics-constrained methods have been extensively applied to structural response prediction [302,303,304]. Zhang et al. [305, 306] integrated structural dynamic equations into the loss function to constrain the network output when metamodeling nonlinear structural systems; applied to both LSTM and CNN networks, this approach markedly improves prediction precision and robustness compared with purely data-driven models. Similarly, Yu et al. [307] introduced a method that embeds the underlying physics directly into the architecture of recurrent neural networks for predicting structural dynamic responses. Hu and Guo [308] used prior knowledge of structural dynamics to develop a neural network capable of reconstructing unmeasured or missing structural seismic responses. Liu et al. [309] incorporated physical constraints into an LSTM model for structural response prediction, adjusting the constraints according to whether the case is linear or nonlinear: for linear analysis, the constraints involve stiffness and damping, with velocity and acceleration computed from the predicted displacements; for nonlinear analysis, they include ground acceleration, the predicted restoring force, and acceleration derived from the predicted displacements. Nevertheless, most physics-constrained methods are not equipped with UQ capabilities. To extend their applicability, Zhang et al. [310] combined PINNs with arbitrary polynomial chaos (PC) to quantify approximation and parametric uncertainty. Novák et al. [311] imposed physical constraints such as differential equations and boundary conditions directly on the PC expansion methodology to address nonlinear differential equations. Subsequently, to overcome the limitations of the approach in [311], Sharma et al. [312] improved the physics-constrained PC expansion surrogate model to handle both equality-type and inequality-type constraints.

In conclusion, incorporating physical principles into the model enhances its capacity to capture intricate phenomena. Moreover, integrating knowledge and data not only enables effective prediction and generalization when training data are limited, but also improves training efficiency and interpretability, ultimately providing more accurate and reliable predictions and decisions for resilience assessment and management planning. The future trajectory of artificial intelligence in civil engineering will therefore lie in the intelligent fusion of domain knowledge with data-driven techniques.

6 Discussions and challenges

While ML techniques have demonstrated their effectiveness in addressing various UQ challenges, several complex tasks remain unresolved. The detailed discussion of these challenges is provided below:

  1)

    In the domain of forward UQ analysis, a variety of ML approaches have been developed to evaluate and analyze the response and reliability of structures and systems. These methods are effective for addressing specific forward UQ problems. However, as science and technology progress, engineering structures and systems become increasingly complex, presenting significant challenges such as high-dimensional, highly nonlinear, and hybrid uncertainties that urgently need to be addressed. Moreover, adaptive learning techniques based on the ANN and PINN are still in their preliminary stages. Developing an efficient framework that enables ANN and PINN to provide predictions with uncertainty remains an ongoing research topic. Additionally, effectively utilizing the prediction uncertainty information to select the most influential training samples, thereby enhancing computational accuracy without wasting computational resources, is both a difficult and urgent task. Furthermore, although PINNs and neural operators circumvent direct computation of responses based on PDEs, the training time can be extensive when a large number of collocation points are used and when deep neural networks with a vast number of neurons are formulated to approximate complex physical models. This challenge is prevalent not only in UQ but also in many other ML-related fields.

  2)

    The extensively reviewed methods for inverse UQ subtasks have proven well suited to specific types of problems, but challenges remain for some complex scenarios. For PMU, the five groups of methods are purpose-built and are therefore suitable for calibrating (low- to medium-dimensional) deterministic-but-unknown model parameters, and some of them have been shown to be sufficiently efficient and effective even for multi-modal problems with expensive-to-evaluate simulators [172]. However, when applied to more sophisticated situations, these methods lose much of their power. For example, for problems with high-dimensional parameters (e.g., several hundred to several thousand), most of these methods are inapplicable, especially when the posterior density exhibits multi-modal behavior in the high-dimensional parameter space, since the distance between pairs of modes can be extremely large. Although this issue can be alleviated by, e.g., performing parameter screening with global sensitivity analysis, developing methods of general applicability to this case remains appealing. For many simulators, a few parameters may depend on the inputs, and their epistemic uncertainty then needs to be captured with, e.g., stochastic fields; in this case, the methods need to be extended to update these stochastic fields instead of simple probability distributions. Another challenge is the joint probabilistic calibration of the deterministic-but-unknown model parameters and the model bias, as the model bias is unknown and cannot be neglected [114]. The problem becomes even more challenging if the model parameters are intrinsically random with unknown distributions, in which case the target is to calibrate the distribution type and distribution parameters of each stochastic model parameter; this type of problem is termed stochastic model updating. It requires modelling the two types of uncertainty, i.e., aleatory and epistemic, with hierarchical probability models so that the two can be characterized separately. Calibrating (the hierarchical probability models of) these stochastic parameters, jointly with or without (the subjective probability models of) the model bias, can be computationally demanding, as it involves devising a reasonable likelihood that maps the distance measure between stochastic observations and model predictions to a probability measure, and the numerical computation involves nested loops. For RBDO and RDO problems, situations with high-dimensional design variables and/or random input variables also pose challenges, especially when the targets or constraints are governed by expensive-to-evaluate simulators, the targets exhibit multi-modal or even discontinuous behavior, and the feasible regions are non-convex. The numerical implementation becomes even more challenging when the random inputs also involve epistemic uncertainty due to lack of information, in which case a worst-case design optimization scheme must be introduced for RBDO and RDO, usually adding an extra loop to the numerical computation. The above challenges are typical examples of inverse UQ problems that have not been well addressed by state-of-the-art developments; other challenges may also exist.

  3)

    Transfer learning has become a powerful tool in fault diagnosis within mechanical engineering, offering solutions to many limitations of traditional ML, particularly when dealing with scarce labeled data and mismatched feature distributions between training and testing datasets. By transferring knowledge from previously learned tasks, transfer learning enhances the model's ability to generalize across various domains and conditions. However, several challenges persist. One major issue is the variation in feature distributions between the source and target domains, which can result in poor diagnostic accuracy and increased misclassification rates if not properly addressed. Additionally, the scarcity of labeled data in the target domain further complicates model training, limiting the effectiveness of transfer learning. While advanced techniques like domain adaptation, statistical moment matching, and adversarial training have been developed to tackle these problems, they often require significant computational resources and precise optimization to achieve efficient cross-domain alignment. Another critical challenge is ensuring that models can remain robust when faced with noisy data and generalize effectively across diverse and unseen operational conditions. Addressing these challenges is vital for the successful application of transfer learning in real-world industrial scenarios, ensuring that models are both adaptable and reliable across different mechanical systems.

  4)

    The diverse domain knowledge embedded ML methods have been demonstrated to be effective in overcoming several limitations in the construction of surrogate models for resilience assessment and management planning, but some challenges remain, as follows. The first challenge lies in determining the appropriate degree of domain knowledge integration. The ML model becomes more complex as the degree of knowledge embedding deepens; however, if the knowledge is not adequately embedded, its contribution to the model’s performance is diminished. Striking a balance that limits model complexity while enhancing the generalization capability of ML models is a significant challenge. The second challenge is identifying the optimal representation of the embedded knowledge. Domain knowledge can be incorporated in various ways, including generating data for model training or directly constraining the model, and it is difficult to compare these options to determine the most effective form of knowledge representation. The third challenge is applying domain knowledge embedded methods to textual and image data. Most existing studies focus on tabular and time-series data, leaving open the question of how to effectively integrate domain knowledge into models based on textual or image data; developing methods that fuse multi-source data with domain knowledge is likewise a complex problem. Finally, there is the challenge of accounting for model uncertainty. Previous studies have considered uncertainties related to loads and structures through different input parameters and approaches, but the inherent uncertainty of the model itself is neglected. Given that ML models differ fundamentally from traditional numerical models, understanding and managing their intrinsic uncertainties presents a considerable challenge.

7 Conclusions and future works

This paper provides a comprehensive survey of ML approaches for addressing various UQ tasks, including forward UQ analysis, inverse UQ analysis, fault diagnosis, and resilience assessment and management planning. In the context of forward UQ analysis, response and reliability analyses enabled by ML techniques are reviewed. The research demonstrates that ML methods are widely applied in evaluating the response and reliability of engineering structures, and that adaptive learning strategies and other advanced algorithms are effective in enhancing the accuracy of forward UQ analysis. For inverse UQ analysis, three key problem domains, namely PMU, RBDO, and RDO, are explored. The introduction of ML methods, such as regression models, active learning schemes, and Bayesian numerics, has significantly reshaped research approaches, leading to improved numerical efficiency, guaranteed convergence, and broader applicability. These advancements are crucial for promoting the adoption of inverse UQ techniques in real-world engineering contexts. In the domain of fault diagnosis, cross-domain recognition techniques, particularly those leveraging transfer learning, are reviewed. The findings indicate that intelligent diagnostic methods based on deep domain adaptation have gained substantial attention and consistently deliver excellent performance across diverse cross-domain diagnosis scenarios. For resilience assessment and management planning, domain knowledge embedded ML methods, namely physics-guided, physics-enhanced, physics-corrected, and physics-constrained approaches, are discussed. The research shows that embedding physical principles within ML models enhances their ability to capture complex phenomena, and that integrating domain knowledge with data not only facilitates accurate prediction and generalization under limited training data but also improves training efficiency and model interpretability. These capabilities contribute to more accurate and reliable predictions and decision-making in resilience assessment and management planning. Despite these advances, several key challenges remain for future research:

  1)

    For forward UQ analysis, establishing efficient frameworks that enable neural network-driven models, such as the ANN and PINN, to provide predictions with associated uncertainties is a subject worthy of further investigation. These frameworks are crucial for assessing the confidence levels of forward UQ results and for developing adaptive learning approaches that enhance computational accuracy. Moreover, creating advanced strategies to expedite the training process of PINNs and neural operators in forward UQ analysis will further augment their applicability and effectiveness in addressing practical engineering tasks.

  2)

    For inverse UQ subtasks, the state-of-the-art developments have been shown to be sufficient for some relatively simple scenarios, and more challenging cases will be addressed in future work. For model calibration, high-dimensional deterministic-but-unknown parameters, parameters that depend on the inputs, the joint calibration of model parameters and model bias, and the calibration of intrinsically stochastic model parameters (possibly jointly with model bias) will be specifically studied in future research. For design optimization under uncertainty, problems with high-dimensional design and/or input variables, extremely small failure probability constraints, and inputs involving mixed uncertainties need to be specifically treated.

  3)

    For fault diagnosis, transfer learning offers significant advantages in mechanical engineering, particularly in addressing challenges related to limited labeled data and differing feature distributions across domains. While current approaches such as domain adaptation and adversarial training have shown promise in enhancing cross-domain generalization, challenges such as computational complexity, the need for large-scale tuning, and maintaining robustness in noisy environments remain. Future work should focus on improving the efficiency and scalability of transfer learning models, reducing the computational burden, and enhancing their adaptability to a wider range of operating conditions. Additionally, developing more effective methods for handling sparse or noisy data, as well as creating models capable of learning from minimal supervision, will be essential for expanding the practical applicability of transfer learning in industrial fault diagnosis systems. Further research into unsupervised and semi-supervised techniques, together with more advanced domain adaptation methods, will help refine these approaches and increase their usability in real-world scenarios.

  4)

    For resilience assessment and management planning, the future development of domain knowledge embedding needs to address the following issues, although others may also exist. First, efforts could be made to explore the direct learning of data and structural evaluation results under a domain knowledge embedded framework, which could potentially bypass the need for detailed structural analysis. Second, a feedback-based system could be established that evaluates the forward predictions and feeds the evaluation results back into the model for updating. Moreover, exploring knowledge embedded methods that integrate multi-source data (e.g., tabular, image, textual, and time-series data) is crucial, and effectively combining and leveraging domain knowledge across these diverse data types would be a significant undertaking. Finally, another important direction for future research is the consideration and quantification of the inherent uncertainty of ML models.