Gradient-based Feature Attribution in Explainable AI: A Technical Review, arXiv preprint
An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records, EMNLP 2024, Blog
LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack, AAAI 2024
Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision, AAAI 2024
Using stratified sampling to improve LIME Image explanations, AAAI 2024
Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention, AAAI 2024
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles, AAAI 2024
SHAP@k: Efficient and Probably Approximately Correct (PAC) Identification of Top-K Features, AAAI 2024
Approximating the Shapley Value without Marginal Contributions, AAAI 2024
GLIME: General, Stable and Local LIME Explanation, NeurIPS 2023
Deeply Explain CNN via Hierarchical Decomposition, IJCV 2023
Negative Flux Aggregation to Estimate Feature Attributions, IJCAI 2023, code
On Minimizing the Impact of Dataset Shifts on Actionable Explanations, UAI 2023
Counterfactual-based Saliency Map: Towards Visual Contrastive Explanations for Neural Networks, CVPR 2023
A Practical Upper Bound for the Worst-Case Attribution Deviations, CVPR 2023
IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients, CVPR 2023
Explaining Image Classifiers with Multiscale Directional Image Representation, CVPR 2023
SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries, CVPR 2023
Extending class activation mapping using Gaussian receptive field, CVIU Journal 2023
TSGB: Target-selective gradient backprop for probing CNN visual saliency, TIP 2022
Transferable Adversarial Attack Based on Integrated Gradients, ICLR 2022
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks, CVPR 2022
Consistent Explanations by Contrastive Learning, CVPR 2022
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers, CVPR 2022
REX: Reasoning-aware and Grounded Explanation, CVPR 2022
FAM: Visual Explanations for the Feature Representations from Deep Convolutional Networks, CVPR 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022
Do Explanations Explain? Model Knows Best, CVPR 2022
On Computing Probabilistic Explanations for Decision Trees, NeurIPS 2022
Exploiting the Relationship Between Kendall’s Rank Correlation and Cosine Similarity for Attribution Protection, NeurIPS 2022
Linear TreeShap, NeurIPS 2022
CS-SHAPLEY: Class-wise Shapley Values for Data Valuation in Classification, NeurIPS 2022
Consistent Sufficient Explanations and Minimal Local Rules for explaining any classifier or regressor, NeurIPS 2022
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound, NeurIPS 2022
Accurate Shapley Values for explaining tree-based models, AISTATS 2022
Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations, NeurIPS 2022
What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods, NeurIPS 2022
Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF, NeurIPS 2022
What You See is What You Classify: Black Box Attributions, NeurIPS 2022
Explaining Preferences with Shapley Values, NeurIPS 2022
Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability, NeurIPS 2022
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability, NeurIPS 2022
Is this the Right Neighborhood? Accurate and Query Efficient Model Agnostic Explanations, NeurIPS 2022
Bayesian subset selection and variable importance for interpretable prediction and classification, NeurIPS 2022
Robust Models Are More Interpretable Because Attributions Look Normal, ICML 2022
Accelerating Shapley Explanation via Contributive Cooperator Selection, ICML 2022
Framework for Evaluating Faithfulness of Local Explanations, ICML 2022
XAI for Transformers: Better Explanations through Conservative Propagation, ICML 2022
A Functional Information Perspective on Model Interpretation, ICML 2022
A Psychological Theory of Explainability, ICML 2022
A Consistent and Efficient Evaluation Strategy for Attribution Methods, ICML 2022
A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions, ICML 2022
Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings, ICML 2022
Rational Shapley Values, FAccT 2022
Human Interpretation of Saliency-based Explanation Over Text, FAccT 2022
Higher-Order Explanations of Graph Neural Networks via Relevant Walks, TPAMI 2022
Explaining Explanations: Axiomatic Feature Interactions for Deep Networks, JMLR 2022
Deriving Explainable Discriminative Attributes Using Confusion About Counterfactual Class, ICASSP 2022
FastSHAP: Real-Time Shapley Value Estimation, ICLR 2022
Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations, AAAI 2022
Backdoor Attacks on the DNN Interpretation System, AAAI 2022
Feature Importance Explanations for Temporal Black-Box Models, AAAI 2022
Do Feature Attribution Methods Correctly Attribute Features?, AAAI 2022
Improving performance of deep learning models with axiomatic attribution priors and expected gradients, Nature Machine Intelligence 2021
One Explanation is Not Enough: Structured Attention Graphs for Image Classification, NeurIPS 2021
On Locality of Local Explanation Models, NeurIPS 2021
Shapley Residuals: Quantifying the limits of the Shapley value for explanations, NeurIPS 2021
The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations, NeurIPS 2021
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability, NeurIPS 2021
Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis, NeurIPS 2021
Do Input Gradients Highlight Discriminative Features?, NeurIPS 2021
The effectiveness of feature attribution methods and its correlation with automatic evaluation scores, NeurIPS 2021
From global to local MDI variable importances for random forests and when they are Shapley values, NeurIPS 2021
Fast Axiomatic Attribution for Neural Networks, NeurIPS 2021
On Guaranteed Optimal Robust Explanations for NLP Models, IJCAI 2021
Explaining deep neural network models with adversarial gradient integration, IJCAI 2021
Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models, ACL 2021
What does LIME really see in images?, ICML 2021
Explanations for Monotonic Classifiers, ICML 2021
Explaining Time Series Predictions with Dynamic Masks, ICML 2021
On Explainability of Graph Neural Networks via Subgraph Explorations, ICML 2021
Generative Causal Explanations for Graph Neural Networks, ICML 2021
How Interpretable and Trustworthy are GAMs?, KDD 2021
Leveraging Latent Features for Local Explanations, KDD 2021
S-LIME: Stabilized-LIME for Model Explanation, KDD 2021
An Experimental Study of Quantitative Evaluations on Saliency Methods, KDD 2021
TimeSHAP: Explaining Recurrent Models through Sequence Perturbations, KDD 2021
Black-box Explanation of Object Detectors via Saliency Maps, CVPR 2021
Interpreting Super-Resolution Networks with Local Attribution Maps, CVPR 2021
Transformer Interpretability Beyond Attention Visualization, CVPR 2021
A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts, CVPR 2021
Relevance-CAM: Your Model Already Knows Where to Look, CVPR 2021, code
Guided integrated gradients: An adaptive path method for removing noise, CVPR 2021
An Analysis of LIME for Text Data, AISTATS 2021
Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression, AISTATS 2021
A Unified Taylor Framework for Revisiting Attribution Methods, AAAI 2021
If You Like Shapley Then You’ll Love the Core, AAAI 2021
Explainable Models with Consistent Interpretations, AAAI 2021
On the Tractability of SHAP Explanations, AAAI 2021
Interpreting Multivariate Shapley Interactions in DNNs, AAAI 2021
Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking, ICLR 2021
Scaling Symbolic Methods using Gradients for Neural Model Explanation, ICLR 2021
Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability, ICLR 2021
Shapley explainability on the data manifold, ICLR 2021
ICAM: Interpretable Classification via Disentangled Representations and Feature Attribution Mapping, NeurIPS 2020
What went wrong and when? Instance-wise Feature Importance for Time-series Models, NeurIPS 2020
How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods, NeurIPS 2020, code
Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability, NeurIPS 2020
Parameterized Explainer for Graph Neural Network, NeurIPS 2020
PGM-Explainer: Probabilistic Graphical Model Explanations for Graph Neural Networks, NeurIPS 2020
Visualizing the Impact of Feature Attribution Baselines, Distill 2020
There and Back Again: Revisiting Backpropagation Saliency Methods, CVPR 2020
Towards Visually Explaining Variational Autoencoders, CVPR 2020
Blur Integrated Gradients: Attribution in Scale and Space, CVPR 2020
Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution, arXiv preprint 2020
GCN-LRP explanation: exploring latent attention of graph convolutional networks, IJCNN 2020
Visualizing Deep Networks by Optimizing with Integrated Gradients, AAAI 2020
LS-Tree: Model Interpretation When the Data Are Linguistic, AAAI 2020, slides
Investigating Saturation Effects in Integrated Gradients, ICML Workshop on Human Interpretability in Machine Learning (WHI) 2020
Robust and Stable Black Box Explanations, ICML 2020
Concise Explanations of Neural Networks using Adversarial Training, ICML 2020
You Shouldn’t Trust Me: Learning Models Which Conceal Unfairness From Multiple Explanation Methods, ECAI 2020
Bias also matters: Bias attribution for deep neural network explanation, ICML 2019
Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation, ICML 2019
On the Connection Between Adversarial Robustness and Saliency Map Interpretability, ICML 2019
Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation, ICML 2019
Explainability Techniques for Graph Convolutional Networks, ICML Workshop 2019
FullGrad: Full-Gradient Representation for Neural Network Visualization, NeurIPS 2019
Towards Automatic Concept-based Explanations, NeurIPS 2019
GNNExplainer: Generating Explanations for Graph Neural Networks, NeurIPS 2019
On the (In)fidelity and Sensitivity for Explanations, NeurIPS 2019
Robust Attribution Regularization, NeurIPS 2019
Explanations can be manipulated and geometry is to blame, NeurIPS 2019
Interpretation of Neural Networks is Fragile, AAAI 2019
XRAI: Better Attributions Through Regions, ICCV 2019
Understanding Deep Networks via Extremal Perturbations and Smooth Masks, ICCV 2019
L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data, ICLR 2019
Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks, CVPR 2019
Explainability Methods for Graph Convolutional Neural Networks, CVPR 2019
This Looks Like That: Deep Learning for Interpretable Image Recognition, NeurIPS 2019
“Why Should You Trust My Explanation?” Understanding Uncertainty in LIME Explanations, ICML 2019
Gradient-Based vs. Propagation-Based Explanations: An Axiomatic Comparison, in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pp. 253–265, Springer 2019
The Many Shapley Values for Model Explanation, arXiv preprint 2019
Explaining the Explainer: A First Theoretical Analysis of LIME, arXiv preprint 2020
VarGrad: Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values, ICLR 2018 Workshop
NoiseTunnel: Sanity checks for saliency maps, NeurIPS 2018
Towards Robust Interpretability with Self-Explaining Neural Networks, NeurIPS 2018
Model Agnostic Supervised Local Explanations, NeurIPS 2018
Integrated Gradients: Did the Model Understand the Question?, ACL 2018
Neuron Integrated Gradients: Computationally Efficient Measures of Internal Neuron Importance, preprint 2018
TCAV: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), ICML 2018
A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations, ICML 2018
L2X: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation, ICML 2018, code
Noise-adding Methods of Saliency Map as Series of Higher Order Partial Derivative, ICML 2018 workshop
InternalInfluence: Influence-Directed Explanations for Deep Convolutional Networks, IEEE International Test Conference 2018
Interpretable Basis Decomposition for Visual Explanation, ECCV 2018
Grounding Visual Explanations, ECCV 2018
RuleMatrix: Visualizing and Understanding Classifiers with Rules, TVCG 2018
Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models, TVCG 2018
Top-down Neural Attention by Excitation Backprop, IJCV 2018 (ECCV 2016)
RISE: Randomized Input Sampling for Explanation of Black-box Models, BMVC 2018
SHAP: A Unified Approach to Interpreting Model Predictions, NeurIPS 2017
Real Time Image Saliency for Black Box Classifiers, NeurIPS 2017
Explaining nonlinear classification decisions with deep Taylor decomposition, Pattern Recognition 2017
Interpretable Explanations of Black Boxes by Meaningful Perturbation, ICCV 2017
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, ICCV 2017, IJCV 2019 (minimal sketch after the list)
Network Dissection: Quantifying Interpretability of Deep Visual Representations, CVPR 2017
DeepLIFT: Learning important features through propagating activation differences, ICML 2017
Integrated Gradients: Axiomatic attribution for deep networks, ICML 2017 (minimal sketch after the list)
SmoothGrad: removing noise by adding noise, ICML 2017 (minimal sketch after the list)
Visualizing deep neural network decisions: Prediction difference analysis, ICLR 2017
Lime: "Why Should I Trust You?": Explaining the Predictions of Any Classifier, SIGKDD 2016
Visualizing deep convolutional neural networks using natural pre-images, IJCV 2016
Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models, arXiv preprint 2016
Salient deconvolutional networks, ECCV 2016
LRP: Layer-wise relevance propagation for neural networks with local renormalization layers, ICANN 2016
Gradient * Input: Not Just a Black Box: Learning Important Features Through Propagating Activation Differences, arXiv preprint 2016
Investigating the influence of noise and distractors on the interpretation of neural networks, NeurIPS 2016
QII: Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems, IEEE Symposium on Security and Privacy (SP) 2016
epsilon-LRP: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS ONE 2015
Perturbation-based method: Predicting effects of noncoding variants with deep learning–based sequence model, Nature Methods 2015
CAM: Learning Deep Features for Discriminative Localization, CVPR 2016
Guided Backpropagation: Striving for simplicity: The all convolutional net, ICLR 2015
Understanding neural networks through deep visualization, arXiv preprint 2015
Backpropagation: Deep inside convolutional networks: Visualising image classification models and saliency maps, ICLR 2014
Deconvnet: Visualizing and Understanding Convolutional Networks, ECCV 2014
Shapley sampling values: Explaining prediction models and individual predictions with feature contributions, Knowledge and Information Systems 2014 (minimal sketch after the list)
Bounding the Estimation Error of Sampling-based Shapley Value Approximation, arXiv preprint 2013
Permutation importance: a corrected feature importance measure, Bioinformatics 2010
How to explain individual classification decisions, Journal of Machine Learning Research 2010
An Efficient Explanation of Individual Classifications using Game Theory, Journal of Machine Learning Research 2010
Explaining Classifications for Individual Instances, TKDE 2008
Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecological Modelling 2003
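
Minimal code sketches for a few of the canonical methods above. These are illustrative simplifications under stated assumptions, not the authors' reference implementations.

A minimal Grad-CAM sketch (ICCV 2017 entry above). It assumes the network is available as two callables, `features` (returning the last convolutional feature map) and `head` (mapping that map to class logits); this split and all names below are assumptions made for illustration.

```python
# Minimal Grad-CAM sketch. `features` and `head` are assumed callables splitting
# the network at its last convolutional layer; `x` is a single (C, H, W) image
# tensor and `target` a class index. All names are illustrative.
import torch
import torch.nn.functional as F

def grad_cam(features, head, x, target):
    x = x.detach().unsqueeze(0).requires_grad_(True)      # (1, C, H_in, W_in)
    fmap = features(x)                                     # (1, K, H, W) conv activations
    score = head(fmap)[0, target]                          # logit of the target class
    grads = torch.autograd.grad(score, fmap)[0]            # d(score) / d(feature map)
    weights = grads.mean(dim=(2, 3), keepdim=True)         # channel weights = GAP of gradients
    cam = F.relu((weights * fmap).sum(dim=1)).squeeze(0)   # weighted sum of maps, then ReLU
    return (cam / (cam.max() + 1e-8)).detach()             # normalise to [0, 1] for display
```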
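A minimal Integrated Gradients sketch (ICML 2017 entry above), assuming a differentiable PyTorch classifier `model`, a single (C, H, W) input `x`, a same-shaped `baseline` (e.g. zeros), and a target class index; the plain Riemann sum over a straight-line path is an illustrative shortcut.

```python
# Minimal Integrated Gradients sketch; `model`, `x`, `baseline`, and `target`
# are illustrative names, not taken from any reference implementation.
import torch

def integrated_gradients(model, x, baseline, target, steps=50):
    # Points on the straight-line path from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1, 1)
    path = baseline.unsqueeze(0) + alphas * (x - baseline).unsqueeze(0)
    path = path.detach().requires_grad_(True)              # (steps, C, H, W)
    # Gradient of the target logit at every interpolated point.
    logits = model(path)[:, target]
    grads = torch.autograd.grad(logits.sum(), path)[0]
    # Average gradient along the path, rescaled by the input difference.
    return (x - baseline) * grads.mean(dim=0)
```

Increasing `steps` (or using a trapezoidal rule) tightens the approximation of the path integral at the cost of more forward/backward passes.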
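A minimal SmoothGrad sketch (ICML 2017 entry above) under the same illustrative assumptions (`model`, single input `x`, class index `target`); `sigma` is the noise scale, which the paper sets relative to the input's value range.

```python
# Minimal SmoothGrad sketch: average input gradients over noisy copies of x.
import torch

def smoothgrad(model, x, target, n_samples=25, sigma=0.15):
    total = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).unsqueeze(0).requires_grad_(True)
        score = model(noisy)[0, target]                    # logit of the target class
        total += torch.autograd.grad(score, noisy)[0].squeeze(0)
    return total / n_samples                               # averaged saliency map
```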
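A minimal tabular LIME sketch (SIGKDD 2016 entry above): sample perturbations around the instance, weight them by proximity, and fit a weighted linear surrogate. It assumes a black-box scoring function `f` (score of the explained class for one feature vector) and per-feature perturbation scales `scales`; the real method additionally discretises features and selects a sparse subset, and all names here are illustrative.

```python
# Minimal tabular LIME sketch: weighted least-squares fit of a local linear surrogate.
import numpy as np

def lime_tabular(f, x, scales, n_samples=1000, kernel_width=None, rng=None):
    rng = np.random.default_rng(rng)
    d = len(x)
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)                   # common heuristic width
    Z = x + rng.normal(scale=scales, size=(n_samples, d))  # local perturbations
    y = np.array([f(z) for z in Z])                        # black-box scores
    dist = np.linalg.norm((Z - x) / scales, axis=1)
    w = np.exp(-dist ** 2 / kernel_width ** 2)             # exponential locality kernel
    A = np.hstack([np.ones((n_samples, 1)), Z])            # intercept + features
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]                                        # per-feature local weights
```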
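A minimal Monte Carlo sketch for the Shapley sampling values entry above (Knowledge and Information Systems 2014): average each feature's marginal contribution over random orderings, using a single reference instance to stand in for "absent" features. `f`, `x`, and `x_ref` are illustrative names for a black-box scoring function, the instance to explain, and a background point.

```python
# Minimal Monte Carlo Shapley sketch via random feature orderings.
import numpy as np

def shapley_sampling(f, x, x_ref, n_perm=200, rng=None):
    rng = np.random.default_rng(rng)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = np.array(x_ref, dtype=float)                   # start with every feature absent
        prev = f(z)
        for j in order:                                    # add features one at a time
            z[j] = x[j]
            curr = f(z)
            phi[j] += curr - prev                          # marginal contribution of feature j
            prev = curr
    return phi / n_perm                                    # estimated Shapley values
```

Averaging over several background instances instead of a single `x_ref` gives a closer analogue of the expectation over "absent" features used in the paper.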