This repository collects all relevant resources about interpretability in LLMs
MICCAI 2022 (Oral): Interpretable Graph Neural Networks for Connectome-Based Brain Disorder Analysis
Discover and Cure: Concept-aware Mitigation of Spurious Correlation (ICML 2023)
[KDD'22] Source code for "Graph Rationalization with Environment-based Augmentations"
Official code for the CVPR 2022 (oral) paper "OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks."
[ICCV 2023] Learning Support and Trivial Prototypes for Interpretable Image Classification
Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"
TraceFL is a novel mechanism for Federated Learning that achieves interpretability by tracking neuron provenance. It identifies clients responsible for global model predictions, achieving 99% accuracy across diverse datasets (e.g., medical imaging) and neural networks (e.g., GPT).
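As a rough intuition for what provenance-style attribution can look like, here is a toy sketch. This is not TraceFL's actual algorithm: the FedAvg setup and the scoring rule are assumptions made for the illustration. The idea shown is to credit each client by how much its update moved the weights feeding the neurons that dominate a given prediction.

```python
# Toy illustration of provenance-style attribution in federated averaging.
# This is NOT TraceFL's actual algorithm: the aggregation setup and the
# scoring rule below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_clients, d_in, d_hidden = 3, 8, 16

# Server model plus one FedAvg round of client weight deltas.
base_W = rng.normal(size=(d_in, d_hidden))
client_deltas = [rng.normal(scale=0.1, size=base_W.shape) for _ in range(n_clients)]
global_W = base_W + np.mean(client_deltas, axis=0)

# Forward pass for one input through a single ReLU layer.
x = rng.normal(size=d_in)
activations = np.maximum(global_W.T @ x, 0.0)

# Find the neurons that dominate this prediction, then credit each client
# by how much its delta moved the weights feeding those neurons.
top_neurons = np.argsort(activations)[-4:]
scores = [np.abs(delta[:, top_neurons]).sum() for delta in client_deltas]
print("client most responsible (toy score):", int(np.argmax(scores)))
```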
Explainable AI: From Simple Rules to Complex Generative Models
Explainable Boosting Machines
Build a neural net from scratch, without Keras or PyTorch, using only NumPy for the calculations and pandas for data loading (a minimal sketch follows).
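As a concrete illustration of that from-scratch approach, here is a minimal NumPy-only sketch on toy data. The architecture, hyperparameters, and data are illustrative assumptions, not the repo's, and the pandas loading step is omitted for brevity:

```python
# Minimal from-scratch neural net: NumPy only, no Keras or PyTorch.
# Toy data, architecture, and hyperparameters are illustrative choices.
import numpy as np

rng = np.random.default_rng(42)

# Toy binary task: label is 1 when the two features agree in sign.
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

# One tanh hidden layer, sigmoid output.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    # Backward pass for mean binary cross-entropy: dL/dlogit = p - y.
    dlogit = (p - y) / len(X)
    dW2 = h.T @ dlogit
    db2 = dlogit.sum(axis=0)
    dh = (dlogit @ W2.T) * (1.0 - h**2)   # tanh derivative
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    # Plain gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("train accuracy:", ((p > 0.5) == y).mean())
```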
Explainable Speaker Recognition
Visualization methods to interpret CNNs and Vision Transformers, trained in a supervised or self-supervised way. The methods are based on CAM or on the attention mechanism of Transformers. The results are evaluated qualitatively and quantitatively.
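The CAM branch of those methods comes down to a short computation: weight the final convolutional feature maps by the classifier weights of the predicted class and sum over channels. A minimal sketch, using random arrays as stand-ins for a real network's features and weights:

```python
# Minimal Class Activation Mapping (CAM) sketch. The random arrays stand in
# for a real CNN's final conv features and its GAP-then-linear classifier;
# the CAM computation itself is the standard one.
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 32, 7, 7

feature_maps = rng.random((C, H, W))     # stand-in for the last conv layer
fc_weights = rng.normal(size=(10, C))    # classifier after global average pooling

# Prediction: global average pooling, then the linear layer.
pooled = feature_maps.mean(axis=(1, 2))
logits = fc_weights @ pooled
c = int(np.argmax(logits))

# CAM for the predicted class: weight each feature map by its classifier
# weight, sum over channels, clip negatives, normalize to [0, 1] for display.
cam = np.tensordot(fc_weights[c], feature_maps, axes=1)
cam = np.maximum(cam, 0.0)
cam /= cam.max() + 1e-8
print("CAM heatmap shape:", cam.shape)   # (7, 7); upsampled to image size in practice
```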
Semi-supervised Concept Bottleneck Models (SSCBM)
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE) -- efficiently transform any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
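Both concept-bottleneck entries above share the same core mechanism: predict human-interpretable concepts from the input first, then predict the label from those concepts alone, so every prediction is mediated by named concepts. A toy sketch of that two-stage structure, with least-squares fits standing in for trained networks (all data and dimensions are illustrative assumptions; CB-AE's generative-model construction is far more involved):

```python
# Toy concept-bottleneck sketch: stage 1 predicts concepts from the input,
# stage 2 predicts the label from the predicted concepts alone. Least-squares
# fits stand in for trained networks; data and dimensions are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_concepts = 500, 10, 3

X = rng.normal(size=(n, d_in))
true_C = (X[:, :d_concepts] > 0).astype(float)   # toy concept annotations
y = (true_C.sum(axis=1) >= 2).astype(float)      # label depends only on concepts

# Stage 1: input -> concepts.
W_xc, *_ = np.linalg.lstsq(X, true_C, rcond=None)
C_hat = X @ W_xc

# Stage 2: concepts -> label. The bottleneck is the interpretability hook:
# the final prediction can only use the named concepts.
W_cy, *_ = np.linalg.lstsq(C_hat, y.reshape(-1, 1), rcond=None)
y_hat = (C_hat @ W_cy).ravel() > 0.5
print("toy CBM accuracy:", (y_hat == y).mean())
```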
Recbole extension with focus on Knowledge Graphs (KGs) and interpretability/explainability.
Interpretability: Methods for Identification and Retrieval of Concepts in CNNs
Interpretable Anomaly Severity Detection on UAV Flight Log Messages
My PhD thesis at NUS, made public so that future graduate students may benefit.