NanoDesigner is an end-to-end workflow for the design and optimization of nanobodies. It integrates four key stages (Structure Prediction, Docking, CDR Generation, and Side-Chain Packing) into an iterative framework based on an Expectation-Maximization algorithm. The method tackles an often-overlooked interdependency: accurate docking presupposes a priori knowledge of the CDR conformation, while effective CDR generation relies on accurate docking outputs to guide its design.
- Installation
- External Tool Installation
- Data Download and Preprocess
- Training and Inference
- NanoDesigner
- Citation
- License
- Credits
git clone https://github.com/Melissaurious/NanoDesigner.git
cd NanoDesigner
conda env create -f nanodesigner_1.yml -n nanodesigner1
conda activate nanodesigner1
conda env create -f nanodesigner_2.yml -n nanodesigner2
conda activate nanodesigner2
The following repositories and software tools are required for NanoDesigner. Clone them into the NanoDesigner directory and follow the installation steps provided on their respective webpages:
- IgFold - Trained models are already included in the nanodesigner1 conda environment.
- DockQ
- Rosetta
- FoldX
- HDOCK
- dr_sasa_n - Follow the instructions in the repository to compile this tool.
After installing the tools, update the dyMEAN/configs.py file with the full paths to the installed tools.
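A quick sanity check can catch a mistyped path before a long run fails. The following is a minimal sketch, not part of the repository: the dictionary keys and paths are hypothetical placeholders, and the actual variable names are the ones defined in dyMEAN/configs.py.

```python
from pathlib import Path


def find_missing_tools(tool_paths):
    """Return the names of tools whose configured path does not exist."""
    return [name for name, path in tool_paths.items() if not Path(path).exists()]


if __name__ == "__main__":
    # Hypothetical names and placeholder paths, for illustration only;
    # substitute the variables actually defined in dyMEAN/configs.py.
    example_paths = {
        "FOLDX_BIN": "/path/to/foldx",
        "HDOCK_DIR": "/path/to/HDOCK",
    }
    for name in find_missing_tools(example_paths):
        print(f"{name}: configured path not found")
```

An empty result means every listed path exists; it does not verify that the tool itself runs.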
The source code for TMscore evaluation is in dyMEAN/evaluation/. Compile it from the dyMEAN directory (the paths in the command are relative to it) as follows:
g++ -static -O3 -ffast-math -lm -o evaluation/TMscore evaluation/TMscore.cpp
The data download and preparation steps are necessary to replicate our data processing, filtering, and preparation for training. All required instructions are included in the provided Jupyter notebooks.
- Open the notebook located at jupyter_notebooks/process_datasets.ipynb.
- Follow the instructions in the notebook to download and preprocess the datasets.
- Once preprocessing is complete, open the notebook at jupyter_notebooks/split_data.ipynb.
- Use this notebook to split the processed data into training and testing sets.
- Ensure Jupyter Notebook is installed. To check, run:
jupyter --version
The scripts folder contains .sh scripts for both the training and inference workflows used in the study. These scripts are configured for each of the tools employed and are designed to facilitate a 10-fold cross-validation setup.
- Location: Training scripts for each tool are located in the scripts directory.
- Configuration: Update the file paths and any necessary parameters inside the scripts. This includes specifying paths for datasets, output directories, and additional variables.
- 10-Fold Cross-Validation: The scripts are pre-configured to implement a 10-fold cross-validation strategy. Refer to the Data Download and Preprocess section.
- Location: Inference scripts for each tool are also available in the scripts directory.
- Configuration: Make sure that the paths in the training and inference scripts match. The folder specified in the training script determines where the generated checkpoints are stored, and these checkpoints are used during inference.
- Manual Checkpoint Selection: For GNN-based tools, the best checkpoint must be selected manually. Refer to the instructions provided in the script files for guidance.
To run the training or inference for a specific tool, execute the corresponding script as in the example:
bash scripts/train_tool.sh
Note: The 10-fold generated datasets, used to conduct the proof of concept for this project, can be found in this Google Drive folder. For details on how these datasets were generated, please refer to the Data Download and Preprocess section.
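For readers unfamiliar with the setup, a generic 10-fold partition can be sketched as below. This is an illustrative assumption, not the procedure used in jupyter_notebooks/split_data.ipynb, which may apply additional filtering or grouping; the function name and the seed are hypothetical.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists for k folds; each item is tested exactly once."""
    shuffled = list(items)
    # A fixed seed keeps the partition reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items round-robin into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Each fold serves once as the test set while the remaining nine folds form the training set, so one model is trained per fold.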
NanoDesigner is an end-to-end workflow designed for both de novo and optimization cases in nanobody-antigen complex design. The workflow script is located in the scripts folder and can be executed as follows:
bash scripts/NanoDesigner.sh your_working_directory/denovo_epitope_info/7eow_8pwh_example/7eow_8pwh_ep_1.json
The workflow requires a script and a JSON file containing the necessary information for each entry (a nanobody-antigen complex or nanobody scaffold and antigen structure).
- De Novo Design: In cases where the 3D structure of a nanobody-antigen complex is absent (referred to as "de novo" design), the input JSON file can be generated using the notebook jupyter_notebooks/prepare_NanoDesigner_inputs_Denovo.ipynb. This notebook guides you through creating a properly formatted JSON file.
- Optimization Cases: For existing complexes, select a relevant line from the dataset-generated JSON files (prepared during the Data Download and Preprocess stage) and use it to create an input JSON file.
All required information for both cases should be obtained during the data download and preprocessing stage. Ensure the configuration files (config_files) are updated as needed to reflect your setup.
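For the optimization case, selecting one entry from a dataset file could look like the sketch below. It assumes the dataset-generated JSON files hold one JSON object per line, as "select a relevant line" suggests; the helper names and the `pdb` field in the usage comment are hypothetical, so check the keys actually present in your files.

```python
import json


def select_entry(dataset_path, key, value):
    """Return the first record in a JSON Lines file whose `key` equals `value`."""
    with open(dataset_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if record.get(key) == value:
                return record
    return None  # no matching entry


def write_input_json(record, output_path):
    """Write a single record as a one-line JSON file."""
    with open(output_path, "w") as handle:
        handle.write(json.dumps(record) + "\n")


# Hypothetical usage; the file names and the "pdb" key are assumptions:
# entry = select_entry("test.json", "pdb", "7eow")
# if entry is not None:
#     write_input_json(entry, "7eow_input.json")
```

The resulting file can then be passed to scripts/NanoDesigner.sh in place of the example JSON path.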
For a proof of concept of NanoDesigner, download and use the DiffAb or ADesigner trained models found here.
We strongly recommend keeping the total number of designs constant across iterations, for simplicity:
R = 50 # Number of randomized nanobodies (Initialization step)
N = 15 # Top best mutants to proceed with to subsequent iterations
d = 100 # Docked models to generate with HDOCK
n = 5 # Top docked models to feed to inference stage
k_iteration_1 = 3 # Number of predictions obtained from CDR Generation stage at iteration 1
k_iteration_x = 10 # Number of predictions obtained from CDR Generation stage at iteration x
R × n × k = 50 × 5 × 3 = 750 (Iteration 1)
N × n × k = 15 × 5 × 10 = 750 (Iteration X)
*De novo design scenario: CDRH3 or 3-CDR design with a ΔG optimization objective.
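The arithmetic behind the constant design count can be checked with a few lines. This is only a worked example using the values listed above; the function name is hypothetical and not part of the workflow scripts.

```python
def designs_per_iteration(n_structures, n_docked, k_predictions):
    """Total designs = input nanobodies x top docked models x CDR predictions."""
    return n_structures * n_docked * k_predictions


R, N, n = 50, 15, 5
k_iteration_1, k_iteration_x = 3, 10

first = designs_per_iteration(R, n, k_iteration_1)   # 50 * 5 * 3
later = designs_per_iteration(N, n, k_iteration_x)   # 15 * 5 * 10
print(first, later)  # prints "750 750"
```

When changing R, N, n, or k, adjusting the others so that both products stay equal keeps the workload per iteration comparable.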
TODO
TODO
This codebase is primarily based on the following deep learning tools. We thank the authors for their contributions:
We also acknowledge the other tools and software that played a crucial role in the workflow employed in this study. We sincerely thank their authors for their invaluable work, which made this project possible.