
A Pretrained Multi-Representation Network for Molecular Property Prediction

Overview

PremuNet can extract information from multiple views and combine them interactively through pre-training and fine-tuning. The network consists of two branches: a Transformer-GNN branch that extracts SMILES and graph information, and a Fusion Net branch that extracts graph and conformation information, called PremuNet-L and PremuNet-H respectively. We employ masked self-supervised methods to enable the model to learn information fusion and achieve improved performance in downstream tasks.

[Figure: overall model architecture]

Dataset

Dataset for pretraining

Our pre-training process employs both the ChEMBL and PCQM4M datasets. The ChEMBL dataset can be accessed through this link, while the PCQM4M dataset is available for download here.

Dataset for finetuning

We use MoleculeNet as our benchmark. The datasets and splits of MoleculeNet used in our experiments are available at this link. Download and unzip them into your dataset directory.

We have added 3D coordinates to the molecules in MoleculeNet to further improve prediction performance. The source for these 3D coordinates can be accessed here. You should download the molecule_net.tar.gz file, unzip it, and place it in your dataset directory.
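For example, assuming the archive was downloaded to the project root and ./dataset/ is your dataset directory (both paths are illustrative), the extraction could look like this:

# extract the 3D-coordinate archive for MoleculeNet into the dataset directory
tar -xzf molecule_net.tar.gz -C ./dataset/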

Setup Environment

pip install -r requirements.txt
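If you prefer an isolated environment, a minimal sketch is shown below (the environment name premunet-env is illustrative):

# optional: create and activate a virtual environment, then install the dependencies
python -m venv premunet-env
source premunet-env/bin/activate
pip install -r requirements.txt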

Pretrain

The pretraining weights can be found at this link. Please download them to the project root directory and unzip them. If you want to train your own pretraining weights, you can refer to the following steps.
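For reference, a minimal sketch assuming the weights come as a single zip archive (the filename below is a placeholder for the file you actually download):

# unzip the downloaded pretraining weights into the project root; the archive name is hypothetical
unzip premunet_pretrained_weights.zip -d .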

SMILES-Transformer

For Atom-Level Pretraining

Step 1: Download PCQM4M Dataset

Step 2:

cd Transformer1D
python pretrain_trfm.py --data /pcqm4m-v2/raw/data.csv.gz

Note that you should replace the --data parameter with your own data path.
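For instance, if the PCQM4M data is stored under ./dataset/ (an illustrative location), the call could be:

python pretrain_trfm.py --data ./dataset/pcqm4m-v2/raw/data.csv.gz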

For Graph-Level Pretraining

The pre-training of the graph-level Transformer can follow the instructions at this link.

PremuNet-H

For the PremuNet-H module, we used the code from this link. You can use their code and steps for pre-training, or use our code with the following steps:

cd UnifiedMolPretrain
python train.py --num-layers 12 --batch-size 128 --enable-tb --node-attn --use-bn --pred-pos-residual --mask-prob 0.25

Finetuning

The configuration files for each dataset are located in the /configs directory. These parameters can be used directly as provided, or modified as needed. The steps for finetuning on MoleculeNet are as follows:

Step 1: Download Dataset

You should unzip the downloaded datasets and place them separately under the /dataset/ directory.

Step 2: Get Molecular Features

python pretrans_data.py --dataset_dir ${your dataset directory}
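For example, if the datasets were unzipped under ./dataset/ as described in Step 1, the invocation could be:

python pretrans_data.py --dataset_dir ./dataset/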

Step 3: Start finetuning

bash evaluate_all.sh

Cite

Zhang H, Wu J, Liu S, et al. A pre-trained multi-representation fusion network for molecular property prediction. Information Fusion, 2024, 103: 102092.

Reference

This project uses code from the UnifiedMolPretrain and Smiles-Transformer projects. Its successful implementation owes much to the outstanding research papers and well-structured code provided by these projects.

About

Official code implementation of the PremuNet model.
