PremuNet can extract information from multiple views and fuse it interactively through pre-training and fine-tuning. The network consists of two branches: a Transformer-GNN branch that extracts SMILES and graph information, and a Fusion Net branch that extracts graph and conformation information, called PremuNet-L and PremuNet-H respectively. We employ masked self-supervised methods so that the model learns to fuse information, which improves performance on downstream tasks.
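For orientation, the sketch below shows how two branch embeddings can be fused for a prediction head. It is a minimal PyTorch illustration under assumed module names and dimensions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Sketch of a PremuNet-style fusion: one branch encodes SMILES/graph
    features, the other graph/conformation features; their embeddings are
    concatenated and projected by a property-prediction head."""

    def __init__(self, dim_l=256, dim_h=256, dim_out=128):
        super().__init__()
        self.branch_l = nn.Linear(dim_l, dim_out)  # stand-in for the Transformer-GNN branch (PremuNet-L)
        self.branch_h = nn.Linear(dim_h, dim_out)  # stand-in for the Fusion Net branch (PremuNet-H)
        self.head = nn.Linear(2 * dim_out, 1)      # downstream property head

    def forward(self, feat_l, feat_h):
        z = torch.cat([self.branch_l(feat_l), self.branch_h(feat_h)], dim=-1)
        return self.head(z)
```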
Our pre-training process employs both the ChEMBL and PCQM4M datasets. The ChEMBL dataset can be accessed through this link, while the PCQM4M dataset is available for download here.
We use MoleculeNet as our benchmark. The datasets and splits of MoleculeNet used in our experiments are available at this link. Download and unzip them into your dataset directory.
We have added 3D coordinates to the molecules in MoleculeNet to further improve prediction performance. These 3D coordinates can be downloaded here.
Download the molecule_net.tar.gz file, unzip it, and place it in your dataset directory.
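If you ever need to regenerate 3D coordinates yourself (e.g., for molecules missing from the archive), RDKit's ETKDG embedding is a standard approach. This is a sketch of that route, not necessarily the procedure used to produce the provided files:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Embed a molecule with ETKDG and read back its 3D coordinates.
mol = Chem.MolFromSmiles("CCO")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)   # generate one 3D conformer
AllChem.MMFFOptimizeMolecule(mol)           # optional force-field relaxation
coords = mol.GetConformer().GetPositions()  # (num_atoms, 3) numpy array
print(coords.shape)
```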
pip install -r requirements.txt
The pre-training weights can be found at this link. Please download them to the project root directory and unzip them. If you want to train your own pre-training weights, follow the steps below.
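After unzipping, you can sanity-check a checkpoint with torch.load; the file name below is a placeholder, so substitute whatever the archive actually contains:

```python
import torch

# Placeholder file name -- substitute the actual checkpoint from the archive.
state_dict = torch.load("premunet_pretrain.pt", map_location="cpu")
print(sorted(state_dict.keys())[:10])  # quick sanity check of parameter names
```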
Step 1: Download PCQM4M Dataset
Step 2: Run the SMILES Transformer pre-training:
cd Transformer1D
python pretrain_trfm.py --data /pcqm4m-v2/raw/data.csv.gz
Note that you should replace the --data parameter with your own data path.
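To confirm that the path points at valid data, you can load a few rows with pandas; the column names shown are what PCQM4M-v2 typically provides and may differ for other versions:

```python
import pandas as pd

# Read a few rows of the compressed CSV directly; adjust the path as needed.
df = pd.read_csv("pcqm4m-v2/raw/data.csv.gz", nrows=5)
print(df.columns.tolist())  # e.g. ['idx', 'smiles', 'homolumogap'] for PCQM4M-v2
print(df.head())
```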
The pre-training of the graph-level Transformer can follow the instructions at this link.
For the PremuNet-H module, we used the code from this link. You can either follow their code and steps for pre-training, or use our code with the following steps:
cd UnifiedMolPretrain
python train.py --num-layers 12 --batch-size 128 --enable-tb --node-attn --use-bn --pred-pos-residual --mask-prob 0.25
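The --mask-prob 0.25 flag sets the fraction of nodes hidden for the masked self-supervised objective mentioned above. The following is a minimal sketch of that masking step, assuming simple dense node features; it is illustrative, not the UnifiedMolPretrain implementation:

```python
import torch

def mask_nodes(node_feats, mask_token, mask_prob=0.25):
    """Replace a random fraction of node feature rows with a mask token;
    return the corrupted features and the boolean mask used for the loss."""
    num_nodes = node_feats.size(0)
    mask = torch.rand(num_nodes) < mask_prob
    if not mask.any():                    # keep at least one masked node
        mask[torch.randint(num_nodes, (1,))] = True
    corrupted = node_feats.clone()
    corrupted[mask] = mask_token
    return corrupted, mask

# Toy usage: the reconstruction loss is computed only on masked rows.
torch.manual_seed(0)
feats = torch.randn(10, 8)
corrupted, mask = mask_nodes(feats, torch.zeros(8))
recon = corrupted                         # a real model predicts this from the graph
loss = ((recon[mask] - feats[mask]) ** 2).mean()
print(loss.item())
```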
The configuration files for each dataset are located in the /configs directory. These parameters can be used directly as provided or modified as needed. The steps for fine-tuning on MoleculeNet are as follows:
Unzip the downloaded datasets and place them separately under the /dataset/ directory.
python pretrans_data.py --dataset_dir ${your dataset directory}
bash evaluate_all.sh
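evaluate_all.sh runs fine-tuning and evaluation across the benchmark datasets. If you only need a subset, a small driver like the sketch below can invoke individual runs; the script name finetune.py and the config paths are assumptions, so check evaluate_all.sh for the repository's actual entry point and flags:

```python
import subprocess

# Hypothetical per-dataset driver; the entry point and flags are assumptions.
# Inspect evaluate_all.sh for the commands this repository actually uses.
for name in ["bbbp", "tox21", "esol"]:
    subprocess.run(
        ["python", "finetune.py", "--config", f"configs/{name}.yaml"],
        check=True,
    )
```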
Zhang H, Wu J, Liu S, et al. A pre-trained multi-representation fusion network for molecular property prediction[J]. Information Fusion, 2024, 103: 102092.
This project utilizes code from the UnifiedMolPretrain and Smiles-Transformer projects; its implementation owes much to their excellent research papers and well-structured code.