Despite significant progress in medicinal chemistry and life sciences research, drug discovery and development remain slow and expensive: bringing a small-molecule drug to market takes on average approximately 15 years and US$2 billion. In silico approaches have attracted considerable interest because of their potential to reduce the time, labor, and cost of drug discovery, and have become a key driving force for drug discovery in both academia and industry.
Computer-aided drug discovery (CADD) approaches include computational identification of potential drug targets, virtual screening of large chemical libraries for effective drug candidates, further optimization of candidate compounds, and in silico assessment of their potential toxicity and bioavailability. Over the past few years, big data and machine learning approaches have been integrated into conventional CADD to increase the accuracy and efficiency of in silico drug discovery. These include the structure-based virtual screening of large-scale chemical spaces, fast iterative screening approaches, and deep learning predictions of ligand properties and target activities, among others.
This Collection aims to collate the latest advances in computational method development for drug discovery and medicinal chemistry, as well as their application in preclinical studies. We welcome submissions in all related areas of research, including but not limited to:
data-driven drug design
virtual screening
de novo drug design
lead optimization
ADMET property prediction
The Collection primarily welcomes original research papers, as well as Reviews and Perspectives. Submissions are open to all authors and are not by invitation only.
Fragment-based drug design plays a pivotal role in drug discovery and development; however, the construction of high-quality fragment libraries is a critical but challenging step. Here, the authors develop DigFrag, a digital fragmentation method based on the graph attention mechanism, showing higher structural diversity of the fragments and higher applicability to artificial intelligence-based drug design.
Artificial intelligence (AI) is accelerating drug discovery. Here, the authors introduce a new approach to de novo molecule design, structured state space sequence models, to further extend AI's capabilities of charting the chemical universe.
An efficient representation of molecular structures is crucial for the implementation of data-driven methods. Here the authors present t-SMILES, a representation that encodes molecular substructures into strings, giving more structure to the SMILES representation.
Machine learning (ML) is a powerful tool in drug discovery, and new models are continuously being developed; however, rational selection of the most appropriate model for a given task remains challenging. Here, the authors explore the capabilities of classical ML algorithms and newer models over a range of dataset tasks and show an optimal zone for each model type, developing a predictive model to aid in the selection of a modeling method based on dataset size and diversity.
Structure-based generative chemistry is crucial in computer-aided drug discovery. Here, the authors propose PMDM, a conditional generative model for 3D molecule generation tailored to specific targets. Extensive experiments demonstrate that PMDM can effectively generate rational bioactive molecules.
Designing peptides that bind to specific protein targets is crucial for peptidic drug development; however, traditional computer-aided binder design is outperformed by AlphaFold2. Here, the authors develop a peptide binder design tool that combines Foldseek, ESM-IF1 and AlphaFold2 to increase the success rate.
Data-driven computational methods have demonstrated promising potential in predicting compound activities from chemical structures; however, unbiased practical applications remain challenging due to the lack of proper benchmarking methods. Here, the authors develop a benchmark termed CARA to eliminate the biases in current compound activity data by using new train-test splitting schemes and evaluation metrics, revealing accurate and informative model performances.
Identifying active compounds for a target is time- and resource-intensive. Here, the authors show that deep learning models trained on Cell Painting and single-point activity data can reliably predict compound activity across diverse targets while maintaining high hit rates and scaffold diversity.
Reduced molecular graphs can integrate higher-level chemical information and leverage advantages from atom-level graph neural networks. Here, the authors introduce the Multiple Molecular Graph eXplainable model, investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives.
AI has become a crucial tool for drug discovery, but how to properly represent molecules for data-driven property prediction is still an open question. Here the authors evaluate 62,820 models to highlight existing challenges, the impact of activity cliffs, and the crucial role of dataset size.
Binding free energy calculations are crucial in computational drug discovery; however, current alchemical free energy perturbation (FEP) methods require substantial computational resources to achieve high accuracy. Here, the authors develop an alternative method by combining QM/MM and the mining minima method to predict free energies at lower computational cost and with comparable accuracy to FEP-based methods.
The paper presents a universal QM-based scoring function that accurately and rapidly predicts protein-ligand binding affinities, outperforming current computational tools. This is demonstrated on the PL-REX experimental benchmark dataset.
Proteins often function by changing conformations upon ligand binding. Efficient structural modelling of these interactions, which is crucial for drug discovery, remains limited. Here, the authors address this with DynamicBind, a diffusion-based deep generative model.
Relative binding free energy calculations are widely used to guide drug discovery by accurately computing binding affinity; however, these simulations remain complicated to set up, computationally expensive to run, and technically challenging to scale up. Here, the authors develop an end-to-end relative free energy workflow based on a non-equilibrium switching approach that calculates the binding free energies starting from SMILES strings.
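For readers unfamiliar with the terminology, the quantity computed by a relative binding free energy calculation can be written via the standard thermodynamic cycle, sketched below. The labels A and B for the two ligands and the subscripts "complex" and "solvent" are notation chosen here for illustration. The second line is the Jarzynski equality, one standard way of recovering a free energy difference from the work W measured in non-equilibrium switching runs; the summary above does not state which estimator the authors use, so it is shown only as background.

```latex
% Relative binding free energy of ligand B with respect to ligand A,
% obtained from the alchemical transformation A -> B in the protein
% complex and in solvent (standard thermodynamic cycle).
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)

% Jarzynski equality: free energy difference from the work W of many
% non-equilibrium switching trajectories (k_B: Boltzmann constant,
% T: temperature, angle brackets: average over trajectories).
\Delta G = -k_{\mathrm{B}} T \,\ln \left\langle e^{-W/(k_{\mathrm{B}} T)} \right\rangle
```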
Streamlined data-driven drug discovery remains challenging, especially in resource-limited settings. Here, the authors present ZairaChem, an AI/ML tool that streamlines QSAR/QSPR modelling, implemented for the first time at the H3D Centre in South Africa.
Virtual screening methods for drug discovery typically rely on static structures and do not efficiently incorporate the dynamic information present in experimental electron densities. Here, the authors develop an approach that uses multi-resolution experimental electron density maps to screen docking poses. Its effectiveness is demonstrated both by improved enrichment of active compounds in a test on the DUD-E data set and by the identification of four inhibitors of COVID-19 3CLpro with IC50 of up to 1.9 μM.
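Enrichment of active compounds is commonly summarized with an enrichment factor: the hit rate among the top-ranked fraction of a screened library divided by the hit rate of the whole library. The sketch below illustrates that common definition only. The function name `enrichment_factor`, its parameters, and the convention that a higher score means a better-ranked compound are assumptions made for this example, and it is not necessarily the metric used in the study summarized above.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor for the top `fraction` of a ranked library.

    Assumes higher scores rank first; negate docking energies
    beforehand if lower values are better.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Size of the top-ranked subset, rounded down, but at least one compound.
    n_top = max(1, int(fraction * n_total))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, flag in ranked[:n_top] if flag)

    # Hit rate in the top subset relative to the hit rate of the whole library.
    return (hits_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy library of ten compounds, three of which are active.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    toy_labels = [True, True, False, False, False, True, False, False, False, False]
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

A value of 1 corresponds to random ranking, and larger values mean that actives are concentrated near the top of the ranked list.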