AI-Guided Synthesis of the GDB Chemical Space
In previous work, our group exhaustively enumerated the chemical space of molecules with up to 11, 13, and 17 heavy atoms, yielding the so-called generated databases (GDB). We have examined the synthetic feasibility of the GDB chemical space using computer-aided synthesis planning. One of the approaches developed by the group in collaboration with AstraZeneca is AiZynthFinder, an open-source, template-based retrosynthetic planning tool. Because the GDB project targets low-molecular-weight compounds that resemble building blocks, we have augmented the underlying reaction datasets, built domain-specific models (for example, for ring formations), and developed a rapid scoring methodology with uncertainty calibration to assess at scale which GDB molecules have the potential to be synthesised in the wet lab.
Preliminary findings show that AiZynthFinder finds retrosynthetic routes for ca. 20 % of the compounds in GDBChEMBL, a subset of GDB17 with ChEMBL-like properties. We have expanded on these findings by using information about the predicted routes as training data for a machine-learning classifier, named the retrosynthetic accessibility score (RAscore), which predicts whether AiZynthFinder can find a synthetic route for a given compound. The RAscore has been trained on compounds from the ChEMBL database to make it generally applicable, and a more specific GDB-based score has also been trained to demonstrate utility in domain-specific cases. RAscore computes at least 4500 times faster than full retrosynthetic analysis on large sets of compounds, which enables synthetic feasibility prediction at scale, for instance in the virtual screening of databases of bioactive compounds.
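To give a sense of what a speed-up of this size means in practice, the following minimal sketch compares the estimated wall-clock time of screening a compound set with full retrosynthetic analysis against a classifier that is 4500 times faster. Only the factor of 4500 comes from the text above. The set size and the per-compound planning time are hypothetical values chosen for illustration, not measurements, and the function name is likewise an assumption.

```python
# Back-of-the-envelope estimate of screening time.
# Only SPEEDUP is taken from the abstract (a lower bound); every other
# number below is a hypothetical assumption for illustration.

SPEEDUP = 4500  # "at least 4500 times faster" (from the text)


def screening_hours(n_compounds: int, seconds_per_compound: float) -> float:
    """Total sequential wall-clock time in hours for one pass over a set."""
    return n_compounds * seconds_per_compound / 3600.0


# Hypothetical inputs: one million compounds, 60 s per full route search.
n_compounds = 1_000_000
full_search_seconds = 60.0

full_hours = screening_hours(n_compounds, full_search_seconds)
score_hours = screening_hours(n_compounds, full_search_seconds / SPEEDUP)

print(f"Full retrosynthetic analysis: {full_hours:,.0f} h")
print(f"Classifier at {SPEEDUP}x speed-up: {score_hours:,.1f} h")
```

Under these assumed inputs, a job that would take years of sequential compute shrinks to a few hours, which is the sense in which the score makes feasibility prediction practical for large virtual libraries.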
To aid exploration of the GDB chemical space, the results can be browsed in a web interface that supports searching selected subsets, viewing synthetic routes, and examining the building blocks used and the syntheses in which they appear. We hope this will stimulate interest from our experimental counterparts and provide computational chemists with a tool to bridge the gap between simulation and the wet lab.
[1] Thakkar, A.; Kogej, T.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J. Datasets and Their Influence on the Development of Computer Assisted Synthesis Planning Tools in the Pharmaceutical Domain. Chem. Sci. 2020, 11 (1), 154–168. https://doi.org/10.1039/C9SC04944D.
[2] Thakkar, A.; Selmi, N.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J. “Ring Breaker”: Neural Network Driven Synthesis Prediction of the Ring System Chemical Space. J. Med. Chem. 2020. https://doi.org/10.1021/acs.jmedchem.9b01919.
[3] Thakkar, A.; Chadimová, V.; Bjerrum, E. J.; Engkvist, O.; Reymond, J.-L. Retrosynthetic Accessibility Score (RAscore) – Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning. Chem. Sci. 2021. https://doi.org/10.1039/D0SC05401A.
[4] Genheden, S.; Thakkar, A.; Chadimová, V.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. AiZynthFinder: A Fast, Robust and Flexible Open-Source Software for Retrosynthetic Planning. J. Cheminformatics 2020, 12 (1), 70. https://doi.org/10.1186/s13321-020-00472-1.