
Researchers have designed a machine learning algorithm that predicts the outcome of chemical reactions with much higher accuracy than trained chemists and suggests ways to make complex molecules, removing a significant hurdle in drug discovery.
Researchers have designed a machine learning algorithm that predicts the outcome of chemical reactions with much higher accuracy than trained chemists and suggests ways to make complex molecules, removing a significant hurdle in drug discovery.
Our platform is like a GPS for chemistry
Alpha Lee
探花直播 of Cambridge researchers have shown that an algorithm can predict the outcomes of complex chemical reactions with over 90% accuracy, outperforming trained chemists. 探花直播algorithm also shows chemists how to make target compounds, providing the chemical 鈥榤ap鈥 to the desired destination. 探花直播results are reported in two studies in the journals and .听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听听
A central challenge in drug discovery and materials science is finding ways to make complicated organic molecules by chemically joining together simpler building blocks. 探花直播problem is that those building blocks often react in unexpected ways.
鈥淢aking molecules is often described as an art realised with trial-and-error experimentation because our understanding of chemical reactivity is far from complete,鈥 said Dr Alpha Lee from Cambridge鈥檚 Cavendish Laboratory, who led the studies. 鈥淢achine learning algorithms can have a better understanding of chemistry because they distil patterns of reactivity from millions of published chemical reactions, something that a chemist cannot do.鈥澨
探花直播algorithm developed by Lee and his group uses tools in pattern recognition to recognise how chemical groups in molecules react, by training the model on millions of reactions published in patents.
探花直播researchers looked at chemical reaction prediction as a machine translation problem. 探花直播reacting molecules are considered as one 鈥榣anguage,鈥 while the product is considered as a different language. 探花直播model then uses the patterns in the text to learn how to 鈥榯ranslate鈥 between the two languages.
Using this approach, the model achieves 90% accuracy in predicting the correct product of unseen chemical reactions, whereas the accuracy of trained human chemists is around 80%. 探花直播researchers say that the model is accurate enough to detect errors in the data and correctly predict a plethora of difficult reactions.
探花直播model also knows what it doesn鈥檛 know. It produces an uncertainty score, which eliminates incorrect predictions with 89% accuracy. As experiments are time-consuming, accurate prediction is crucial to avoid pursuing expensive experimental pathways that eventually end in failure.
In the second study, Lee and his group, collaborating with the biopharmaceutical company Pfizer, demonstrated the practical potential of the method in drug discovery.
探花直播researchers showed that when trained on published chemistry research, the model can make accurate predictions of reactions based on lab notebooks, showing that the model has learned the rules of chemistry and can apply it to drug discovery settings.
探花直播team also showed that the model can predict sequences of reactions that would lead to a desired product. They applied this methodology to diverse drug-like molecules, showing that the steps that it predicts are chemically reasonable. This technology can significantly reduce the time of preclinical drug discovery because it provides medicinal chemists with a blueprint of where to begin.
鈥淥ur platform is like a GPS for chemistry,鈥 said Lee, who is also a Research Fellow at St Catharine鈥檚 College. 鈥淚t informs chemists whether a reaction is a go or a no-go, and how to navigate reaction routes to make a new molecule.鈥
探花直播Cambridge researchers are currently using this reaction prediction technology to develop a complete platform that bridges the design-make-test cycle in drug discovery and materials discovery: predicting promising bioactive molecules, ways to make those complex organic molecules, and selecting the experiments that are the most informative. 探花直播researchers are now working on extracting chemical insights from the model, attempting to understand what it has learned that humans have not.
鈥淲e can potentially make a lot of progress in chemistry if we learn what kinds of patterns the model is looking at to make a prediction,鈥 said Peter Bolgar, a PhD student in synthetic organic chemistry involved in both studies. 鈥 探花直播model and human chemists together would become extremely powerful in designing experiments, more than each would be without the other.鈥
探花直播research was supported by the Winton Programme for the Physics of Sustainability and the Herchel Smith Fund.
References:
Philippe Schwaller et al. 鈥.鈥 ACS Central Science (2019). DOI: 10.1021/acscentsci.9b00576
Alpha Lee et al. 鈥.鈥 Chemical Communications (2019). DOI: 10.1039/C9CC05122H
听
探花直播text in this work is licensed under a . Images, including our videos, are Copyright 漏 探花直播 of Cambridge and licensors/contributors as identified.听 All rights reserved. We make our image and video content available in a number of ways 鈥 as here, on our main website under its Terms and conditions, and on a range of channels including social media that permit your use and sharing of our content under their respective Terms.