(Please note that this post was updated on 12th Dec 2020 – see below)
This week DeepMind announced that, using artificial intelligence (AI), it has solved the 50-year-old problem of ‘protein folding’. The announcement came as the results were released from the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), the latest round of a competition that pits teams of computational scientists against one another to see whose method is best at predicting the structures of protein molecules. DeepMind’s solution, ‘AlphaFold 2’, emerged as the clear winner.
There followed much breathless reporting in the media that AI can now be used to accurately predict the structures of proteins – the molecular machinery of every living thing. Previously, the laborious experimental work of solving protein structures was the domain of protein crystallographers, NMR spectroscopists and cryo-electron microscopists, who spent months and sometimes years determining each new structure.
Should the experimentalists now all quit the lab and leave the field to DeepMind?
No, they shouldn’t, for several reasons.
Firstly, there is no doubt that DeepMind have made a big step forward. They are so far ahead of the pack that rival computational modellers may be thinking about giving up. But we are not yet at the point where we can say that protein folding is ‘solved’. For one thing, only two-thirds of DeepMind’s solutions were comparable to the experimentally determined structure of the protein. This is impressive, but bear in mind that they didn’t know exactly which two-thirds of their predictions were closest to correct until the comparison with experimental solutions was made.*
Would you buy a satnav that was only 67% accurate?
So a dose of realism is required. It is also difficult to see right now, despite DeepMind’s impressive performance, how this will immediately transform biology.
AlphaFold 2 will certainly help to advance biology. For example, as already reported, it can generate structure predictions that can then be used to help solve experimental structures by crystallography (and probably by other techniques). So it will help structure determination go a bit faster in some cases.
However, despite some of the claims being made, we are not at the point where this AI tool can be used for drug discovery. For DeepMind’s structure predictions (111 in all), the average root-mean-square deviation (RMSD) in atomic positions between the prediction and the experimentally determined structure is 1.6 Å (0.16 nm). That’s about the length of a chemical bond.
That sounds pretty good, but it’s not clear from DeepMind’s announcement how that number is calculated. It might be calculated only by comparing the positions of the alpha-carbon (Cα) atoms in the protein backbone, a reasonable way to estimate the accuracy of the overall fold of the protein. Or it might be calculated over all the atomic positions, a much more rigorous test. If it is the latter, then an RMSD of 1.6 Å is an even more impressive result.
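The difference between those two conventions can be made concrete. As usually defined, the RMSD is the square root of the mean of the squared distances between corresponding atoms once the predicted and experimental structures have been superimposed. The sketch below assumes the superposition has already been done (no fitting step is included) and uses a handful of made-up coordinates; the function name `rmsd` and the data layout are illustrative only, not DeepMind’s or CASP’s code.

```python
import math
from typing import List, Optional, Set, Tuple

# Hypothetical layout: (residue number, atom name, (x, y, z) in ångströms).
Atom = Tuple[int, str, Tuple[float, float, float]]


def rmsd(pred: List[Atom], ref: List[Atom], atom_names: Optional[Set[str]] = None) -> float:
    """Root-mean-square deviation between two already-superimposed structures.

    If atom_names is given (e.g. {"CA"}), only those atoms are compared;
    otherwise every atom is included (an all-atom RMSD).
    """
    if len(pred) != len(ref):
        raise ValueError("structures have different numbers of atoms")
    squared = []
    for (res_p, name_p, xyz_p), (res_r, name_r, xyz_r) in zip(pred, ref):
        if (res_p, name_p) != (res_r, name_r):
            raise ValueError(f"atom mismatch: {res_p}{name_p} vs {res_r}{name_r}")
        if atom_names is None or name_p in atom_names:
            # Squared Euclidean distance between the two positions of this atom.
            squared.append(sum((a - b) ** 2 for a, b in zip(xyz_p, xyz_r)))
    if not squared:
        raise ValueError("no atoms selected")
    return math.sqrt(sum(squared) / len(squared))


# Made-up two-residue example: the backbone C-alpha atoms are close,
# but the side-chain (CB) atoms are further off.
experimental = [
    (1, "CA", (0.0, 0.0, 0.0)), (1, "CB", (1.5, 0.0, 0.0)),
    (2, "CA", (3.8, 0.0, 0.0)), (2, "CB", (5.3, 0.0, 0.0)),
]
predicted = [
    (1, "CA", (0.3, 0.0, 0.0)), (1, "CB", (1.5, 2.0, 0.0)),
    (2, "CA", (3.8, 0.4, 0.0)), (2, "CB", (5.3, 0.0, 1.5)),
]

print(f"C-alpha RMSD:  {rmsd(predicted, experimental, {'CA'}):.2f} Å")
print(f"All-atom RMSD: {rmsd(predicted, experimental):.2f} Å")
```

In this toy case the C-alpha RMSD is about 0.35 Å while the all-atom figure is about 1.27 Å, which is why it matters which atoms a quoted RMSD was computed over.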
But it’s still not nearly good enough for delivering reliable insights into protein chemistry or drug design. For that, we want to be confident of atomic positions to within a margin of around 0.3 Å. AlphaFold 2’s best prediction has an all-atom RMSD of 0.9 Å, and many of the predictions contributing to the average of 1.6 Å will have even larger deviations in atomic positions. So, despite the claims, we’re not yet ready to use AlphaFold 2 to create new drugs.
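The gap between an RMSD of around 0.9 Å and a 0.3 Å tolerance is easy to underestimate, because RMSD compresses many per-atom errors into a single number. The sketch below uses invented per-atom deviations (not CASP14 data) to illustrate that a structure with an overall RMSD below 0.9 Å can still have most of its atoms outside the 0.3 Å margin; the helper names are hypothetical.

```python
import math
from typing import Sequence


def rmsd_from_deviations(devs: Sequence[float]) -> float:
    """RMSD (Å) computed from per-atom distances between prediction and experiment."""
    if not devs:
        raise ValueError("need at least one deviation")
    return math.sqrt(sum(d * d for d in devs) / len(devs))


def fraction_within(devs: Sequence[float], cutoff: float = 0.3) -> float:
    """Fraction of atoms whose deviation is no more than `cutoff` ångströms."""
    if not devs:
        raise ValueError("need at least one deviation")
    return sum(1 for d in devs if d <= cutoff) / len(devs)


# Invented per-atom deviations (Å) for eight atoms; illustrative only.
deviations = [0.2, 0.25, 0.28, 0.4, 0.6, 0.9, 1.2, 1.8]

print(f"RMSD: {rmsd_from_deviations(deviations):.2f} Å")
print(f"Atoms within 0.3 Å: {fraction_within(deviations):.1%}")
```

Here the RMSD comes out at about 0.88 Å, yet only three of the eight atoms (37.5%) lie within 0.3 Å of their experimental positions.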
There are other reasons not to believe that the protein folding problem is ‘solved’. AI methods rely on learning the rules of protein folding from existing protein structures. This means that AlphaFold 2 may find it more difficult to predict the structures of proteins whose folds are not well represented in the database of solved structures.
Also, as reported in Nature, the method cannot yet reliably predict the structures of proteins that are components of multi-protein complexes. These complexes are among the most interesting biological entities in living things (e.g. ribosomes, ion channels, polymerases). So there is still a large territory where AlphaFold 2 cannot take us. The experimentalists, who have successfully mapped out the structures of ever larger and more intricate complexes, still have a lot of valuable work to do.
While all of the above is supposed to sound a note of caution to counter some of the more hyperbolic claims that have been heard in the media in recent days, I still want to emphasise my admiration for the achievements of the AlphaFold team. They have clearly made a very significant advance.
That advance will be much clearer once their peer-reviewed paper is published (we should not judge science by press releases), and once the tool is openly available to the academic community – or indeed anyone who wants to study protein structure.
Update (02 Dec, 18:43): This post was updated to provide a clearer explanation of the RMSD measures used to compare predicted and experimentally determined protein structures. I am very grateful to Prof Leonid Sazanov who pointed out some necessary corrections and additions on Twitter.
*Update (12 Dec, 15:35): Strictly this is true, but it misses the more important point that the score given to each structure prediction (GDT_TS) broadly correlates with the closeness of its match to the experimental structure. As a result, I have deleted my SatNav crack.
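For readers unfamiliar with GDT_TS, the main score used in CASP: it averages the percentages of C-alpha atoms in the prediction that lie within 1, 2, 4 and 8 Å of their experimental positions, so 100 is a perfect match. A minimal sketch follows, assuming the per-residue C-alpha distances have already been measured after a single superposition. The official calculation searches over many superpositions to maximise each percentage, so this simplified version will typically give an equal or lower score. The function name and the deviations are invented for illustration.

```python
from typing import Sequence


def gdt_ts(ca_deviations: Sequence[float]) -> float:
    """Simplified GDT_TS score (0-100) from per-residue C-alpha distances in Å.

    Assumes one fixed superposition rather than the search over
    superpositions used in the official CASP calculation.
    """
    n = len(ca_deviations)
    if n == 0:
        raise ValueError("need at least one residue")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of residues whose C-alpha lies within each distance cutoff.
    fractions = [sum(1 for d in ca_deviations if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical C-alpha deviations (Å) for a 10-residue fragment.
ca_devs = [0.5, 0.7, 0.9, 1.5, 1.8, 2.5, 3.0, 5.0, 9.0, 12.0]
print(f"GDT_TS = {gdt_ts(ca_devs):.1f}")
```

Unlike RMSD, GDT_TS is not dominated by a few badly placed residues: the two residues at 9 and 12 Å in this example simply fall outside every cutoff and the score is 57.5.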
For a deeply informed and very measured assessment of what DeepMind has actually achieved in CASP14, please read this blog post by Prof. Mohammed AlQuraishi, who knows this territory much better than I do. His post is pretty long, but you can skip the technical bits explaining how AlphaFold 2 works. He gives a very good account of the nature of DeepMind’s advance; in AlQuraishi’s view, AlphaFold 2 does represent a solution to the protein structure prediction problem, though he is careful to define what he means by a solution. He also acknowledges that there are still some significant improvements to be made to the program, but regards these as more of an engineering challenge than a scientific one. He agrees that AlphaFold 2 won’t be used any time soon for drug design work. AlQuraishi also gives an excellent overview of the implications of this work for protein folders, structural biologists and biotechnologists in general, and offers some very interesting thoughts on the differences between DeepMind’s approach to research and that of more traditional academic groups.