Thursday, December 10, 2020

Photonic quantum supremacy, protein folding, and "computing"

In the last week or two, there have been a couple of big science stories that I think raise some interesting issues about what we consider to be scientific computing.

In one example, AlphaFold, a machine learning/AI approach to predicting protein structure, has demonstrated that it is really good at doing so.  Proteins are polymers made up of sequences of many amino acids, and in biological environments they fold up into complex shapes (with structural motifs like alpha helices and beta sheets) held together by hydrogen bonds. Proteins do an amazing number of critical things in organisms (like act as enzymes to promote highly specific chemical reactions, or as motor units to move things around, or to pump specific ions and molecules in and out of cells and organelles).  Their ability to function in the wet, complex, constantly fluctuating biological environment is often dependent on minute details in their folded structure.  We know only snapshots of the structures of some proteins, because actually getting the structure typically requires crystallizing the protein molecules and performing high precision x-ray diffraction measurements on those crystals.  The challenge of understanding how proteins end up in particular functional structures based on their amino acid sequence is called the protein folding problem.  The statistical physics of folding is complex but usefully considered in terms of free energy landscapes.  It is possible to study large numbers of known protein structures and look for covariances (see here), that is, correlations in sequences that show up commonly across many organisms.  AlphaFold was trained on something like 100,000 structures and associated data, and is now good enough at predicting structures that it can actually allow people to solve complex x-ray diffraction data that were previously not analyzable, leading to new solved structures.

This is very nice and will be a powerful tool, though like all such news stories one should be wary of the hype.  It does raise questions, and I would like to hear from experts:  Do we actually have greater foundational understanding of protein structure now?  Or have we created an extraordinarily effective interpolational look-up table?  It's useful either way, but the former might have more of an impact on our ability to understand the dynamics of proteins.  

That's a lot of optical components!
The second big story of the week is the photonic quantum supremacy achievement by a large group from USTC in China.  Through a very complex arrangement of optical components (see image), they report having used boson sampling to determine statistical information about the properties of matrices at a level that would take an insanely long time with a classical computer.  Here, as with Google's quantum supremacy claim (mentioned here), I again have to ask:  This is an amazing technical achievement, but is it really a computation, as opposed to an analog emulation or simulation?  If I filmed cream being stirred into coffee, and I analyzed the images to infer the flow of energy down to smaller and smaller length scales, I would describe that as an experiment, not as me doing a computation to solve the Navier-Stokes equations (which would also be very challenging to do with high precision on a classical computer).  Perhaps it's splitting hairs, and quantum simulation is very interesting, but it does seem distinct to me from what most people would call computing.
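
For a sense of what "properties of matrices" means here: in the textbook form of boson sampling, the probability of each detection pattern is proportional to the squared magnitude of the permanent of a submatrix of the unitary matrix describing the optical network (Gaussian variants involve related matrix functions). The sketch below is only an illustration, not the USTC group's analysis, and the function names are made up for this example. It computes a permanent two ways. Even Ryser's formula, one of the best known exact methods, takes a number of steps that grows exponentially with the matrix size, which is roughly why the classical calculation becomes hopeless for large networks.

```python
from itertools import permutations
from math import prod


def permanent_naive(a):
    """Permanent by its definition: like a determinant, but with no minus signs.

    Sums over all n! permutations, so it is only usable for tiny matrices.
    """
    n = len(a)
    return sum(
        prod(a[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    )


def permanent_ryser(a):
    """Permanent via Ryser's inclusion-exclusion formula.

    Loops over all non-empty subsets of columns (2^n - 1 of them), which is
    far better than n! but still exponential in n.
    """
    n = len(a)
    total = 0
    for mask in range(1, 1 << n):
        # Columns included in this subset, encoded by the bits of mask.
        cols = [j for j in range(n) if (mask >> j) & 1]
        # Product over rows of the row sums restricted to those columns.
        row_sum_product = prod(sum(row[j] for j in cols) for row in a)
        total += (-1) ** len(cols) * row_sum_product
    return (-1) ** n * total


if __name__ == "__main__":
    example = [[1, 2], [3, 4]]
    print(permanent_naive(example), permanent_ryser(example))
```

For the 2x2 example the permanent is 1*4 + 2*3, and both functions agree on it; the two approaches differ only in how quickly the cost blows up as the matrix grows.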

Anyway, between AI/ML and quantum information sciences, it is surely an exciting time in the world of computing, broadly construed. 

(Sorry for the slow posting - end of semester grading + proposal writing have taken a lot of time.)


Random Biophysicist said...

Do we actually have greater foundational understanding of protein structure now? Or have we created an extraordinarily effective interpolational look-up table? It's useful either way, but the former might have more of an impact on our ability to understand the dynamics of proteins.

Right now I'd lean more towards "look-up table", but that might change as DM releases more details. My understanding of how alphafold works is that it constructs a multiple sequence alignment of proteins evolutionarily related to the one it's trying to predict, does a bunch of deep learning voodoo to mine features and correlations from the MSA, and then uses that information to construct an effective "force field" that's specific to that protein. Apparently a feature of this force field is that it's smooth enough to allow finding the minimum through good old gradient descent.
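
To make the "smooth force field plus gradient descent" idea concrete, here is a toy sketch. It is emphatically not AlphaFold's method: the energy function, the target distance, the step size, and all the names are assumptions chosen for illustration. Three beads in a plane are pulled toward a preferred pairwise distance by a smooth energy, and plain gradient descent walks them downhill to the minimum.

```python
import math

TARGET = 1.0  # assumed preferred distance between every pair of beads


def energy(points):
    """Sum over pairs of (distance - TARGET)^2: a smooth toy 'force field'."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            total += (r - TARGET) ** 2
    return total


def gradient(points):
    """Analytic gradient of the energy with respect to each bead position."""
    grad = [[0.0, 0.0] for _ in points]
    for i in range(len(points)):
        for j in range(len(points)):
            if i == j:
                continue
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            r = math.hypot(dx, dy)
            scale = 2.0 * (r - TARGET) / r
            grad[i][0] += scale * dx
            grad[i][1] += scale * dy
    return grad


def descend(points, rate=0.05, steps=500):
    """Plain gradient descent: repeatedly step against the gradient."""
    pts = [list(p) for p in points]
    for _ in range(steps):
        g = gradient(pts)
        pts = [[p[0] - rate * gi[0], p[1] - rate * gi[1]]
               for p, gi in zip(pts, g)]
    return pts


if __name__ == "__main__":
    start = [(0.0, 0.0), (2.0, 0.0), (0.5, 1.5)]
    final = descend(start)
    print(energy(start), energy(final))
```

In this toy the landscape has no traps, so simple descent is enough; the interesting claim about the real thing is that a learned, protein-specific landscape can apparently be made similarly well behaved.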

There's a long history of creating coarse grained effective force fields of proteins based on knowledge of the native structure, and these are able to reproduce major features of folding pathways and native state fluctuations with impressive success. Although they also can miss things that full atomistic physics based simulations get right. Basically "the details don't matter, except when they do". In my corner of biophysics, there's a lot of jealousy of the renormalization group, and a number of efforts (none very successful, IMO) to establish a similarly rigorous connection between full atomistic physics based representations and coarse grained effective models.

All of which is to say that the protein specific force fields that alphafold generates may well contain information on dynamics and folding pathways, but we won't know until DM releases more details on the methods.

DaveC said...

It's great that you are providing these notes of caution, Doug.

By far the most accurate and effective (not to mention affordable) way to simulate a complex many-body quantum system such as a cup of coffee is to brew a cup of coffee and watch its dynamics. It seems to have been proven, basically, that the Google machine and this optical machine are better at simulating themselves than a supercomputer is at simulating them. This applies to our cup of coffee too, but does that make the cup of coffee the dawn of a quantum revolution?

Anonymous said...

The difference is that the Google machine (but not the optical instrument in China) is programmable. You can bet that you can't encode useful computations into your coffee cup. Scott Aaronson has spoken about this point in detail: being hard to compute doesn't make you any good at being a computer! It's the same reason people don't use bubbles to solve the traveling salesman problem.