Tuesday, December 12, 2023

AI/ML and condensed matter + materials science

Materials define the way we live.  That may sound like an exaggeration that I like to spout because I'm a condensed matter physicist, but it's demonstrably true.  Remember, past historians have given us terms like "Stone Age", "Bronze Age", and "Iron Age", and the "Information Age" has also been called the "Silicon Age".  (And who could forget plastics?)

Perhaps it's not surprising, then, that some of the biggest, wealthiest companies in the world are turning their attention to materials and the possibility that AI approaches could lead to disruptive changes.  As I mentioned last week, there have been recent papers (back to back in Nature) by the Google DeepMind group on this topic.  The idea is to use their particular flavor of AI/machine learning to identify potential new compounds/solids that should be thermodynamically stable and synthesizable, and to make predictions about their structures and properties.  This is not a new idea: the Materials Genome Initiative (started in 2011) has been working in this direction, compiling large amounts of data about solid materials and their properties, and the Materials Project has been pushing on efficient computational methods with the modest goal of computing "the properties of all inorganic materials and provid[ing] the data and associated analysis algorithms for every materials researcher free of charge".
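
For a concrete sense of what that looks like in practice, here is a minimal, purely illustrative sketch of querying the Materials Project database with its Python client, mp-api.  The API key is a placeholder, the filter values are arbitrary, and method and field names may differ between client versions:

from mp_api.client import MPRester

# Placeholder key; a real one comes free from materialsproject.org.
with MPRester("YOUR_API_KEY") as mpr:
    # Ask the computed database for Fe-O compounds on or near the
    # thermodynamic convex hull (energy window in eV/atom).
    docs = mpr.materials.summary.search(
        elements=["Fe", "O"],
        energy_above_hull=(0, 0.05),
        fields=["material_id", "formula_pretty", "energy_above_hull"],
    )

for doc in docs[:10]:
    print(doc.material_id, doc.formula_pretty, doc.energy_above_hull)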

In addition to the Google work, Microsoft has released its effort, MatterGen, on the arXiv; it uses a generative AI approach to try to predict new stable materials with desirable properties, such as a target symmetry, chemical composition, or mechanical/electronic/magnetic response.  An example from their paper is the search for new magnetic materials that have industrially useful properties but do not involve rare earths.
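
MatterGen itself is a diffusion-based generative model over crystal structures, and I won't pretend to reproduce it here.  But the coarse logic of "generate candidates, then rank them with a learned surrogate for the property you care about" can be caricatured in a few lines of Python.  Every name and number below is made up purely for illustration (and the real model conditions the generator itself rather than filtering after the fact):

import random

# Toy palette of rare-earth-free elements, echoing the magnet example.
ELEMENTS = ["Fe", "Co", "Ni", "Mn", "B", "N", "Si"]

def propose_candidate():
    # Stand-in for a generative model's sampling step.
    return tuple(sorted(random.sample(ELEMENTS, 3)))

def predicted_moment(composition):
    # Stand-in for a learned surrogate: a crude per-atom average of
    # invented moments (Bohr magnetons).
    moments = {"Fe": 2.2, "Co": 1.7, "Ni": 0.6, "Mn": 1.0}
    return sum(moments.get(el, 0.0) for el in composition) / len(composition)

candidates = {propose_candidate() for _ in range(10_000)}
for comp in sorted(candidates, key=predicted_moment, reverse=True)[:5]:
    print("-".join(comp), round(predicted_moment(comp), 2))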

There is a long way to go on any of these projects, but it's easy to see why the approach is enticing.  Imagine saying, "I want a material that's as electrically conductive, mechanically strong, and workable as aluminum, but transparent in the visible," and having software give you a credible approach likely to succeed (rather than having to rely on a time-traveling Mr. Scott).

I'd be curious to know readers' opinions of what the biggest obstacles are on this path.  Is it the reliability of computational methods at predicting formation energies and structures?  Is it the lack of rapid yet robust experimental screening approaches?  Is it that the way generative AI and related tools work is just not well-suited to finding truly new systems beyond their training sets?

9 comments:

Stefan Bringuier said...

"I'd be curious to know readers' opinions of what constitute the biggest obstacles on this path. Is it the reliability of computational methods at predicting formation energies and structures? "

I think so. Almost all the training data for these generative models comes from DFT calculations, with numerical details generalized across a large swath of material classes. Furthermore, we are training on an approximation to the true many-body quantum physics problem, so it's only ever as good as the approximation (i.e., the XC functional), which is good in some cases but definitely not in others. I know there is some work on ML XC functionals to make improvements.
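
To make that concrete: the quantity these models are ultimately trained to reproduce is the energy above the convex hull. A minimal sketch with pymatgen, using made-up energies standing in for DFT results on the Li-O binary:

from pymatgen.core import Composition
from pymatgen.analysis.phase_diagram import PDEntry, PhaseDiagram

# Invented total energies (eV per formula unit, elemental references
# set to zero) standing in for DFT output.
entries = [
    PDEntry(Composition("Li"), 0.0),
    PDEntry(Composition("O2"), 0.0),
    PDEntry(Composition("Li2O"), -6.0),
    PDEntry(Composition("LiO2"), -2.0),
]

pd = PhaseDiagram(entries)
for entry in entries:
    # 0 eV/atom means "on the hull", i.e. predicted stable at T = 0;
    # anything above is at best metastable.
    print(entry.composition.reduced_formula, pd.get_e_above_hull(entry))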

Anonymous said...

Maybe I am biased, but coming from a strongly correlated materials background, I don't think I've ever seen an amazing material discovery preceded by a prediction of that material's existence (and possibly its interest) by theory/computation.

Maybe I just don't have a good memory, or my bar for "interesting material" is too high.

This is all to say that such a database might be more useful as a list of possible stable structures to take as starting points, as opposed to a list of interesting materials we should synthesize.

Anonymous said...

Nickelates

Jon Barnard said...

The quality of the training data is critical. As Cory Doctorow remarked, ML models are still bound by GIGO (garbage in, garbage out). Simulation packages are ten-a-penny, so the best ML models will be trained on actual experimental data. Probably the most successful ML-based discovery I know of is the collaboration between Harshad ("Harry") Bhadeshia (ferrous metallurgist) and David MacKay (Inference Group).

They put together an ML algorithm trained on the microstructure of steels produced by various processing steps, but used data from the literature - the ferrous metallurgy literature is suitably extensive. Looking for a steel that was both hard and tough, the algorithm suggested a processing path that gave an extremely fine-grained retained austenitic steel that could be batch made in large quantities. "Super bainite" was discovered. Its toughness and wear resistance were about twice those of martensitic steels, and it was used for the rails in the Channel Tunnel (the rail line linking the UK to mainland Europe); all this took place over twenty years ago. However, it was Harry's knowledge of the literature and David's understanding of Bayesian algorithms that made it work. A more modern example would be AlphaFold and the protein crystallographic databases.

Pizza Perusing Physicist said...

Putting aside the fears I have about the unintended consequences of this technology, in my opinion the biggest obstacle right now, from a scientific standpoint, is the last point that you mentioned: "the way generative AI and related tools work is just not well-suited to finding truly new systems beyond their training sets".

Ultimately, no matter how many training examples you give an algorithm, I just don't see how an AI that is trained entirely on low-temperature inorganic quantum materials could ever predict the way a biopolymer folds in an ambient cellular environment. I suppose, if you did something akin to physics-guided AI, where the machine learning is supplemented and constrained by prior mechanistic theories (quantum mechanics, nonequilibrium thermodynamics), it might be possible 'in principle' given 'enough' computational resources. I guess that is already happening in part, given that these AI engines are trained using ab initio calculations, but again, those are limited by their approximations and by the fact that a full-blown exact and universal solution to the fundamental equations, applicable for any and all purposes, is never really possible.

That's not to say that such AI tools won't still be useful, but they'll be limited at any given moment in history by their generalization capacity, which is limited by, among other things, the extent to which they can incorporate ever more general and universal physical theories. And of course, this capacity will continue to grow as the hardware and algorithms evolve. But no matter how big it gets, there will always be the possibility of making it even larger with an even better version. I imagine it'll be a situation akin to the shrinking of transistor sizes and Moore's law.

Douglas Natelson said...

Thanks, everyone, for the insightful comments.
@Stefan and anon@9:58, yes, I agree that systems with strong correlations and/or open shells, for which the XC part of the functional would seem to be critical, are the really tough nuts to crack. I know there has been some work to try to use ML to gain physical insights into this, but I haven't been keeping up.
@Jon, thanks for the story about super bainite - very cool!
@PPP, I think no one would reasonably expect a tool trained on solid-state chemistry to be good at protein folding. I think an interesting question is: will a solid-state tool trained on, e.g., oxide magnetic materials be able to make the leap to predicting magnetic materials made from chalcogenides or nitrides or something (systems with different symmetries and chemistry), or will it only be any good within its training subdomain?

Pizza Perusing Physicist said...

Right, that was an extreme example, but the key point that I was getting across is that generalization will ultimately be the bottleneck that keeps getting improved as the technology evolves (in my opinion).

Anonymous said...

I'd personally still bet on e.g. Bob Cava, if I wanted to find some cool new material with a particular property.

Anonymous said...

@anon 2:14 PM

Bob Cava is like 70, so we can't rely on him forever. Well, maybe someone will make a character.ai version of him or something.