Friday, January 26, 2024

Generative AI and scientific images/diagrams

Generative AI for image generation is a controversial topic for many reasons.  Still, as someone who doesn't have a staff of graphic artists on hand to help make scientific illustrations, it has certainly been tempting to see whether it might be a useful tool.  My brief experiments are based on using Bing's integrated engine (which I believe is DALL-E 3), since Rice has a license.  The short summary:  Trying to make scientific illustrations this way is great at showing how this technology is not yet close to useful for this application.  While natively integrated with ChatGPT, it still cannot parse prompts well, indicating that it doesn't really "know" the meaning of a lot of basic terms and how those relate to its image training data.

Let me give you an example.  When I asked it "Make me an image of a benzene molecule showing the chemical bonds between the carbon atoms", this is the response:  

When I tried to refine this by typing "Make an image of a benzene molecule.", it responded by showing me image search results with sources, which were all correctly benzene, but not an original image.  When I responded "Please create a new image (not from an existing source) of a benzene molecule.", it responded with "Ok, I will create a new image from scratch showing a benzene molecule." and showed me this:

I then tried more specificity in the prompt:  "Create (from scratch) an image of a benzene molecule, which consists of six carbon atoms (black spheres) linked in a ring, with one hydrogen atom (a smaller red sphere) attached to each carbon atom."  The result was this:
Closer, but I'm still surprised at how poorly all of these resemble the stock images of benzene that it correctly found through image searching on the internet.  That last one is particularly close, so I tried "The fourth image is close, but it should have only one hydrogen attached to each carbon, not three."  In classic ChatGPT form, the response was "I’m sorry, I made a mistake. I’ll try to create a new image of a benzene molecule with only one hydrogen attached to each carbon. Please wait a moment."  It then produced this:
Clearly it just cannot understand or follow the instructions yet. 
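
For comparison, the geometry described in that last prompt (six black carbon atoms in a ring, each with one smaller red hydrogen attached) is simple enough to draw deterministically.  What follows is only a minimal sketch in Python that writes out SVG markup; the function name benzene_svg, the pixel sizes, and the colors are my own illustrative choices, and it draws every bond as a single line, so it makes no attempt to depict the aromatic bonding.

```python
import math


def benzene_svg(size=400, ring_radius=80, ch_bond=60):
    """Return SVG markup for a schematic benzene ring (hypothetical helper).

    Carbons sit on a regular hexagon; each hydrogen sits radially outward
    from its carbon. All dimensions are in pixels and are arbitrary.
    """
    cx = cy = size / 2
    carbons, hydrogens = [], []
    for k in range(6):
        angle = math.radians(60 * k + 30)  # hexagon vertices, 60 degrees apart
        ux, uy = math.cos(angle), math.sin(angle)
        carbons.append((cx + ring_radius * ux, cy + ring_radius * uy))
        r_h = ring_radius + ch_bond
        hydrogens.append((cx + r_h * ux, cy + r_h * uy))

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">'
    ]
    # Draw bonds first so that the atoms are painted on top of them.
    for k in range(6):
        x1, y1 = carbons[k]
        x2, y2 = carbons[(k + 1) % 6]  # C-C bond to the next carbon in the ring
        hx, hy = hydrogens[k]          # C-H bond, pointing radially outward
        for ex, ey in ((x2, y2), (hx, hy)):
            parts.append(
                f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{ex:.1f}" y2="{ey:.1f}" '
                f'stroke="gray" stroke-width="6"/>'
            )
    for x, y in carbons:
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="14" fill="black"/>')
    for x, y in hydrogens:
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="9" fill="red"/>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Saving the returned string to a file with a .svg extension should give something that opens in a browser or a vector editor for further touch-up.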

Similar things happen if you ask it to make an image of a lattice.  If I ask "Make me an image from scratch of a simple cubic lattice.", the verbal and image responses are quite encouraging.  "I’ll try to create that. Here is a new image of a simple cubic lattice that I generated from scratch. A simple cubic lattice is a type of crystal structure where the atoms are arranged in a regular pattern of cubes. Each atom is located at the center of a cube and has six nearest neighbors. A simple cubic lattice is one of the simplest and most symmetric types of lattices."

This is quite close, though all of these have problems (non-orthogonal angles, uneven spacings).  Things go off the rails badly, though, if you want a face-centered cubic lattice.  It can say the definition of an FCC lattice reasonably well:  "I’ll try to create that. Here is a new image of a face-centered cubic lattice that I generated from scratch. A face-centered cubic lattice is a type of crystal structure where the atoms are arranged in a regular pattern of cubes. Each atom is located at the center of a cube and at the center of each face of the cube. A face-centered cubic lattice has 12 nearest neighbors for each atom and is one of the most common and dense types of lattices."  However, the images are, ummm, artistic.
These are quite different from the simple image search of the internet for "face-centered cubic lattice", which does a good job if you don't mind using someone else's image.  The bottom line:  These kinds of image generation tools have enormous potential to save us time, but right now the parsing technology and training methods just aren't there yet.  If a simple image search turns up the right targets, but the generation tool can't make images that look like those search results, then training has failed somewhere. 
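
The nearest-neighbor counts in the two quoted descriptions (six for simple cubic, 12 for FCC) are at least easy to check numerically.  Here is a minimal sketch in Python, assuming the conventional cubic cell with unit edge length, with the FCC atoms at the cube corners and face centers; the names SIMPLE_CUBIC, FCC, and nearest_neighbor_count are mine, purely for illustration.

```python
import itertools
import math

# Basis atoms in fractional coordinates of a conventional cubic cell (edge = 1).
SIMPLE_CUBIC = [(0.0, 0.0, 0.0)]
FCC = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]


def nearest_neighbor_count(basis, n=2):
    """Count the nearest neighbors of the atom at the origin.

    Builds a (2n+1)^3 block of cells around the origin, measures the distance
    from the origin to every other atom, and counts how many atoms sit at the
    minimum distance. Returns (count, nearest-neighbor distance).
    """
    origin = (0.0, 0.0, 0.0)
    distances = []
    for i, j, k in itertools.product(range(-n, n + 1), repeat=3):
        for bx, by, bz in basis:
            point = (i + bx, j + by, k + bz)
            if point != origin:  # skip the reference atom itself
                distances.append(math.dist(point, origin))
    d_min = min(distances)
    count = sum(1 for d in distances if abs(d - d_min) < 1e-9)
    return count, d_min


sc_count, sc_dist = nearest_neighbor_count(SIMPLE_CUBIC)
fcc_count, fcc_dist = nearest_neighbor_count(FCC)
print("simple cubic:", sc_count, "neighbors at distance", sc_dist)
print("FCC:", fcc_count, "neighbors at distance", round(fcc_dist, 4))
```

The same list of points could be handed to any 3D plotting tool to get a lattice picture with exactly orthogonal axes and even spacings.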

If anyone out there has found some flavor of these image generation capabilities that really does work well for helping to make scientific illustrations, please let me know.

Update:  Google just released their free image generation tool, and while it still has problems like those above, it's pretty impressive.  For example, here is what it made from the prompt "drawing of a benzene molecule, consisting of six black carbon atoms arranged in a ring, with each carbon atom connected to one red hydrogen atom":

That's certainly better than anything DALL-E made.  Of course, the other three images that came along with that one were all screwed up.  Still, progress.


 


6 comments:

  1. @Doug, I've had very similar experience with generative AI for scientific images, even if you try to prompt in very specific ways.

    What I have found that works is using generative AI for programmatic construction of images, i.e., having it write code or markup. You could ask for Python code that visualizes a structure, ask for the SMILES string for a molecule, or even ask it to provide an SVG of a structure.

    The downside is that there is still the step of needing to run/visualize the output elsewhere and then going back and forth until you get something to your liking. For illustrative-style images, I typically ask for an SVG draft and then use Inkscape to enhance it.

  2. But it does OK at rendering pot roasts used as strategic defense weaponry. :)

    https://cdn.midjourney.com/82f8bd22-0bc2-4809-8030-53ee3f349547/0_0_2048_N.webp?method=width&qst=6

  3. Bernie, do you pay a subscription fee for midjourney?

  4. Anonymous, 10:59 PM

    I wonder if you would have better luck with a tool where you input a cartoon image of what you want and say "make this a Nature cover image", rather than using a pure text prompt, at least in the current state of things.

    Nonetheless, I find it interesting that it perfectly captures the aesthetic that all artists go for these days for high-profile publications, which is more eye candy than scientific. The number of well-off PIs I see wasting thousands of dollars (on top of ludicrous publication costs) just to get a cover is really sad.

  5. Doug--
    I'm on the basic $10/month plan. So far it's been worth it for the oddities and occasional D&D portraits.
    -=-Bernie

  6. nanoscale blog about condensed matter and nanoscale physics. Why should high energy and astro folks have all the fun?
