Tuesday, July 01, 2008

What makes an experiment "good"

Recently I've had some conversations with a couple of people, including someone involved in journalism, about what makes a physics experiment good. I've been trying to think of a good way to explain my views on this; I think it's important, particularly since the lay public (and many journalists) don't have the background to judge realistically for themselves the difference between good and bad scientific results.

There are different kinds of experiments, of course, each with its own special requirements. I'll limit myself to condensed matter/AMO sorts of work, rather than high energy or nuclear. Astro is a whole separate issue, where one is often an observer rather than an experimenter, per se. In the world of precision measurement, it's absolutely critical to understand all sources of error, since the whole point of such experiments is to establish new limits of precision (like the g factor of the electron, which serves as an exquisite test of quantum electrodynamics) or bounds on quantities (like the electric dipole moment of the electron, which is darned close to zero as far as anyone can tell; if it were nonzero, there would be some major implications). Less stringent but still important is the broad class of experiments where some property is measured and compared quantitatively with theoretical expectations, either to demonstrate a realization of a prediction or, conversely, to show that a theoretical explanation now exists that is consistent with some phenomenon. A third kind of experiment is more phenomenological: demonstrating some new effect and placing bounds on it, showing the trends (how the phenomenon depends on controllable parameters), and advancing a hypothesis of explanation. This last type of situation is moderately common in nanoscale science.

One major hallmark of a good experiment is reproducibility. In the nano world this can be challenging, since there are times when measured properties can depend critically on parameters over which we have no direct control (e.g., the precise configuration of atoms at some surface). Still, in macroscopic systems at the least, one should reasonably expect that the same experiment with identical sample preparation run multiple times should give the same quantitative results. If it doesn't, that means (a) you don't actually have control of all the parameters that are important, and (b) it will be very difficult to figure out what's going on. If someone is reporting a surprising finding, how often is it seen? How readily is it reproduced, especially by independent researchers? This is an essential component of good work.

Likewise, clarity of design is nice. How are different parameters in the experiment inferred? Is the procedure to find those values robust? Are there built-in crosschecks that one can do to ensure that the measurements and related calculations make sense? Can the important independent variables be tweaked without affecting each other? Are the measurements really providing information that is useful?

Good analysis is also critical. Are there hidden assumptions? Are quantities normalized in sensible ways? Do trends make sense? Are the data plotted in ways that are fair? In essence, are apples being compared to apples? Are the conclusions consistent with the data, or truly implied by the data?

I know that some of this sounds vague. Anyone more eloquent than me want to try to articulate this more clearly?


Don Monroe said...

What makes an experiment good? Tough question. How do you determine if science is good? Maybe we can work from there.

For now, let me quote Supreme Court Justice Potter Stewart (on obscenity):

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it....

Anonymous said...

The pornography-inspired "I know it when I see it" argument was among my first reactions as well, though I would not have been able to quote Supreme Court Justices.

But I think a quite important part of Doug's post is how to inform non-scientists about what is good science, or a good experiment. Somehow most everybody (even Supreme Court Justices!) develops the depth of knowledge they require to decide, for themselves, what is pornographic. I think to really have the base of knowledge to know good science when you see it takes, what, three to five years of dedicated study?

But it would certainly be great if everyone was interested and informed enough to figure out who to ask about what is good science.

Remember that the non-scientists, one way or another, pay the bills. If they had some deeper appreciation of good science, I would be much happier.

Anonymous said...

I think a good experiment is one that makes an otherwise difficult-to-observe effect crystal clear -- one where everyone says "I wish I had thought of doing it that way."

Don Monroe said...

All of your chosen features of a "good" experiment are insightful and important, but they miss the larger role of a good experiment: it teaches us something.

To be "good," an experiment should be not just well executed, but significant. This means that it should be designed to unambiguously discriminate between alternative hypotheses, in a way that depends as little as possible on conceptual frameworks that may change with time.

Doug Natelson said...

1st anon - I agree, especially on your last point. There is some responsibility on us scientists and engineers to try to educate the public on what constitutes good science. I think what I'm talking about here is the experimental aspect of the "but it's just a theory" phenomenon, where the vernacular definition of theory is closer to "untested hypothesis", while the scientific definition is "rigorously supported hypothesis that makes testable predictions".

Don - Good to hear from you! I agree that a good experiment actually tells you something that you didn't know previously, or truly eliminates possible explanations for a phenomenon from contention. Your earlier response kind of gets at my issue, though.

Recently I was asked by a nonscientist to look at some experimental work in a particular area and offer my opinion on the work. The work did not pass my threshold for reproducibility - that is, if they took identically prepared macroscopic samples (or even one sample cut into nominally identical pieces) and ran the experiment several times, they would not get the same data (in this case, the t dependence of some property A(t)). The individual A(t) traces would all look very different, with lots of structure to them. To my mind, this is not reproducible. Now, the experimenters were claiming that the fact that avg(A(t)) was greater than 0 for 3/4 of the trials showed 75% reproducibility. I think that's very misleading - clearly the various samples were behaving very differently, and sweeping all of that under the rug by only looking at the sign of the average (not even the average itself) isn't fair. (Yes, there could be weird circumstances where this is ok, but this was not such a case.) Still, I had a very hard time explaining this to the nonscientist. That's what got me thinking about this issue. These experiments just weren't reproducible in the common scientifically accepted sense, yet to a lay person it was much more ambiguous.
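
To make the distinction concrete, here is a minimal Python sketch. The traces are made up for illustration (they are not the data from the work in question): four hypothetical A(t) curves built from sinusoids of different frequencies plus a small offset, with the frequencies, offsets, and function names all being assumptions of the example. It compares the "sign of the average" criterion with a simple check of whether the traces actually resemble one another.

```python
from itertools import combinations
from math import pi, sin, sqrt


def mean(xs):
    """Arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length traces."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def sign_agreement(traces):
    """Fraction of traces whose time average is greater than zero."""
    return sum(1 for trace in traces if mean(trace) > 0) / len(traces)


def max_pairwise_correlation(traces):
    """Largest |correlation| between any two traces (1.0 = same shape)."""
    return max(abs(pearson(x, y)) for x, y in combinations(traces, 2))


# Hypothetical data: each "sample" has completely different structure in t,
# but three of the four happen to have a slightly positive average.
N = 100
times = [i / N for i in range(N)]
OFFSETS = [0.2, 0.2, 0.2, -0.2]   # assumed small offsets
FREQS = [1, 2, 3, 5]              # assumed, mutually unrelated structure
traces = [[off + sin(2 * pi * f * t) for t in times]
          for off, f in zip(OFFSETS, FREQS)]

print("fraction with avg(A) > 0:", sign_agreement(traces))
print("max |pairwise correlation|:", max_pairwise_correlation(traces))
```

In this toy case the sign criterion reports 75% "agreement" even though no two traces are correlated at all. Pairwise correlation is only one possible measure of whether traces reproduce, chosen here for simplicity.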

Anonymous said...

Hi Doug, (first Anonymous again)

Here's my stab at explaining this particular situation to the non-scientist in question.

Let's say we decide to test if a coin is fair, but do it in a somewhat contrived way: holding the coin flat and parallel to a table, about 3 inches above it, and letting it drop (note the calming use of English units, assuming your non-scientist is American...). For consistency, we always start with "heads" up. We repeat the experiment about ten times, each time recording whether the coin lands heads or tails.

I just did this (in a not perfectly controlled way, of course), and if I believed my experiment was a good experiment, I would be convinced that my random quarter is terribly unfair. Actually I got 20 "tails" in 24 tries; the quarter tended to flip exactly once. But if I repeated the experiment from 1 inch high, I'm going to get different results...

In this case scientist and non-scientist alike know that this coin is much more likely to actually be "fair" and that our result is incorrect, because this is not how you flip a coin. Often the trick in designing real experiments is making sure that you don't "rig the coin." It sounds to me that in the experiment you were asked to review, they didn't know how they were flipping the coin...
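
For anyone who wants a number for how "unfair" 20 tails in 24 tries looks, here is a rough Python sketch. It assumes the drops are independent and asks how likely a truly fair coin would be to give at least that many tails; the function name is just for illustration.

```python
from math import comb


def tail_probability(n, k, p=0.5):
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p (binomial upper tail)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Chance that a fair coin gives 20 or more tails in 24 tries.
p_value = tail_probability(24, 20)
print(f"P(>= 20 tails in 24 fair flips) = {p_value:.2e}")
```

The result is a bit under one in a thousand, so taken at face value the statistics say the quarter is badly biased. The arithmetic is fine; the procedure is what's rigged, and no amount of statistics on the output will tell you that.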