Saturday, July 16, 2016

Impact factors and academic "moneyball"

For those who don't know the term:  Moneyball is the title of a book and a movie about the 2002 Oakland Athletics baseball team, a team with a payroll in the bottom 10% of major league baseball at the time.   They used a data-intensive, analytics-based strategy called sabermetrics to find "hidden value" and "market inefficiencies", to put together a very competitive team despite their very limited financial resources.   A recent (very fun if you're a baseball fan) book along the same lines is this one.  (It also has a wonderful discussion of confirmation bias!)

A couple of years ago there was a flurry of articles (like this one and the academic paper on which it was based) about whether a similar data-driven approach could be used in scientific academia - to predict success of individuals in research careers, perhaps to put together a better department or institute (a "roster") by getting a competitive edge at identifying likely successful researchers.

The central problems in trying to apply this philosophy to academia are the lack of really good metrics and the timescales involved in research careers. Baseball is a paradise for people who love statistics. The rules have been (largely) unchanged for over a hundred years; the seasons are very long (formerly 154 games, now 162), and in any game an everyday player can get multiple opportunities to show their offensive or defensive skills. With modern tools it is possible to get quantitative information about every single pitched ball and batted ball. As a result, the baseball stats community has come up with a huge number of quantitative metrics for evaluating performance in different aspects of the game, and they have a gigantic database against which to test their models. They have even devised metrics to try to normalize out the effects of local environment (baseball park-neutral or adjusted stats).

Fig. 1, top panel, from this article.  x-axis = # of citations.
The mean of the distribution is strongly affected by the outliers.
In scientific research, there are very few metrics (publications; citation count; impact factor of the journals in which articles are published), and the total historical record available on which to base some evaluation of an early career researcher is practically the definition of what a baseball stats person would call "small sample size".   An article in Nature this week highlights the flaws with impact factor as a metric.  I've written before about this (here and here), pointing out that impact factor is a lousy statistic because it's dominated by outliers, and now I finally have a nice graph (fig. 1 in the article; top panel shown here) to illustrate this.  
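To make the outlier point concrete, here is a minimal sketch in Python. The citation counts are invented for illustration (they are not taken from the Nature article or from any real journal), and the helper name `summarize` is hypothetical. It just compares the mean, which is roughly what an impact factor reports, with the median, before and after dropping the single most-cited paper.

```python
from statistics import mean, median

# Hypothetical citation counts for 12 articles in one journal.
# These numbers are made up for illustration only.
citations = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 9, 150]


def summarize(counts):
    """Return (mean, median) for a list of citation counts."""
    return mean(counts), median(counts)


# Statistics for the full set of articles.
avg, med = summarize(citations)

# Same statistics after removing the single most-cited article.
avg_without_top, med_without_top = summarize(sorted(citations)[:-1])

print(f"all articles:      mean = {avg:.1f}, median = {med}")
print(f"without top paper: mean = {avg_without_top:.1f}, median = {med_without_top}")
```

With these made-up numbers, one heavily cited paper pulls the mean up to 15 while the median sits at 2.5; remove that paper and the mean drops below 3 while the median barely moves.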

So, in academia, the tantalizing fact is that there is almost certainly a lot of "hidden value" out there missed by traditional evaluation approaches. Just relying on pedigree (where did so-and-so get their doctorate?) and high impact publications (person A must be better than person B because person A published a paper as a postdoc in a high impact glossy journal) almost certainly misses some people who could be outstanding researchers. However, the lack of good metrics, the small sample sizes, the long timescales associated with research, and the enormous influence of local environment (it's just easier to do cutting-edge work at Harvard than at Northern Michigan) all mean that it's incredibly hard to come up with a way to find these people via some analytic approach.


Anonymous said...

Doug, the Nature article spends a lot of ink (or pixels) explaining what statisticians and economists know all too well: means or "averages" are meaningless in skewed distributions, and medians are the correct statistic (please correct the typo in your caption).

I suspect the reason the impact factor was defined the way it was is that a metric based on median citation numbers would probably not differentiate much between many journals; most journals probably have a median article citation number smaller than 5. And this alone makes the point that the impact factor is a very coarse metric.

We live in a world where decision-makers rely on high-level metrics to make their decisions, or at least to confirm a decision they have already made in the old-fashioned way (like in academic hiring). Impact factors are just one more example of that mindset, and are probably not going to go away so easily. And whatever new metric replaces it, Nature editors will make sure that they still keep their top ranking.

Douglas Natelson said...

Anon, you're right, of course. I agree about why IF was designed as it was, about the role of high-level metrics (that is, metrics that represent complex multivariate properties with a single number), and about the tendency to preserve hierarchies. (Sorry about the caption typo. That was one of those cases where my brain was thinking "median would be much more representative" and thus I typed "median" when the whole point is that "mean" = impact factor = a bad way to describe the information in that distribution.)

Anonymous said...

When metrics are so uncritically and widely sought and used to evaluate humans, a certain part of humanity is obviously lost …

Humans are on the brink of deteriorating into modern barbarians.

Anonymous said...

Please scroll down this web site; there is a huge report, "The Metric Tide".

Douglas Natelson said...

Anon@8:03, thanks! That was very interesting reading.