Thursday, June 20, 2019

The physics subject GRE and grad school

As I've mentioned before, there has been a lot of discussion lately about the physics subject GRE.  The exam is intended to cover the content of a typical undergrad physics curriculum, in a format of roughly 100 multiple-choice questions in about 170 minutes.  The test is put together with input from a committee of physics faculty, and ETS presently has a survey underway looking at undergrad curriculum content and subscores as ways to improve the test's utility in grad admissions.  The question at issue is to what extent the test should be a component in admissions decisions for doctoral programs. 

The most common argument for requiring such a test is that it is a uniform, standardized measure that can be applied across all applicants.  Recommendation letters are subjective; undergraduate grades are likewise hard to normalize between different colleges and universities.  The subject exam is meant to allow comparisons that avoid such subjectivity.  ETS points to studies (e.g., this one) arguing that there are meaningful correlations between subject test scores and first-year graduate GPA. 

At the same time, there has been a growing trend away from emphasizing the test.  The astronomy and astrophysics community has been moving that way for several years - see here.  Recent studies (e.g. this one, with statistics heavily criticized here and relevant discussion here) argue that the test scores are not actually helpful in predicting success (degree completion, for example) in doctoral programs.  In our own graduate program, one of my colleagues did a careful analysis of 17 years' worth of data, and also found (to the surprise of many) basically no clear correlation between the subject test score and success in the program.  (Sampling is tricky - after all, we can only look at those students that we did choose to admit.)  At the same time, the tests are a financial burden for applicants, and as mentioned here, scores tend to be systematically lower for women and underrepresented minorities because of differences in educational background and access to opportunities. 
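As a toy illustration of that sampling caveat (all numbers below are made up for the sketch - this is not drawn from any real admissions data): suppose a single latent "preparation" variable drives the test score, the rest of the application, and later success.  Then restricting attention to admitted students sharply weakens the observed score-success correlation, even though the score is genuinely predictive across the whole applicant pool.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical model: latent preparation drives the test score, the other
# application signals (letters, grades), and eventual success, each with
# independent noise.  All quantities are standardized and illustrative.
prep = rng.normal(size=n)
score = 0.7 * prep + 0.7 * rng.normal(size=n)
letters = 0.7 * prep + 0.7 * rng.normal(size=n)
success = 0.7 * prep + 0.7 * rng.normal(size=n)

# Across the whole applicant pool, the score is clearly predictive.
print(np.corrcoef(score, success)[0, 1])  # ~0.5

# But a program only observes admitted students.  If admission selects on a
# composite of the score plus the other signals, then restricting to admits
# attenuates the observed correlation (range restriction / collider bias).
composite = score + letters
admitted = composite > np.quantile(composite, 0.90)  # admit the top 10%
print(np.corrcoef(score[admitted], success[admitted])[0, 1])  # much weaker

None of this proves anything about the real test, of course; it just shows why within-program correlations can look far weaker than pool-wide ones.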

Our program at Rice has decided to drop the physics subject GRE.  This decision was the result of long consideration and discussion, and the data from our own program are hard to argue with.  It all comes down to whether the benefits of the test outweigh the negatives.  There is no doubt that the test measures proficiency at rapidly answering those types of questions.  It seems, however, that this measurement is just not that useful to us, because many other factors come into play in making someone an effective doctoral student.   Similarly, when people decide to leave graduate school, it is rare that the driving issue is lack of proficiency in what the test measures. 

I'm on a mailing list of physics department chairs, and it's been very interesting to watch the discussion go back and forth on this topic and how much it mirrors our own.  It takes years to see the long-term effects of these decisions, but it will definitely be something to watch. 

14 comments:

Anonymous said...

When both were required, would you say the subject test was weighted much more heavily than the regular GRE? My research area borders on CMP, but I'm in an engineering department, so no GRE subject test. For us, the general approach seems to be to check for a "high enough" quant score, but it's not really a huge factor in our admissions.

Douglas Natelson said...

Anon, good question. We never had a formula or anything. I agree that the quant and analytical parts of the general exam tend to be viewed as a rough check; it's pretty rare, though, to see a big disparity between those test scores and undergrad grades, for example, even though grading can vary from school to school. For instance, my perception is that it's unusual to see someone with strong math grades and poor quantitative scores, or poor math grades but strong quantitative scores.

Pizza Perusing Physicist said...

While the criticisms of the subject GRE are certainly valid and understandable, might it not be possible to address them (at least partially) by modifying and revamping the structure of the test, as opposed to flat out eliminating it? For example, instead of a hundred multiple choice questions, perhaps we could have the test be more reflective of typical in-class exams for undergrad and grad courses, with more open ended free response questions? I recognize that a common objection to this proposal might be that the logistics of organizing a way to grade such an exam would be prohibitively costly, but on the other hand, we have found a way to do this for the AP Physics exams, and there are a lot more students who take those than take the Physics GRE.

Douglas Natelson said...

PPP, I think that would definitely be a help. The nature of the exam at present is very far away from what we do.... A confession of bias: I've always had a healthy distrust of standardized tests. Even at the time I took the subject test, I felt like it mostly rewarded the fact that I am a fast reader and have a good memory, not much to do with my physics abilities.

Brian said...

I'm curious about the breakdown of faculty reactions to the idea of dropping the subject GRE across the lines of experimentalists vs. theorists. For myself, after doing some research it made sense to me to not use the test for admissions for the reasons you mentioned in your post. But, I also have my own bias that as an experimentalist, I never felt like the physics subject GRE was a good test of what might make someone successful in a lab. However, I imagine that theorists might have a different take on that and be less likely to want to abandon the test.

Douglas Natelson said...

Brian, I do think that theorists are generally strongly interested in metrics that particularly assess mathematical preparedness. One issue is self-selection - it's pretty rare that people who say they are strongly interested in theory don't already have other indicators to show an aptitude there (e.g., grades, advanced courses, rec letters).

gilroy0 said...

Pizza Perusing Physicist mentioned the AP Physics exam. It's important to note that half of the AP exam is also rapid-fire multiple choice. Also, the open-ended questions are hard to get right -- there is usually at least one that generates tremendous controversy at the reading -- and, due to the need for standardized grading, often unsatisfying. Making the rubric usually takes a full work day involving a hundred professors and teachers, and people are often still not happy. Finally, despite literally decades of trying to craft questions that measure lab skills and experimental ability, every year the lab question proves challenging and divisive -- when it isn't outright ludicrous.

Standardized tests that are fair, effective, and, well, standardized require a huge investment. This would exacerbate the expense issue raised above.

As an aside, was the poster a physicist perusing pizza? Or a pizza perusing physicists?

Pizza Perusing Physicist said...

Gilroy0, I don't doubt anything you are saying, but the question is, would the proposed reforms not at least be better than what we have presently? There's no way we can ever have a standardized test that is perfect and that makes everyone happy, but that doesn't mean we can't keep trying to do better.

Regarding the aside, I am a physicist who loves perusing pizzerias, as pizza is my absolute favorite food, which might explain why I went to New Jersey for grad school.

Anonymous said...

Maybe I'm outing myself by saying this, but...

I don't remember taking the general GRE.
I know I didn't take the physics GRE.

I still have my PhD from Rice.

:P

Douglas Natelson said...

Anon@9:42: We've always required the general GRE. We did do an experiment for three years back around 2010 (IIRC) of not requiring the subject exam. In the end, we reverted at the time because some faculty did feel like it was valuable (though those folks have revised their views in light of the new data) and because most students were reporting the scores anyway since it was still required at a very large percentage of competing programs. Perhaps you were admitted within that window?

Grumpy said...

This comes up all the time in my dept too, often with theorists backing the subject test and experimentalists being unsure. We've done a study similar to what you describe at Rice and found that our "subjective assessment" score is the best predictor of whatever we defined as success. GPA showed a slight correlation. GRE subject, GRE general, and our numeric assessment of rec letters were essentially uncorrelated.

So then what exactly is it we see when we do this "subjective analysis" and why can't we codify it into a more rigorous (if complicated) metric?

And if nothing correlates then why don't we just admit all of the female/URM/first-gen applicants and hope for the best?

I think the answer to that last q is that we probably should take more risk with those students, but if we did admit everyone of a certain category blindly we would end up finding some GRE correlations.

Will be interesting to see some studies of GRE correlations after GRE-blind admissions...

Douglas Natelson said...

Grumpy, my colleague had actually done a similar analysis of undergrad admissions, involving a deep dive with our undergrad admissions office. The idea was to take students we had identified after graduation as particularly excellent, and see whether there were any commonalities/predictors in their undergrad applications that might be markers to watch for in the future.
Interestingly, the one aspect of the undergrad applications that really correlated well with the students we'd flagged after the fact as outstanding was a seemingly subjective "does this student seem to have their act together overall" rating.

sylow said...

How about class rank? Someone with a GPA of 3.8 (ranked first in their class) could be a better choice than someone with a GPA of 3.9 (ranked fourth). Basically, how can you assess GPA as a raw number? It needs to be judged relatively.

I guess the problem boils down to how to define success in graduate school. If someone obtains a PhD in 5 years and gets out, can we safely say he was successful? There need to be certain criteria to determine success. It is an entirely different experience from undergrad education, where students take standard courses and exams. Once you pass those exams, you are basically done. Here, everybody is different. There is nothing standard and you are never done. Even the graduation criteria are obscure.

I know people who came to the USA without knowing any English and became professors at MIT (Mehmet Toner), so what do tests or grades tell us about someone's potential as a researcher? Not much... So how do we judge the applicants? Any undergrad research project that the student actively participated in may be much more useful than those test scores...

Michael (mbw) said...

I don't have a strong feeling about requiring the GRE-P, which I never took. (My wife says people will see that as evidence it should be required.) I do have strong feelings about incompetent use of statistics. The anecdotes from individual programs are very strongly biased by collider stratification (compensatory effects) to give almost no signal for various predictors. This is discussed, with relevant references, here: https://arxiv.org/abs/1902.09442 as well as in Alex Small's paper. The former also includes discussion of the disingenuous reply by Miller et al. to criticisms - e.g., when it was pointed out that their results had major collider stratification bias, they responded by increasing the stratification!
It's really embarrassing to see smart physicists talking stupidly about statistics, like hearing smart statisticians talk stupidly about global warming.
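To see the collider-stratification effect Michael describes in a toy sketch (same made-up model as the sketch in the post above; none of this uses real admissions data): stratify on the composite that drove admission, and the within-stratum score-success correlation collapses toward zero, so finer stratification makes a genuinely predictive score look useless.

import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Same illustrative setup as before: one latent preparation variable drives
# the score, the other application signals, and eventual success.
prep = rng.normal(size=n)
score = 0.7 * prep + 0.7 * rng.normal(size=n)
letters = 0.7 * prep + 0.7 * rng.normal(size=n)
success = 0.7 * prep + 0.7 * rng.normal(size=n)

# Stratify on the admissions composite and compute the score-success
# correlation *within* each stratum.  Conditioning on the composite makes
# score and letters trade off against each other (the collider), so in this
# symmetric model the within-stratum correlation is driven toward zero.
composite = score + letters
edges = np.quantile(composite, np.linspace(0, 1, 11))
for lo, hi in zip(edges[:-1], edges[1:]):
    sel = (composite >= lo) & (composite < hi)
    r = np.corrcoef(score[sel], success[sel])[0, 1]
    print(f"stratum [{lo:+.2f}, {hi:+.2f}): r = {r:+.3f}")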