How to think clearly about that clever econometrics policy paper

A late-August NBER working paper by Joshua Goodman, Oded Gurantz, and Jonathan Smith argues that if every high school student took common college admissions tests twice, that would shrink the income gap in college enrollment by 20%. New York Times reporter Sahil Chinoy wrote up the story, and the headline repeated the eyebrow-raising import of the paper’s conclusion: “A Surprisingly Simple Way to Help Level the Playing Field of College Admissions.”

Chinoy quoted only one person other than the authors, and the story came out the same morning as the paper. One could be excused if the NYT story felt a bit like one of those “imminent medical breakthrough” stories about various cures that are just over the horizon. Many of those imminent breakthroughs never make it through medical trials, and one might wonder, is it really true that just taking the SAT or ACT a second time would have such a dramatic effect if every high school student did it?1

Spoiler warning: the research is clever (the paper is very clever), and yet state policymakers and those who advise them should not assume that steps to encourage test-retaking would have dramatic benefits, either in equalizing enrollment across a state or in boosting a state’s college enrollment.2

I have written before about the strengths and weaknesses of tightly-constructed research on policy effects. Especially when thinking about the first such good paper with a narrow question, I have three general questions, and in this case the answers to those questions point in somewhat different directions from what the paper’s authors say. Here are the questions, and then I’ll explore each in a little depth.

  1. How do this paper’s findings compare with related empirical research?
  2. In the thought experiment where this idea is applied universally, what are the potential consequences?
  3. What choices would have to be made to implement this paper’s (implied) suggestions?

Each of these questions asks how this individual paper and policy idea fits into a broader context, but in different ways: the context of existing research, the context of consequences, and the context of practical policy choices.3

The context of existing research. There is a basic question one can ask of most empirical policy research: How does this compare with the existing related research? Since the paper on retaking admissions tests is unique in several ways, I can ask a variant of this: Is there good research in the “local neighborhood” of this policy idea, other work that can give us some ideas of how this research fits into other good research on likely effects?

It turns out that not only is there such a neighborhood of research (cited by the paper’s authors), but it tells us something very important about the idea of encouraging the retaking of admissions tests: this retaking-the-admissions-test idea is a refinement/tweak of something many states have done in the past two decades. About a quarter of states now require that high school students take a college admissions test, generally replacing what had previously been a state-specific test in high school. Pushing many high school students from a minimum of zero admissions test scores to a minimum of one set of test scores is a concrete policy outcome, one that is the effect of policies in states such as Michigan (which mandated the ACT in 2006-07). Going from a minimum of one set of scores to a minimum of two is a tweak of the first idea.

And there is good research on the consequences of requiring at least one admissions test! Joshua Hyman estimated that Michigan’s mandatory test-taking requirement4 boosted college enrollments in the state by 1.4% and by 1.6% for poor students in the state.5 Daniel Klasik estimated that admissions-test requirements boosted college enrollment probabilities by about 10% in Colorado and Illinois and by about 1% in Maine, but the estimates for Colorado and Maine were far too noisy to draw robust conclusions. Michael Hurwitz and colleagues estimated that Maine’s mandatory SAT test-taking requirement boosted enrollments in four-year colleges by 2-4% in general. (These are percentage-point increases in enrollment rates across all students in a state, not proportional increases, and not effects measured only among students who would not otherwise have attended college.)
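Because the difference between percentage points and percent matters for reading these estimates, here is a minimal Python sketch of the arithmetic. The 40% baseline enrollment rate is a hypothetical number chosen for illustration, not a figure from any of the studies; only the 1.4-point increment comes from the Michigan estimate above.

```python
# Percentage-point change vs. proportional (percent) change in a rate.
# The baseline rate is hypothetical; only the 1.4-point increment is taken
# from the Michigan estimate quoted in the post.

def percentage_point_change(before: float, after: float) -> float:
    """Difference between two rates, both expressed in percent."""
    return after - before


def proportional_change(before: float, after: float) -> float:
    """Change relative to the starting rate, expressed in percent."""
    return (after - before) / before * 100


baseline = 40.0                 # hypothetical statewide enrollment rate, in percent
with_mandate = baseline + 1.4   # add a 1.4-point effect

print(f"{percentage_point_change(baseline, with_mandate):.1f} percentage points")
print(f"{proportional_change(baseline, with_mandate):.1f}% proportional increase")
```

Under that assumed baseline, a 1.4-point increase amounts to a 3.5% proportional increase; a lower baseline would make the same point increase look larger in proportional terms.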

The estimates cited above are about statewide increases in enrollment, which involve a mix of high school students who would have taken the ACT or SAT anyway, on the one hand, and those who took it only because of the mandate, on the other. Another question to ask about an admissions-test mandate is how it affected just the students who took the test because of the mandate. Hyman estimated an 18% four-year-college enrollment increase for Michigan students who were induced to take the ACT by the state admissions-test mandate. Hurwitz et al. estimated an enrollment boost in Maine of about 10% among students who would not have taken the SAT but for the requirement. And this is the question that Goodman et al. really address, specifically for students who move from taking an admissions test once to taking it at least twice: they estimate that retaking the SAT shifts enrollment from 2-year to less-selective 4-year institutions and boosts enrollment overall by about 6.5% in general, by about 12% for lower-scoring students, 6.1% for low-income students, and 1.2% for underrepresented minority students. So Goodman et al.’s estimate of the effect of retaking the SAT is somewhat lower than the estimates of the effect of taking an admissions test once in Michigan or Maine as a result of state mandates. Even so, an overall 6.5% boost in enrollment is a little larger, relative to either the 10% or the 18%, than I would expect from this type of behavioral tweak. But we are talking about three studies, no matter how well-conducted, and there is some uncertainty about how to generalize from one-state analyses (Maine or Michigan). So… yay, evidence of the effect of retesting! But very possibly not at the scale claimed in the working paper.
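To make that comparison concrete, a short Python sketch can express the retake estimate as a fraction of each first-test estimate quoted above. It treats the numbers as if they measured comparable outcomes in comparable units, which is a simplifying assumption: Hyman’s 18% refers to four-year enrollment, and the studies cover different states and years.

```python
from typing import Dict

# Express the retake effect as a share of each first-test effect.
# Values are copied from the post; treating them as directly comparable
# is a simplification (different states, years, and outcome definitions).

def relative_effects(retake_effect: float,
                     first_test_effects: Dict[str, float]) -> Dict[str, float]:
    """Return retake_effect divided by each first-test effect."""
    return {study: retake_effect / effect
            for study, effect in first_test_effects.items()}


FIRST_TEST_EFFECTS = {
    "Maine SAT mandate (Hurwitz et al.)": 10.0,
    "Michigan ACT mandate (Hyman)": 18.0,
}
RETAKE_EFFECT = 6.5  # Goodman et al., moving from one sitting to two or more

for study, ratio in relative_effects(RETAKE_EFFECT, FIRST_TEST_EFFECTS).items():
    print(f"Retaking vs. {study}: {ratio:.2f} of the first-test effect")
```

With these inputs the ratios come out to 0.65 and about 0.36: on this rough reading, a second sitting delivers about 65% of Maine’s first-test effect and a bit more than a third of Michigan’s.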

Looking at it from a state policymaker’s perspective (what is the effect at the scale of a state?), it is important to understand that the working paper focuses on the effect of moving from one test to more than one test for individual students rather than the likely effect on enrollments across a state. That population-wide effect will depend directly on the proportion of students who do not retake an admissions test, that is, on how many people could be affected by encouraging retaking in the first place. For states, there would be a big difference in the effect of encouraging retaking when 90% of students already take the test twice and when only 25% do. According to Goodman et al., about 54% of students who took the SAT took it at least twice, and 45% of ACT participants were retakers. Those are national percentages, and I would expect state-specific percentages to vary, especially among high school students who are currently less likely to attend college.
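A back-of-the-envelope Python sketch shows why the share of students who already retake matters so much. It assumes, optimistically, that an encouragement policy gets every current non-retaker to retake and that each of them gains the same average effect; the function name and the `share_taking_at_least_once` parameter are illustrative, not drawn from the paper. The result comes out in the same units as the per-student effect supplied.

```python
def statewide_effect(effect_per_new_retaker: float,
                     share_already_retaking: float,
                     share_taking_at_least_once: float = 1.0) -> float:
    """Rough population-wide effect of universal retaking.

    Hypothetical model: only students who take the test at least once but do
    not currently retake can be affected, and each of them gains the same
    average effect. Returns a value in the same units as the effect supplied.
    """
    share_newly_retaking = share_taking_at_least_once * (1.0 - share_already_retaking)
    return effect_per_new_retaker * share_newly_retaking


EFFECT = 6.5  # Goodman et al.'s overall estimate, used here only as an input

# Retake shares from the post: the 90%/25% contrast and the national SAT/ACT figures.
for share in (0.90, 0.54, 0.45, 0.25):
    print(f"{share:.0%} already retake: about {statewide_effect(EFFECT, share):.2f} statewide")
```

Under these assumptions the same per-student effect translates into about 0.65 statewide when 90% already retake and nearly 4.9 when only 25% do. Lowering `share_taking_at_least_once` below 1.0, as in a state without a testing mandate, shrinks the result further.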

The context of consequences. The Goodman et al. working paper is like many narrow policy analyses in that it is great for answering a focused question (what is the consequence of retaking admissions tests for the retakers?) and less appropriate for answering broader policy questions. From Michelle Fine I learned the universal-application thought experiment (aka the magic-wand question): Let’s wave a magic wand and imagine X happens. Does that mean that Y automatically happens?6 In a thought experiment where a policy idea or specific target is applied universally, you can explore what happens in a broader system. What are the potential consequences, especially those unanticipated by advocates of a policy or a particular way of defining a social problem?

Part I: Thinking through the desired consequences. To the credit of Goodman, Gurantz, and Smith, they are transparent about this:

[W]e ignore general equilibrium effects that might result from population-wide increases in retake rates, such as colleges raising admission standards in reaction to a higher-scoring applicant pool. (p. 20)

Okaaaay, maybe that language is not transparent to non-economists, but it essentially says, We’re not looking right now at every last piece of that thought experiment where we actually get what we want. And they have anticipated the most likely cocky response to their paper: “Hey! Colleges will just raise their standards when high schoolers have higher test scores on average.”

The value of the magic-wand thought experiment lies not in such cocky responses but in serious exploration, and let me apply it in this sense: In a world with universal retaking of admissions tests, where does that change make a difference, and in what ways?7

The cocky response is partly correct: some highly-selective colleges have capped admissions and will respond to higher test scores on average by… well, by being highly-selective. The Goodman et al. analysis is consistent with this argument; retaking tests did not increase enrollment in the most selective colleges and universities.

But there aren’t many highly-selective colleges in the United States. There are many colleges that are nonselective, mostly but not entirely community colleges. And there is an increasing number of public colleges and non-flagship public universities that are effectively nonselective: in an era of declining young-adult populations, non-elite colleges and universities are going to be less selective even if the admissions requirements include test scores. For these colleges and universities, retaking an admissions test does not change the admissions decision. Finally, and most relevant to this research, there are colleges and universities that are moderately selective, where for a variety of reasons they care about admissions test scores but can accommodate more students.

That leaves an interesting set of institutions and people where retaking an admissions test might matter — what we might call zone(s) of plausible benefit:

  1. Those (mostly-public) colleges and universities where admissions tests matter, but where admissions are not highly selective and where there is capacity to take additional students. Here, retaking a test truly can change an admissions decision.
  2. Those colleges and universities (regardless of admissions decisionmaking) where institutional aid is partly dependent on admissions test scores, and where there is capacity to fund an additional student. Here, retaking a test may or may not change an admissions decision but could change a funding decision.
  3. Those high school students for whom a higher admissions test score will trigger more marketing contacts from colleges and universities, or more encouragement from counselors and others on applying to colleges. Here, retaking a test may change how a student and the student’s family see college as an option.8

This is an odd list at first glance: the first two zones are donut-hole sets of institutions, and I doubt the second category has many members (it may have none). And then the last zone is not about institutional decisions but about student and family attitudes and responses. But higher education often involves a compendium of odd lists, and since most students attend college within their state and close to home, it is important in policy questions to think in terms of specific zones of benefit as well as generalities.

From the first zone we might expect that taking the admissions test twice would make a difference in getting into the University of Missouri (with a decline of 4,000 undergraduates from 2015 through fall 2017) but maybe not the U of Iowa (where enrollment has been fairly stable).9 In general, the moderately-selective state universities in regions with declining young-adult populations are where this might make more of a difference, for in-state students. But those universities could also just admit everyone who is qualified and not worry too much about SAT or ACT scores.10

The second zone is probably mostly empty: which colleges and universities do not try to give out every dollar in aid that they can?

The last zone of benefit is even fuzzier than the prior two: which students are just on the bubble of being contacted by colleges, or being encouraged to apply to college by high school counselors, where that extra attention would result in enrollment in a four-year college or university, and where their first admissions test did not bump them up in terms of attention? This is essentially a category where the first roll of the dice on an admissions test did not end up with more attention, but the second might. This is implicitly a small group, unless admissions tests are so incompetently constructed that they leave large numbers of students who should attend college just on the low side of a line where they would get marketing material or counselor attention.
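The “roll of the dice” argument can be illustrated with a small Python simulation built on the standard library’s `random` module. Everything in it is a hypothetical model, not anything estimated in the paper: true scores are standard normal, each sitting adds independent normal measurement noise with standard deviation `noise_sd`, and the attention cutoff sits at an arbitrary 0.5. It ignores any genuine gain from practice or extra study, so it captures only the chance element of a second sitting.

```python
import random


def share_crossing_on_retake(noise_sd: float, cutoff: float = 0.5,
                             n_students: int = 20_000, seed: int = 1) -> float:
    """Among students whose first score falls below the cutoff, return the
    share whose second score reaches it.

    Hypothetical model: each student has a fixed true score drawn from a
    standard normal distribution, and each sitting adds independent normal
    measurement noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    below_first = 0
    crossed_second = 0
    for _ in range(n_students):
        true_score = rng.gauss(0.0, 1.0)
        first = true_score + rng.gauss(0.0, noise_sd)
        if first >= cutoff:
            continue  # already above the line after one sitting
        below_first += 1
        second = true_score + rng.gauss(0.0, noise_sd)
        if second >= cutoff:
            crossed_second += 1
    return crossed_second / below_first if below_first else 0.0


for sd in (0.0, 0.1, 0.3, 0.6):
    print(f"noise sd {sd}: {share_crossing_on_retake(sd):.1%} of those below the line cross it on a retake")
```

In this toy model, a perfectly reliable test (`noise_sd=0.0`) gives no one a second chance at crossing the line, and the share who cross on a retake grows as the test gets noisier, which is the sense in which the group should be small unless the tests are badly constructed.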

The last category in turn suggests an alternative policy to explore: make sure that some four-year college in a state contacts every single high school junior. Don’t wait for admissions test scores; just push out contact.11

Part II: Thinking through the unexpected/undesired consequences. Last week I was talking with a graduate student about this paper, and she responded to the suggestion of universal test-retaking with the adult version of eye-rolling: “Oh, great, take up even more time of students with testing!” Encouraging or mandating retaking of tests would certainly involve more test-taking, and for those concerned with the amount of testing students face, the suggestions of Goodman et al. ignore the time spent on testing and test prep that would not improve college enrollment… and that would be true for the majority of those pushed to retake tests.

What about unintended consequences? As explained above, I see retaking a test as a tweak of the common practice of requiring at least one admissions test administration. It is possible that universal retaking of admissions tests has some unanticipated consequences, but they are likely to be small, since the primary policy of pushing admissions testing in the first place has probably already produced whatever unintended consequences come with more admissions testing.

The context of practical policy choices. Policies do not stand alone, even when an analysis focuses on a narrowly-defined choice of yes or no on a specific issue. A legislature, school district, college, or any institution has to decide how to focus energy and spend money. Often, a proposed policy solution competes against other ideas that address the same basic issue, and while sometimes you can combine ideas, that is not always true.

The core question is, what choices have to be made to implement this paper’s suggestions? There are two components of this:

  • What are the potential policy choices to address the target of the proposal and related issues?
  • What are the costs of making this particular choice? This is partly about direct costs and comparative cost-effectiveness, something that the Institute of Education Sciences will fund going forward. But it is also a matter of other types of opportunity costs, such as the time and effort that go into policy implementation or the political capital required to enact a policy.

At least in theory, funding an admissions test retake is not that expensive in terms of direct costs, especially with the fee waivers that the ACT and SAT provide for low-income students. Would it take much adult time to implement? Possibly, but given that this is a policy tweak, it’s probably not much more on top of existing state policies that require high school students to take an admissions test once.12

I am more worried about the political opportunity cost: the potential for this idea to monopolize discussion of higher-education policies in a state legislature or divert attention from other policies with more potential impact, especially in a committee that focuses on higher-education or broader education policy. Many state legislatures have short annual sessions, and there is scant time for a rational, extended discussion of the benefits and disadvantages of well-vetted proposals. Often, the active discussion each year is on a limited set of issues, and something that becomes the focal point can take up all of the figurative oxygen in the room. In most years, the budget process takes up a good chunk of the available attention. After the budget comes whatever is on the priority list for the leader of each chamber, followed by the priorities of powerful legislators such as committee chairs. In states with term limits, committee and legislative leaders have little time to accomplish their priorities, and that shrinks available attention even further.

This limited capacity for deep exploration of policy in many states should not be an excuse to avoid advocating for effective, simple policies. However, it is a reason to be wary of standalone proposals that are likely to be low-impact in a particular area. In this case, which is a more critical target for a state, the group of low-income high school students who do not retake admissions tests, or students who apply to college, are accepted, and never attend the next year — so-called summer melt? To me, that group is a more urgent target than the idiosyncratic zones of benefit affected by policy encouraging test-retaking.

This suggests a “yes, and” approach to this type of policy proposal. If you are a state policymaker thinking, “Hey, it’s in the New York Times! Why not try encouraging retakes in my state?” I would say, “Yes, and what else are you going to do to help high school students enroll in college?” And if you hear a state policymaker being attracted to this standalone idea like a moth to a flame, keep saying “yes, and” until there is more on the table than pushing teenagers to retake admissions tests.

Bottom line? State legislators should probably not look to test-retaking as providing more than marginal improvement in college attendance. There are a few cases where it might make sense: in regions around underenrolled state colleges and universities that are moderately selective, especially in states where a low proportion of high-school students retake admissions tests. But that’s an oddly-defined zone of benefit. In many or most states, universal retaking of admissions tests is not likely to move the needle all that much. Go ahead and toss it into a comprehensive package to improve college enrollment, but do not rely on it as a standalone policy.

Kudos to the authors. I would be remiss if I did not repeat my impression that the paper is very good. The fact that it is a working paper just means it hasn’t been accepted yet in a refereed journal, and “yet” is the operative word, for it surely will be. This is a clearly-defined topic, the analysis is cleverly designed and well-presented, and that is the only basis on which I could have written this blog entry. Well-done!


Notes

  1. The conclusion in this working paper does not suggest requiring a retake but speculates on various mechanisms to encourage retaking admissions tests.
  2. There is a reason for this conclusion at the beginning: This blog post has been brought to you by the letter We Can Torture Finer Points of Analysis Until They Bark Like Chihuahuas. Also, my thanks to Joshua Goodman, Oded Gurantz, and Daniel Klasik for the Twitter conversation that helped clarify some things for me.
  3. These are not the only relevant areas where one might want to think clearly about empirical research results. To pick three other examples, one might also want to explore how we define social problems, the implicit social mechanisms at play, and uncertainties about both the study’s results and applying ideas outside the study context.
  4. The requirement was at first the ACT, now the SAT.
  5. Hyman’s paper is very detailed and worth reading from start to finish; these specific estimates represent tiny slices of the analysis to give a sense of the magnitude of the effects of an admissions-test-taking mandate.
  6. As Fine put it, wave a magic wand and imagine that all teenagers graduate from high school… or these days, college. While we have the wand, let’s also imagine that they graduate with all the skills and knowledge we’d like them to have. Does that mean that everyone gets a job?
  7. If you wish, you can imagine Don LaFontaine saying this.
  8. This is somewhat close in concept to Hyman’s finding that Michigan high school students in the middle of predicted admissions-test-taking propensity gained the most in terms of enrollment.
  9. Arizona State University admits everyone with a B average in relevant high school coursework, and we do not cap undergraduate admissions, so retaking the SAT or ACT does not change much about the admissions process.
  10. Yes, I am showing my ASU colors here.
  11. Evaluating this idea would require a small experiment about the value of universal information, in line with some experiments in the recent past regarding financial-aid promises; the presentation I saw in 2016 relied on a physically-mailed packet. That study has not yet been released, but the design is simple enough to apply in this case.
  12. Unlike a basic admissions-test mandate that generally substitutes the ACT or SAT for a high school achievement test, support for retaking admissions tests does not compensate for the costs with any savings.