I’ve been able to read through the December paper by Chetty, Friedman, and Rockoff (hereafter CFR) discussed in yesterday’s Annie Lowrey article, and my impressions on the first read are similar to Bruce Baker’s:
- This is a serious study with loads of data and clear descriptions of methods. If your first response was to dismiss this paper because it hasn’t yet appeared in a refereed journal, you need to understand that this is publishable or close to it (depending on the journal’s standards). That doesn’t mean it’s perfect or definitive, but nothing is perfect and few articles are definitive.
- My guess is that the most interesting methodological contribution is CFR’s new test of potential bias in value-added measures, explained and demonstrated in section 4.4. CFR provided the Stata code for this test, so it’s available to be tried out in other contexts with value-added measures for classroom teachers (hint, hint: someone go try it now in places with sloppy measures!).
- It is important to understand the main findings discussed in the article in the right context: Persistent classroom/group effect scores for the same teacher for math and reading in grades 4-8 are associated with a small but noticeable improvement in some young-adult quality-of-life measures for the students who had these teachers (reported young-adult income, reported college attendance, and reported early childbirth). If you want to generalize this claim beyond the data used for the study (associating the group effect scores with teacher quality more generally, making claims about lifetime income, or extrapolating to policy questions), you are making assumptions beyond what the data support. That doesn’t mean that the assumptions are necessarily wrong, but some readers with an axe to grind are going to leap from the reasonable parts of this paper to Fallacy of Composition Fantasyland and other neighborhoods in Kissimmee, Florida, or Anaheim, California.[1]
- Like Baker, I am concerned with the relatively narrow scope of the teachers used in some critical analyses (that is, the share of all teachers and students in the school system whose data were actually used), especially the ones where CFR are trying to distinguish between group effects for a year and a fixed persistent effect for a teacher. CFR have a ton of data, and they ended up using several hundred pounds of it. I follow and understand their reasons for the inclusion/exclusion rules, and I can’t argue with the choices CFR made, but those choices limit the generalizability of the paper. Essentially, as a result of their choices, they can say that for teachers with enough data where we can find stable effects for the children whose data we can use, there appear to be stable effects we can use to explore long-term consequences of teachers. That’s not nothing, but it’s more limited than what you are going to hear from advocates of using any value-added measure.
- Also like Baker, I’m irritated with the spin Lowrey accepted and repeated in the article about policy consequences favoring summary firing of teachers with very low value-added measures. I think the efficient way of replacing a poor teacher would generally be helping the same teacher improve, but that’s not how Rick Hanushek frames the issue. CFR have unfortunately accepted his frame as the relevant one, and so has Lowrey.
And, yes, in case you’re wondering, I sometimes make choices on Saturdays to read this type of stuff to relax.
[1] I need to write a separate post to explain the Fallacy of Composition Fantasyland, but suffice it to say that this imaginary place includes “estimates” of lifetime earnings changes.