Before the New York City Department of Education released the name-and-number spreadsheet on the now-defunct Teacher Data Reports, I wrote and released grading criteria for news coverage. And now, with more than a week of coverage from a number of outlets, here are the grades (limited to major outlets where I read a critical mass of coverage).1
Gotham Schools: A
Gotham Schools climbed into the A-C range by declining to publish specific teacher names. Here are some of the stories GS published on the TDR dump to earn the A:
- Perspectives from a teacher, parents, Dennis Walcott (first with caveats, then backing away from them), and UFT head Michael Mulgrew (in situ).
- A corrective follow-up to the New York Times reporters’ superficial analysis of score distributions, explaining why the statistics were structured to leave the impression that ratings were evenly distributed across schools.
- A piece putting the TDR and the release in a national context.
New York Times: D
The New York Times earned a D because it published teacher names and scores. It avoided an F by including some indication of margin of error, though not very smartly: the point estimates were reported to two significant digits, a level of precision inconsistent with the brontosaur-wide error bands. But if I am not going to require perfection for an A, why should I quibble too much with work that is no higher than a D? NYT reporters added plenty of context, sometimes smartly and sometimes foolishly (see the superficial-analysis article linked above, including the GS response). My bottom-line judgment: so much effort, not enough value added.2
New York Daily News: D
New York 1: D
Wall Street Journal: D
Publishers, producers, and editors of the Daily News, NY1, and the Wall Street Journal made up the rest of the herd, publishing teacher names and scores while avoiding the worst of tabloid coverage and displays. The WSJ took a different tack from the others by displaying the labels (high, above-average, etc.) rather than quantities. The central problem of such reports is that when the data are so untrustworthy, there are no good options for displaying them with integrity. Quantify and you falsely imply precision (e.g., the too-many-significant-digits issue). Use categories and you imply the labels are applied with precision.
New York Post: F
What can you say about the New York Murdoch tabloid? For several days the NYP displayed only point estimates with no margins of error, and in several cases the paper wrote an entire story about a single low-rated teacher, which qualifies as sensationalizing. Either of those alone would have earned the paper a failing grade.
Observations on the TDR race to the bottom
I am struggling to explain why the publishers and producers of major news outlets in New York took the bait Joel Klein offered more than a year ago, despite clear evidence that a broad range of people in journalism and education thought publishing individual teacher ratings was a lousy idea. The major dailies and NY1 continued down that path through the long court case and after Klein’s departure, even after editors knew or should have known that TDR was a pilot, fragile, and based on tests revealed to be problematic since the original NYCDOE-UFT agreement creating the pilot. The easy answer is that the TDR coverage was the equivalent of nightly-news “if it bleeds, it leads” practice. But I think it’s more complicated, since several of the outlets had plenty of time for extensive internal discussions before the release, and there were plenty of options between “publish everything with names” and “publish nothing.”
It is important to keep pushing on this story, because unlike the Los Angeles Times VAM publishing project, here we have several news organizations and a longish period of time in which decisions could have been made, reversed, and so forth. The Columbia Journalism Review article from March/April 2011 should not be the last word on this, because this is likely to be a continuing ethical dilemma with large data sets and journalism. I have seen at least one comparison between the NYT coverage of TDR and the Judy Miller Iraq-WMD scandal, but it feels to me like a different challenge to the Times’ integrity, even if it is the same publisher (“Pinch” Sulzberger) who made the misjudgments. On the other hand, in both cases, some form of heady rush (pre-war coverage or data on public employees) trumped caution on ethics and professionalism. Sadly, it looks like bad data are publishers’ and editors’ crack cocaine.
I have seen a few efforts nationally to spin the coverage in NYC in different ways, from “well, it got a conversation started” to “Gates’ op-ed in the NYT was an Overton Window thing” (paraphrases, not direct quotations). Folks: please stop this effort to spin. Just. Stop. When people with widely varying views of value-added measures all think publishers and editors at the Times et al. jumped the shark with their coverage, there is no effort at spin that doesn’t come across as tenuous reasoning. The coverage mostly stank, it bodes poorly for how news organization managers respond to data dumps in general, and that’s about all there is to say about it for now.3