While I am on the theme of data overinterpretation this week, two recent surveys that have been overinterpreted:
1. The recent Quinnipiac University Polling Institute survey of New York City residents, which included a number of questions on school politics in the city.
The New York Times’ blog entry by Winnie Hu started, “A majority of New York City voters approve of the public release of ratings for thousands of public school teachers, even though a plurality of voters believe that the ratings are flawed, a new poll has found.” (This is fairly representative of the coverage of the poll.)
The problem is that the question from the poll did not ask explicitly about the ratings released in the news. Here is the question (the first of five):
25. Do you approve or disapprove of releasing to the public the results of evaluations of public school teachers?
Maybe we need a primer for poll-question writers and reporters, but rating is not the same as evaluation. Advocates of using value-added statistics keep saying that a teacher’s evaluation should not consist just of test score measures, and yet no one has complained about the wording of this poll question. All five of the related poll questions used the term “evaluation” rather than “rating,” and only the last one clarifies that the term refers to the VAM ratings. Those who also read the news about the NY state teacher evaluation policies or the agreement with UFT to allow appeals and validators in NYC teacher evaluations could have thought question #25 on the poll referred to the annual evaluations, not the VAM ratings. I think the response percentages thus are tough to read clearly without the firm having tried alternate wording. Mickey Carroll, who is the polling institute’s representative on NYC polls, responded to my question on the matter with, “You make a good point but, on balance, I think the questions work.” So I guess the poll didn’t test different phrasing.[1]
2. The MetLife survey of teachers.
The publicity on the MetLife survey has focused on the top-line decline in the number of teachers who reported being “very satisfied” with their jobs. Teacher-blogger Ken Bernstein is fairly typical in reading this as partly about the effects of the recession and partly about overreaching on accountability. Well, to be honest, he’s better than most in saying there are multiple things feeding into what the MetLife survey reports as a 17% decline in the “very satisfied” response. But after looking at the survey report, again I wonder how much we can draw from the results.
This is not a case of random error. While MetLife did not report standard errors (why not, MetLife???), because there were 1001 teacher respondents, we can be fairly sure that the 17% decline is real: for a 44% statistic and 1001 respondents, the 95% margin of error is about +/- 3 percentage points.
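For anyone who wants to check that back-of-the-envelope number, here is a minimal sketch of the usual margin-of-error calculation for a single proportion. It assumes a simple random sample and the normal approximation, which may not match how MetLife actually drew or weighted its sample; the function name `margin_of_error` is mine, not anything from the survey report.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to a 95% confidence level.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error


# 44% "very satisfied" among 1001 teacher respondents
moe = margin_of_error(0.44, 1001)
print(f"95% margin of error: +/- {moe * 100:.1f} percentage points")
```

Under those assumptions the result is a bit over 3 percentage points, far smaller than a 17-point drop. A weighted or clustered sample would widen the margin somewhat, so treat this as a rough lower bound on the uncertainty.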
But as with the Quinnipiac poll, the MetLife poll has a pattern of questions that limits my willingness to trust that the responses are what the polling firm claims. In this particular case it is not about the question wording but the sequence of questions that came just before the item about job satisfaction. Out of the prior 22 questions, 15 asked about budgets, layoffs, and other issues that at least a substantial minority of teachers answered as indicating increasing problems in their schools. If you had just been asked more than a dozen questions that required you to think about what had gone wrong in your workplace recently, how would you answer a question about job satisfaction?
I could quibble more, but in each case, I think there is less there than meets the eye.
[1] To my friends who support 40%, 50%, or whatever of evaluations coming from value-added measures, if you really want me to believe you have any sincerity about the “oh, no, test scores shouldn’t be everything” line, you may want to be quiet about the Quinnipiac poll results unless/until you have complained about the wording.