At one level, the most recent debate over federal elementary and secondary education policy is about annual testing: will the next rewrite of the federal Elementary and Secondary Education Act continue to require that states test public school children annually in grades 3-8? The draft bill released by Senator Lamar Alexander (TN) suggested he was considering two possible policies: a continuation of annual testing, or a requirement for so-called grade-span testing, in which children are tested once in each grade span (the elementary, middle, and high school years). In September I discussed the debate over sampling instead of every-child testing, and my analysis there extends to the current debate: the argument that we must test every child every year will win out. Whether or not annual testing provides all the benefits its advocates claim (Marty West probably made the best argument in its favor at the Senate hearing last week), annual testing is a sort of accountability theater, and that symbolism alone will make the difference.
At another level, though, the debate over testing mandates is a projective test of attitudes toward data. How important are the inherent flaws of any system of data collection and use? In one part of education policy, you may admire the willingness of people to take mild or extreme liberties with data sets, to boldly go where no analysis has gone before.1 In another, the inherent weakness of someone’s proposal to gather less data is awful, horrible, a step back, weak, retrograde… am I missing any pejorative adjectives? To wit, some advocates on either side of the grade-span vs. annual-testing debate are also on opposite sides of the debate over the usefulness of value-added modeling. The problem is that their positions on the two issues are inconsistent if the question is what standard of rigor we expect. If you defend the algorithmic use of value-added modeling in teacher evaluation because we must do something about teachers!, and then favor annual-testing requirements because of the inherent flaws of grade-span testing, you may well hold a higher standard for collecting student achievement data than for using it. Likewise, if you criticize the formulaic use of value-added models because the causal reasoning is too slipshod, and then you want to move to grade-span testing, you want to hold analysts to very high standards, but, hey, let’s have less data.
This isn’t exactly White Queen territory, but it is an irregular verb: We’re clever with data; you don’t understand the limitations of what you propose; they’re lying with statistics.
- What would have been Captain Kirk’s career as an econometrician? [↩]