Another dirty little secret of test-based measures

New York State’s Department of Education recently reported that approximately 20% of students in testing grades refused to participate in this year’s state assessments, the high-water mark thus far of the opt-out movement. Among the various stories and claims flowing from that report is the argument that 20% refusal easily crosses the threshold of nonparticipation beyond which conclusions drawn from the testing are invalid. This argument is made by both proponents and opponents of test refusals.

Here is the dirty little secret of the existing system of accountability: plenty of measures already operate on less than 80% coverage for many schools. Three come to mind this morning, with only one cup of coffee in me:

  • The federal graduation rate definition excludes students who move away from a school after ninth grade (thus the adjusted-cohort part of adjusted-cohort graduation rate). In areas with high student mobility, well over 20% of the original cohort will not be covered by the graduation rate. (Are those mobile students truly counted somewhere? No one knows.)
  • Value-added algorithms require multiple years of test-score data. Again, with high student mobility, plenty of schools (and an even higher proportion of teachers) have value-added indicators built from far fewer than 80% of the students taught in a year.
  • In many states, even cross-sectional test data is limited to students who attended a particular school for at least a good part of the school year. High-mobility schools have far more than 20% turnover in a year.
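
The first bullet's arithmetic is easy to sketch. Here is a minimal illustration, with entirely invented numbers, of how the adjusted-cohort definition can leave a large share of the original ninth-grade class uncovered at a high-mobility school:

```python
# Hypothetical illustration of how student mobility shrinks the share of an
# original ninth-grade cohort that an adjusted-cohort graduation rate covers.
# All counts below are invented for the sketch.

def cohort_coverage(original_cohort, transfers_out, transfers_in):
    """Return (share of original cohort still covered, adjusted cohort size).

    Under the federal definition, the adjusted cohort is the original cohort
    minus students who transfer out, plus students who transfer in; the
    leavers simply vanish from the denominator.
    """
    adjusted = original_cohort - transfers_out + transfers_in
    still_original = original_cohort - transfers_out
    return still_original / original_cohort, adjusted

coverage, adjusted = cohort_coverage(
    original_cohort=400, transfers_out=120, transfers_in=90
)
print(f"Adjusted cohort: {adjusted} students")           # 370
print(f"Share of original cohort covered: {coverage:.0%}")  # 70%
```

With these made-up numbers, 30% of the students who started ninth grade at the school are invisible to its graduation rate, well past the 20% line being debated for test refusals.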

If 20% nonparticipation is a measure-killer, we need to worry about far more than New York State’s accountability indicators.

What can happen with missing information in a statistic? At least two things:

  • Bias from the nonrandom nature of missing values. If participants differ in fundamental ways from nonparticipants, the measure will not reflect the complete population.
  • Overstated precision: the mistaken assumption that a statistical estimate is more accurate than it truly is. This is an important finding from Donald Rubin’s research on missing data in the 1980s: properly accounting for missing values generally leads to larger standard errors.
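
The first problem, bias from nonrandom missingness, is easy to demonstrate with a toy simulation. Assume (purely for illustration) that true scores are uniform on 0–100 and that lower-scoring students are more likely to opt out, calibrated so that about 20% refuse overall:

```python
# Toy simulation of nonrandom nonparticipation (all assumptions invented):
# scores are uniform on [0, 100], and the probability of opting out falls
# linearly with the score, averaging to roughly a 20% refusal rate.

import random
import statistics

random.seed(42)

scores = [random.uniform(0, 100) for _ in range(100_000)]

# P(opt out | score s) = 0.4 * (1 - s/100), which averages to 0.2.
observed = [s for s in scores if random.random() > 0.4 * (1 - s / 100)]

true_mean = statistics.mean(scores)
observed_mean = statistics.mean(observed)
print(f"true mean:     {true_mean:.1f}")
print(f"observed mean: {observed_mean:.1f}")  # biased upward by ~4 points
```

Even though only a fifth of the students are missing, the observed mean runs about four points above the true mean, because who is missing is correlated with the very quantity being measured. Random 20% missingness would leave the mean essentially unchanged; nonrandom missingness does not.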

There are other problems, but these are the main ones to keep in mind.