What type and level of failure can we tolerate in evaluation systems?

Bruce Baker points out the flaws in the Brookings white paper on evaluation, "Passing Muster," but I think there needs to be a broader take on this. Baker properly sees the circular-reasoning aspects of the Brookings approach (everything but value-added measures is judged by value-added measures, which are judged by … how well they are internally consistent). The problem is that every single personnel evaluation system is flawed if you're looking for pure technical capacity. Give a set of smart postdoctoral researchers a personnel evaluation system intended to be used for summative purposes (i.e., hiring/firing, etc.), and I bet they can poke at least 20 holes in it within an hour.

Employers can't wait for the perfect evaluation system, and given my cynical nature I think most existing systems are so far below the Lee Shulman idea of a "marriage of insufficiencies" that they are best characterized as marriages of incompetencies. But I don't think the flaws are generally technical; they are political flaws in the general sense (not the partisan sense). Any evaluation system has an implicit theory of action on what you do with employees. You can design a system that leans towards retention, leaving too many employees in place when someone should intervene to help the employee or counsel the employee out. (In schools, it's very important to keep in mind and remind people that most new teachers are under extraordinary pressures and think of leaving at some point in the first few years, even if they stay. "Annual evaluation leniency" isn't the systematic problem with managing new teachers that many think it is, or at least it's far from the most urgent priority.) You can design a system that leans towards dismissal, kicking too many employees out regardless of their level of competence. You can design a system that leans towards inconsistency, where the idiosyncrasies of individual supervisors dominate decision-making. (That appears to be Rick Hess's preference.) And the way those options dominate real systems leaves little room for a good system of evaluation. Yes, you can have appropriate evaluations in insane systems. But since a lot of the rhetoric revolves around "human capital" strategies for school districts and states, I don't think a proposal for a system should be allowed to live just because a number of supervisors are decent human beings who are smarter than the paperwork.

I know there are examples of better ways of evaluating teachers. They tend to be very local (as in Toledo) or very expensive (as in the Gates Foundation grant supporting peer evaluation in my area). But without extraordinary effort and luck, you're talking primarily about what level of stupid your evaluation system is. That's where you get systems that leave very weak teachers in place (or kicking around from school to school) for years. That's also where you get systems where (despite all pretenses otherwise) outcomes are determined entirely by test scores; see math-teacher blogger JD2718's discussion of a NYC teacher who was denied tenure (technically "extended" for another year on probationary status), at least on first impression entirely based on test scores and apparently part of a broader pattern in the insane NYC system. 

So, how can we think around this issue without getting trapped in the unimaginative rhetoric I've read for the last few years? First, we can go back to the overlap between some writings in philosophy and management/I-O psychology research that talks about the relationship between (perceptions of) procedural and substantive justice in personnel decisions. Events that you and I might see as procedurally appropriate can result in decisions (substance) that we disagree with. So if the lived experience of teachers and principals or parents is that the substantive decision is wrong in a case, that can trump an abstract agreement with a procedure. So what appears to be the right substantive judgment in the vast majority of cases is a requirement for political legitimacy of a system to those within it. And the reverse is true for cases near the margin: if a particularly hard case is accompanied by procedural screwiness, lots of peers and community members are going to be unhappy with what happened. That doesn't mean you need perfection; but if I were a principal I'd be very unhappy with the practices of a system that led regularly to questions about the justice of a decision in either a procedural or substantive sense. 

If you've worked in a small (and by this I mean a mom-and-pop) business, a lot of this probably seems ridiculous: bosses know those they supervise so well they're comfortable with making decisions based on that holistic judgment. That's if you're in a reasonably healthy job environment; in a boss-from-hell situation, if you have any choice you're likely to leave long before annual evaluation nuttiness is what ends the job. If your main job is something generic such as a programmer with a certain package of skills, you can look for similar jobs. That's not the case with a large number of teachers, especially if their expertise is in an area with relatively few openings each year (and almost none in the middle of a year). So the exit from a particular school for a new teacher is also likely to be the exit from the field.

So we have a nasty combination of structurally asymmetrical exit/voice with new teachers and high vulnerability to lack of credibility from either procedural or substantive flaws. Add to that the idiosyncrasies of individual administrators who may or may not have the professional knowledge and judgment, paperwork skills, spine, political savvy, human touch, and sense of principle to make good decisions in the right way. (And that can be either in favor of retention or letting go, case by case.) It's enough to seduce a number of generally rational people into throwing their hands up in the air and giving in to the technocratic impulse.

I think that way lies both corruption and a politically vulnerable direction; there are loads of people who are claiming the "reformer" mantle and asserting that no personnel decision is occurring and no one is advocating personnel decisions based on test scores. Such claims are mere bullshit, those who voice them know better, and the political consequences of an expanding set of test-based personnel decisions are not what the self-anointed "reformers" really want. (Remember the technocratic AYP, anyone?)

So, if the problems are essentially political, any solution has to be political as well in the best sense of politics (i.e., handling interests in a practical sense). If peer-based systems are politically viable, they're one possible solution because they provide input apart from a school principal (one solution to the potential/reality of both principal-tyrants and principal-wimps). There are other potential partial solutions: one would be giving new teachers greater freedom to move between schools before tenure decisions, to eliminate the monopoly an individual school has on new teachers' careers. (My dream of a required annual rotation of new teachers between schools is impractical in most places, unfortunately, and it would be too different from the current model to avoid seriously disrupting teachers' and principals' internal script for "real teacher careers.") But the best route to better evaluation is not now and never will be the perfection of value-added machinery. That is a fantasy of "reformer" technocrats along the lines of flying cars and robot maids.


5 responses to “What type and level of failure can we tolerate in evaluation systems?”

  1. CCPhysicist

    Has anyone studied how private and parochial schools evaluate teachers? I’ve never seen anyone argue that some public school should evaluate teachers the way some local private or parochial school does it. Is this because the latter have never been tested for quality by the state, or is it that they keep principals around for a long time after firing the ones that are incompetent and just rely on them to do a good job?

    PS –
    My idea is to swap teachers rather than rotate them. That is, the lowest performing teacher and the highest performing one should swap classes. Although imperfect, it would be better than simply guessing what needs to be controlled between different schools and classrooms.

  2. Glen S. McGhee

    “My idea is to swap teachers rather than rotate them. That is, the lowest performing teacher and the highest performing one should swap classes. Although imperfect, it would be better than simply guessing what needs to be controlled between different schools and classrooms.”

    This proposal turns the political economy of teaching upside down — you don’t take your best teachers and stick them in with special ed students; instead, you give them AP classes. It also makes the workplace unpredictable for the teachers. No one would go for it.

    Maybe not even the students. And certainly not the parents that can exert influence. Which reminds me — what is the procedure for putting students in classes? If you put a kid in music, doesn’t this track them with similarly situated students? I heard that it does. How much does this factor into performance and performance measures?

    1. CCPhysicist

      Special ed students get teachers with a unique certification, so they would not be included in my proposal. (I have no clue how they can be evaluated with value added testing, either, since there are many more variables.) No, I would expect a teacher to demonstrate the ability to teach something other than better (not even the best) students in a suburban part of the district to earn extra pay over a teacher stuck with students who have a rather different support system at home.

      Contrary to your assertion, my experience was that the top math teachers in my high school taught both the highest level math class AND a class that could be best described as “using fractions so you can work in a machine shop”. They excelled with both student populations.

  3. Glen S. McGhee

    Well, the variance of student performance within a school may undermine the benefit of swapping (new) teachers between schools — especially if ‘new’ teachers are given problem students/classes that the experienced teachers know how to avoid. (On the one hand, using the school as the unit of evaluation assessment makes organizational sense — but from the point of view of the student, it might not.)

    AP teacher assignments, I think, are highly idiosyncratic. The lack of minimum standards statewide only makes it more so. You would expect ‘gifted teachers’ for ‘gifted students,’ but, hey, now everyone is gifted.