Whole-school Bonuses in NYC…It’s Complicated
Columbia University PhD candidates Sarena Goodman and Lesley Turner prefer individualized teacher bonuses over whole-school bonus plans. That was the only conclusive finding I could take from their article describing their work on the aborted whole-school bonus program at the NYC DOE. Now the RAND Corporation has revisited the experiment, suggesting in the fine print that it failed because of inadequate teacher buy-in and competing accountability pressures. Too bad they did not stop there.
New York City’s Department of Education has scuttled an experiment that paid merit bonuses to staff for whole-school performance. The DOE suspected that the program was ineffective and ended the bonuses less than two years into a three-year plan. A couple of Columbia PhD candidates seemed, at least on the surface, to confirm that perception. However, too many factors were in flux to draw any real conclusions. Further, the researchers spent more time suggesting support for an alternative (unstudied) program than they did critically assessing the flaws that made the whole experiment invalid. Their persistent attempts to draw an untested conclusion seemed inexcusable. That was before I read the RAND Corporation’s research brief, What New York City’s Experiment with Schoolwide Performance Bonuses Tells Us About Pay for Performance. RAND repackaged the flawed study with some attitude surveys and spun it for release as legitimate research. Sadly, rapid electronic dissemination of their fictional findings has transformed them into virtual reality.
Whole-school merit pay did not drive a statistically significant change in outcomes in the New York experiment. But this null result emerged over a truncated time period, under unstable conditions, and with an unreliable measurement tool. This form of merit pay deserves reconsideration under more reasonable conditions. The issues…
1. The objective of the study did not match the method.
The study sought to evaluate the impact of whole-school bonuses on staff motivation to improve student outcomes in effective schools. Instead of conducting the experiment in stable, effective schools, however, the study group was drawn from the most disadvantaged schools.
2. Bonus-related motivation was obscured from the outset by external factors.
The system-wide New York City accountability system was implemented at the same time as the whole-school merit pay experiment. The resulting public report cards became the basis for high-stakes decisions across the NYC DOE, including principal firings and school closures. These pressures could easily have overwhelmed any effect of the bonus money on differential performance. The researchers dismissed this factor, suggesting that the new NYC DOE accountability system was barely noise in the environment compared to NCLB.
Teachers did not have a clear idea of their bonus potential. In each school, a committee of four administrators and teachers decided, after the fact, how to distribute any bonus money earned. Individual bonuses varied from $200 to $5,000, with few limits on the gang-of-four’s discretion.
The results were measured with an unstable tool. The NYC accountability reports, which formed the basis for merit pay, were new and unfamiliar to many employees. The reports were also subject to variability in NYSED test scores: standardized scores had been rising across all populations during the period studied, suggesting that the tests themselves were becoming less rigorous. Control-group schools could have seen their scores rise regardless of motivation.
3. The study sample was not representative of the population.
Random selection did not yield a study group representative of the overall school population. In fact, bonus pool schools had higher percentages of students whose learning was complicated by special needs, limited English proficiency, and poverty. They had more minority students, less experienced faculty, and higher absenteeism. They also had a history of lower-than-average test scores in math and reading.
4. Performance measurement was not standardized between the study group and the control group.
Bonus pool schools with lower test scores were required to make larger incremental improvements to meet accountability goals than their counterparts in the control group.
5. The time horizon for the study was too short.
The program was implemented late in the first year – announced in November, with accountability for test results beginning two months later in January – and ended during the second year. That is not adequate time for authentic behavior change.