School improvement providers who want to get a handle on program evaluation should read Debra Viadero’s May 16 article in Education Week. To your editor, it illustrates that providers need to understand exactly “what” is being evaluated, and how “the what” affects evaluation outcomes.

In 1997, Congress established a hefty state grant program to encourage poor schools to invest in proven, off-the-shelf models for improving learning. By 2006, though, the Comprehensive School Reform Demonstration program—or CSRD, as it’s commonly known—had effectively, and quietly, died. Though still on the U.S. Department of Education books, the program made no new grants to states in that year or since.

Now, along comes a study offering a fitting, if somewhat ironic, epitaph for the program: Models matter….

The federal study, which was conducted by researchers at the American Institutes for Research, or AIR, a nonprofit research group based in Washington, tracked progress in 650 elementary and middle schools, half of which were implementing one of eight different, packaged school improvement models (Accelerated Schools Project, ATLAS Communities, Co-nect, Expeditionary Learning/Outward Bound, Modern Red Schoolhouse, Success for All, Turning Points, Urban Learning Centers), and half of which were using no formal schoolwide program to boost student achievement, between 2001 and 2004.

Funded in 2000 when President Clinton was still in office, the $7.5 million study was completed last year. But the Bush administration, which does not count comprehensive school reform among its priorities for K-12 education, did not publicize it. The findings were published last fall in a report by the research group and again last month in a series of articles in the Journal of Education for Students Placed at Risk.  

In the interest of full disclosure: your editor had direct and sustained relationships with all eight organizations and their designs. All but Accelerated Schools and Turning Points were original New American Schools (NAS) designs; those two became NAS members after an extensive quality review. All eight received substantial technical assistance from NAS and/or its affiliated Education Entrepreneurs Fund on the challenge of scaling up to national scope. The Fund held an equity stake in Co-nect and had made loans to most of the others.

“What” Is Being Evaluated?

The AIR evaluation differed from the typical study accepted by the What Works Clearinghouse and summarized weekly in New Education Economy®.

Generally, the evaluations submitted to WWC take place relatively early in a program’s history; involve a small number of control and experimental schools, teachers and students; and cover a period of no more than one school year. Apart from the fact that such studies tend to occur before the provider has accumulated much experience with the program, there is every reason to expect this kind of study to produce the best possible outcomes. The program developer is likely to select schools with favorable conditions for implementation, can assign its best staff to the work, and can devote as much attention to implementation as necessary. Any shortcoming in documentation for implementation staff, or in materials for teachers and students, can be repaired on the fly, because the staff assigned to the school is often the development team itself.

The AIR study highlights how a study of the kind submitted to WWC, though it may be called a “program” evaluation, evaluates only part of the program: the model, design, or process on which the program is built. Such an evaluation tells the reader very little about the provider’s capacity to deliver quality outcomes at any kind of scale.

By studying some 325 schools with widely different characteristics, chosen by the researchers and followed over five years, the AIR evaluation covers the whole program. That includes the criteria providers apply in selecting client schools; the extent to which program documentation covers implementation comprehensively and accurately; the quality and training of the implementation staff and their managers; the extent to which the materials and the online and/or phone support services provided to teachers and students clarify and reinforce the design; decisions about the mix and amount of implementation services provided to schools; and the quality of the providers’ service delivery.

Because it captures all the factors that bear on student outcomes when a provider takes a program to scale, this kind of evaluation reflects the real-world marketplace for school improvement services. Educators considering a model still need to ask whether a design that performs well here will fit their schools’ needs, but such a study offers much more information than the typical study accepted by the WWC.

How “the What” Affects Evaluation Outcomes

The basic article of faith shared by comprehensive school reform (CSR) providers is that programs that are not implemented do not improve student performance. Viadero explains how the AIR study confirmed this belief.

Between the third and the fifth year of implementation, schools that had stayed true to their school improvement models experienced achievement gains at a rate outpacing those of the comparison schools. The boost in achievement was all the more notable because the experimental schools had started out lagging behind the control schools. In the experimental schools that adhered less closely to the program guidelines, on the other hand, test-score gains were no different from those in other schools.

“In short,” one of the journal papers on the subject concludes, “CSR ‘works’ when external models are implemented faithfully and consistently for three to five years.”

The learning gains also varied, depending on the program that schools used. Success for All, a program developed by researchers from Johns Hopkins University in Baltimore, appeared to produce the largest academic improvements of any of the models studied, the study found, and to have schools that adhered closely to its program.

The pie chart below categorizes the extent of school implementation.



The chart also suggests how a holistic view of the program can help providers improve outcomes, very often dramatically and quickly.

Roughly half of the schools in the AIR study never really implemented their CSR design (Nonreform, Momentary Reform, Nominal Reform, Resident Reform). Your editor’s own experience with all of the CSR providers in this study, whose combined reach today includes literally thousands of schools, suggests that almost every one of those non-implementers could have been identified by objective measures before the provider signed a contract. These prospective clients lacked some combination of teacher buy-in, supportive district policies, and adequate funding. Any one of these deficits can doom implementation; any combination will.

If the CSR providers in the AIR study had looked for these indicators of likely failure and declined to contract with these schools, they would have doubled their success rate. It is worth pointing out that Success for All, the program with the best results, requires that 80 percent of the teachers in a prospective school vote in favor of implementation by secret ballot.

The payoff from this decision would not be measured only in student performance. In every case, providers found that they spent more time than budgeted on these clients and so lost money, even before the costs of customer acquisition are factored in. And poor outcomes at these sites made it harder, and therefore more costly, for marketing and sales to sign up new schools.
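The doubling claim above is simple arithmetic, and a few lines make it concrete. This is a minimal sketch, assuming the two "roughly half" figures in this post are exactly one half (about half the schools never implemented, and about half of the implementers later dropped the model) and assuming screening catches every likely non-implementer. These are illustrative assumptions, not the study's published numbers, and the names `success_rate` and `screen_out_non_implementers` are hypothetical.

```python
from fractions import Fraction

# Illustrative proportions from the discussion above, treated as exact halves
# (an assumption; the AIR study's actual category shares differ somewhat).
SHARE_NEVER_IMPLEMENTED = Fraction(1, 2)          # Nonreform, Momentary, Nominal, Resident
SHARE_SUSTAINED_OF_IMPLEMENTERS = Fraction(1, 2)  # implementers that kept the model

def success_rate(screen_out_non_implementers: bool) -> Fraction:
    """Share of a provider's contracted schools that sustain the model.

    Hypothetically assumes screening is perfect: if the provider screens,
    only schools that will actually implement end up under contract.
    """
    implementers = 1 - SHARE_NEVER_IMPLEMENTED
    sustained = implementers * SHARE_SUSTAINED_OF_IMPLEMENTERS
    # Screening shrinks the denominator; the successful schools are unchanged.
    contracted = implementers if screen_out_non_implementers else Fraction(1)
    return sustained / contracted

without_screening = success_rate(False)  # (1/2 * 1/2) / 1 = 1/4
with_screening = success_rate(True)      # (1/4) / (1/2)   = 1/2
print(without_screening, with_screening, with_screening / without_screening)
# prints: 1/4 1/2 2
```

Note that the ratio depends only on the denominator: however success among implementers is measured, dropping a client base that is half non-implementers doubles the rate.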


This discussion covers the “front end” of the program. Looking at the half of the schools that did succeed in implementing for some period provides an opportunity to talk about the “back end.” Roughly half of these schools failed to maintain their model.

Whether providers should be held accountable for these schools depends on what they promised when they entered into the contract.


Some CSR providers offer to bring their client schools to the point where the model is self-sustaining. Assuming the schools in question completed the initial implementation period, these providers are responsible for all of them: those that managed to maintain the design (Sustained Implementation, Sustained Implementation Without) and those that did not (Transient, Temporary).

Many other CSR providers recognize that normal turnover among staff and principals, along with changes in district policy, continuously erodes their model. This group requires an ongoing service relationship between the school and the provider, so these providers are not responsible for schools no longer under contract. This simple difference doubles these providers’ success rate in this study.
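The back-end doubling works the same way: the promise in the contract decides which schools sit in the denominator. A minimal sketch, assuming that roughly half of the implementing schools sustained the model (treated as exactly one half) and, for the base case, that every school that dropped the model (Transient, Temporary) also ended its contract. Both are illustrative assumptions, and `back_end_success_rate` and `share_dropouts_off_contract` are hypothetical names.

```python
from fractions import Fraction

# Among schools that completed initial implementation, roughly half sustained
# the model; treated here as exactly one half (an illustrative assumption).
SHARE_SUSTAINED = Fraction(1, 2)

def back_end_success_rate(continuous_service: bool,
                          share_dropouts_off_contract: Fraction = Fraction(1)) -> Fraction:
    """Share of the schools a provider is accountable for that sustained the model.

    A provider promising a self-sustaining model answers for every school that
    completed implementation. A continuous-service provider answers only for
    schools still under contract; share_dropouts_off_contract is the
    (hypothetical) fraction of model-dropping schools that also left.
    """
    dropped = 1 - SHARE_SUSTAINED
    if continuous_service:
        # Dropouts that left the contract no longer count against the provider.
        accountable = SHARE_SUSTAINED + dropped * (1 - share_dropouts_off_contract)
    else:
        accountable = Fraction(1)  # all schools that completed implementation
    return SHARE_SUSTAINED / accountable

self_sustaining = back_end_success_rate(False)
continuous = back_end_success_rate(True)
print(self_sustaining, continuous, continuous / self_sustaining)
# prints: 1/2 1 2
print(back_end_success_rate(True, Fraction(1, 2)))
# prints: 2/3
```

The 100 percent figure in the base case is an artifact of assuming every dropout left the contract; if only half of them left, the continuous-service rate falls to two thirds, so the full doubling is an upper bound under these round numbers.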

The bottom line: Providers need to understand what is being evaluated and how management decisions at the front and back ends of their programs affect evaluation outcomes. It's not rocket science.