"Effectiveness of Reading and Mathematics Software Products: Findings from the First Student Cohort", a report submitted to Congress by the Department of Education's Insitute for Education Sciences, but conducted by the program evalation organizations Mathematica Policy Research and SRI International, is bound to become part of the "technology wars." It will also give some comfort to a) publishers who feel under the gun from the assaults of the tech trade group SIIA to open textbook adoption to the software providers and b) teacher groups arguing that the answer to student performance problems is adding labor instead of substituting it for technology

The main findings of the study are:
  • Test scores were not significantly higher in classrooms using the reading and mathematics software products than those in control classrooms. In each of the four groups of products (reading in first grade and in fourth grade, mathematics in sixth grade, and high school algebra), the evaluation found no significant differences in student achievement between the classrooms that used the technology products and classrooms that did not.
  • There was substantial variation between schools regarding the effects on student achievement. Although the study collected data on many school and classroom characteristics, only two characteristics were related to the variation in reading achievement. For first grade, effects were larger in schools that had smaller student-teacher ratios (a measure of class size). For fourth grade, effects were larger when treatment teachers reported higher levels of use of the study product.
It is unlikely that the media will treat this report with much subtlety, but the study may tell us much more about the state of the evaluation art than about the efficacy of technology-based programs.

The first problem is the idea of randomly assigning programs to teachers, even if they were volunteers. If we know anything from research on other large-scale program interventions, we know that teacher "buy-in" and district support are essential to implementation, and we don't need a study to tell us that implementation is important to program results. In the real world, providers seek schools with teachers who have selected the program after reviewing their options. Random assignment is bound to result in random buy-in, and so random implementation, and random results. The meaningful test of program efficacy evaluates the program as it is intended to be offered on the market: comparing teachers who want to use the program with teachers who do not use it. (There must also be some contamination of the control group results, because teachers in that group were allowed to use other technology products, although not ones "similar" to products in the experimental group.)

Second, "the study was designed to report results for groups of products rather than for individual products." To get a sense of how useful this is to consumers or policymakers, consider a study of automobile emissions, safety, or gas mileage based on the categories "compact," "SUV," and "luxury". In each category, every make and model, or rather the selected makes and models chosen by the reviewers, is treated as part of a homogeneous group and assigned to drivers interested in participating in the study. What exactly is the utility of findings about the impact of these classes of car on any of these measures? Does it help policymakers make decisions about the automobile industry? Does it help consumers decide on the purchase of their next cars? No and no.
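
To make the grouping problem concrete, here is a minimal sketch in Python. The product names and effect sizes are invented purely for illustration and are not taken from the report; the point is only that a pooled figure of zero is compatible with individual products that help and others that hurt.

```python
from statistics import mean

# Hypothetical per-product effects on test scores (in points).
# These numbers are made up for illustration; they are NOT from the study.
effects_by_product = {
    "Product A": 4.0,
    "Product B": 0.0,
    "Product C": -3.5,
    "Product D": -0.5,
}


def pooled_effect(effects):
    """Average effect a group-level report would show, weighting products equally."""
    return mean(effects.values())


def effect_range(effects):
    """Smallest and largest individual product effect hidden inside the pooled figure."""
    values = list(effects.values())
    return min(values), max(values)


pooled = pooled_effect(effects_by_product)
low, high = effect_range(effects_by_product)
print(f"pooled effect: {pooled:+.2f}")
print(f"individual effects range from {low:+.2f} to {high:+.2f}")
```

Under these assumed numbers the group-level result is "no effect," yet a buyer choosing between Product A and Product C would face very different outcomes, which is exactly the information a group-level report cannot provide.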

Technology-based programs may or may not add value to student performance, but this study doesn't tell us anything about that. Indeed, rather than shed light, it is likely to obscure the issue. The study has value as a step in the development of appropriate methodologies for the evaluation of educational interventions promulgated on a mass scale, and the research community is better off for it. But it is no guide to policy or purchasing.

Download the report here.