# Teacher Effects: What Do We Know?

Helen F. Ladd
Edgar Thompson Professor of Public Policy Studies and Professor of Economics, Duke University

Draft, April 21, 2008

If Rothstein is correct about the importance of dynamic tracking, his analysis represents a serious challenge to the validity of the standard value-added approach. On the face of it, the evidence he presents seems quite compelling. At the same time, it appears to imply that all of the estimated teacher effects are spurious, which conflicts with the conclusion from other studies that teachers matter. Hence additional research on the validity of the static tracking model is clearly needed. A first step would be to reestimate the Rothstein models with multiple cohorts, and to examine results for math in addition to reading. The use of multiple cohorts would permit the researcher to separate teacher effects from contextual effects, which, as discussed below, have emerged as a cause of concern with respect to the estimation of teacher effects in more complex models. Although Rothstein believes that the use of multiple cohorts will not change the results (personal communication with the author, April 2008), it would be useful to have that confirmed empirically. One challenge for the authors, which they had not resolved as of the April 2008 version of their paper, was determining the standard errors of the teacher fixed effects.

A second step would be to explore the student-assignment process used by school principals. Some preliminary informal investigation by this author in a few North Carolina elementary schools provides little support for the hypothesis of dynamic tracking, but the observations were limited. Clearly more investigation is needed. In addition, it might be productive to see how sensitive his findings are to the fact that he is examining within-school variation alone.

## Mixed methods or layered models (multivariate modeling)

Mixed methods or layered models are far more complicated than the simple value-added models in that they specify a joint distribution for the entire multivariate vector of test scores (see McCaffrey et al., 2003, pp. 56-62). Included among these models are the Tennessee Value Added Assessment System (TVAAS) developed by William Sanders for Tennessee, the cross-classified models of Rowan, Correnti, and Miller (2002) and Raudenbush and Bryk (2002), and the persistence models of McCaffrey et al. (2003).

The key element of such models is that a student’s performance in any year is modeled not only as a function of her teacher in the current year, but also of her teachers in prior years.

Moreover, such models typically estimate teacher effects using random rather than fixed effects.

A major advantage of multivariate models relative to the simpler value added models is that they use more information to identify teacher effects. In particular they make use of the fact that student scores in future years hold information about the effectiveness of teachers in the past.

Another advantage is that they are very flexible. The primary disadvantage of such models is their tremendous computational demands. Until computational methods are developed to make it easier to estimate such models, it is likely that the more standard value-added models will be the basis of much of the ongoing research in this field.

I focus here on the TVAAS layered model because it has received significant attention in the literature. Implicit in this specific model is the assumption that any teacher effects in a prior year persist undiminished in future years. No student covariates are included. Instead, the complex correlations among the errors from the repeated test scores substitute for student specific covariates.
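In stylized notation (a sketch, not the full TVAAS specification), the layered model writes student $i$'s score in year $t$ as the sum of a year mean, the effects of all teachers the student has had through year $t$, and a residual whose correlations across years are modeled explicitly:

$$y_{it} = \mu_t + \sum_{s=1}^{t} \theta_{j(i,s)} + \varepsilon_{it},$$

where $\theta_{j(i,s)}$ is the effect of the teacher to whom student $i$ was assigned in year $s$. Because prior teachers' effects enter with a coefficient of one, they are assumed to persist undiminished, which is exactly the persistence assumption noted above.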

Kupermintz (2003) provides some useful insights into the TVAAS model. First, the resulting teacher effects rank teachers within each school system. Hence, a weak teacher in a system with many other weak teachers may receive a more favorable ranking than a similar teacher in a stronger system. Second, the teacher effects are “shrunken” toward the system average for reasons similar to those discussed above. Thus, once again, it is difficult to get accurate estimates of the effectiveness of teachers who are working with small numbers of students. In addition, and perhaps most significantly, Kupermintz questions the validity of the estimated teacher effects given that they emerge from a model that includes no student-level or classroom-level covariates. Though he acknowledges that the model uses prior achievement as a covariate or “blocking variable,” which means that each child serves as his or her own control, he notes that such “blocking” procedures were developed in the context of controlled experiments, not in the context of observational studies. In contrast to controlled experiments in which treatments can be randomly assigned, students are not randomly assigned to teachers (Kupermintz, 2003, p. 292). As a result, the estimated teacher effects may be confounded by the effects of correlated student-level characteristics that are omitted from the model. Further, he argues that for the TVAAS procedure to be valid, the prior-year achievement variable would have to serve as a proxy for a variety of contextual factors including, for example, the socioeconomic or achievement mix of students in the classroom.
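The “shrinkage” of estimated teacher effects toward the system average, noted above, can be illustrated with a stylized empirical-Bayes calculation. The variance components and class sizes below are purely hypothetical, chosen only to show why teachers with few students are pulled hardest toward the mean:

```python
# Stylized empirical-Bayes "shrinkage" of a raw teacher effect toward the
# system average.  All variance components and class sizes are hypothetical.
var_teacher = 0.04   # assumed variance of true teacher effects
var_noise = 0.36     # assumed student-level noise variance within a class

def shrink(raw_effect, system_mean, n_students):
    # Reliability weight: the share of the raw estimate that is signal.
    # With few students the noise term var_noise / n_students is large,
    # the weight is small, and the estimate is pulled toward the mean.
    weight = var_teacher / (var_teacher + var_noise / n_students)
    return weight * raw_effect + (1 - weight) * system_mean

# Two teachers with the same raw estimate but different class sizes:
print(shrink(0.30, 0.0, 5))    # small class: pulled hard toward 0
print(shrink(0.30, 0.0, 30))   # larger class: shrunk much less
```

The design choice mirrors the text: precision, not just the point estimate, determines how much credit a teacher's raw estimate retains.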

The extent to which the absence of covariates, at either the student level or the classroom level, distorts the results has been examined by Lockwood and McCaffrey (2007) in the context of a general multivariate model (see also McCaffrey et al., 2004). Despite concerns that the use of random effects can lead to inconsistent estimates when unobserved individual effects are correlated with other variables in the model, Lockwood and McCaffrey (2007) demonstrate through analysis and simulation that the mixed-method approach does not generate much bias in practical applications, especially when the number of test scores or individual students is relatively large. The authors’ simulations support the claim of William Sanders that the joint estimation of multiple test scores for individual students, along with other elements of the TVAAS approach, effectively purges the results of any bias that would otherwise arise as a result of the variation in student backgrounds (Lockwood and McCaffrey, 2007, p. 244). At the same time, however, the mixed-methods approach cannot control for bias when the student population is stratified. A stratified student population is one “in which there are disjoint groups of students such that students within a group share teachers but students in different groups never share any teachers” (Lockwood and McCaffrey, 2007, p. 245).

Ballou, Sanders, and Wright (2004) reinforce these conclusions empirically in the context of the TVAAS model. To examine the effects of student-level covariates, the authors add them to the TVAAS model in a two-stage approach. They begin with a first-stage equation in which student achievement gains are estimated as a function of student characteristics and standard teacher fixed effects (not the teacher fixed effects that emerge from the TVAAS model). The inclusion of the teacher fixed effects ensures that the estimated coefficients of the student characteristics are not contaminated by any time-invariant component of teacher quality. They then use the estimated coefficients of the student characteristics to adjust the gain scores for each student and rerun the TVAAS model with the adjusted gain scores. Consistent with the findings of Lockwood and McCaffrey (2007), the authors conclude that the use of the adjusted gain scores does not significantly change the estimates of teacher effects and hence that the unadjusted TVAAS model does an acceptable job of controlling for student-level covariates.
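The first stage of this procedure can be sketched with synthetic data. The sketch below uses plain least squares in place of the full TVAAS machinery, and every name, coefficient, and sample size is hypothetical; the point is only to show how including teacher dummies lets the student-characteristic coefficient be estimated free of time-invariant teacher quality, after which gains are adjusted:

```python
# Stylized sketch of a two-stage covariate adjustment: stage 1 regresses
# gains on a student covariate plus teacher fixed effects; stage 2 would
# feed the covariate-adjusted gains back into the teacher-effect model.
# All data are synthetic and all parameter values hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_teachers = 600, 30
teacher = rng.integers(0, n_teachers, n_students)      # teacher assignment
frl = rng.binomial(1, 0.4, n_students).astype(float)   # student covariate

teacher_quality = rng.normal(0, 0.2, n_teachers)
true_frl_effect = -0.15
gains = (teacher_quality[teacher] + true_frl_effect * frl
         + rng.normal(0, 0.3, n_students))

# Stage 1: covariate plus one dummy per teacher (no intercept).
D = np.zeros((n_students, n_teachers))
D[np.arange(n_students), teacher] = 1.0
X = np.column_stack([frl, D])
coefs, *_ = np.linalg.lstsq(X, gains, rcond=None)
frl_coef = coefs[0]          # covariate effect, net of teacher quality

# Stage 2 (sketch): adjust each student's gain for the covariate before
# re-estimating teacher effects.
adjusted_gains = gains - frl_coef * frl
print(round(frl_coef, 2))    # should be close to the true -0.15
```

Because the teacher dummies absorb each teacher's time-invariant quality, the covariate coefficient is not biased by the sorting of students to teachers, which is the logic the authors rely on.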

The results differ, however, when Ballou, Sanders and Wright (2004) make similar adjustments for contextual factors (such as the percent of students in a grade or school eligible for free and reduced price lunches). In that case, the TVAAS results change significantly, are implausibly large in some grades, and are sensitive to minor changes in model specification.

Thus, consistent with the findings of Lockwood and McCaffrey, the stratification of students across schools renders the TVAAS model far less useful.

## Conclusion

Although a discussion of the policy implications of these results is in general beyond the scope of this paper, it is worth highlighting that neither the value-added approach nor the mixed-methods approach to the estimation of teacher effects generates sufficiently reliable and stable estimates of the causal effects of individual teachers for policy makers to use them for high-stakes decisions about teachers. Those shortcomings notwithstanding, the results of value-added modeling might potentially be useful for lower-stakes personnel decisions within a school. For ideas on the possibilities, see Rivkin (2007).

## II. Are teacher credentials predictive of student achievement?

Based on a large number of early studies in the tradition of education production functions, and well-publicized reviews of those studies by Eric Hanushek (e.g., Hanushek, 1997), a common view has been that teacher credentials are not very predictive of student achievement, and hence are not useful as policy levers for improving schools. More recently, researchers have taken advantage of the richness of longitudinal administrative data to estimate the effects of credentials in a two-step procedure. In the first step, they estimate teacher effects using one of the approaches discussed above, and in the second step they explore the extent to which the variation in those effects can be explained by variation in teachers’ credentials.

Because such researchers typically find little relationship between teacher credentials and teacher effects, their findings reinforce the standard view that teacher credentials are not predictive of student achievement. The validity and usefulness of this two-stage approach, of course, depends in part on the validity of the estimated teacher effects.

Other researchers have taken advantage of the newly available rich administrative data to examine the predictive power of teacher credentials more directly. Their strategy is simply to replace the teacher variables in value-added or gain models with a vector of teacher credentials. More recent work, including some by me and my Duke colleagues, has generated new, somewhat more positive results about the relationship between teacher credentials and student achievement. Since much of the recent literature on teacher credentials has been reviewed elsewhere (see, for example, Goldhaber, 2008), my discussion here of the credentials literature is highly selective, is intended to be illustrative only, and draws heavily on my own research.

### Are the effects of teacher credentials big or small?

Most value-added or gain models that focus on teacher credentials are based on measures of student-level test scores that have been normalized by year, grade, and subject. Thus the estimated coefficients on teacher credentials that emerge from models explaining student test scores, or gains in test scores, are calibrated in terms of fractions of a standard deviation. For example, a standard finding in such models is that, relative to a teacher with no experience, the first year or two of experience is associated with an increase in student achievement of about 0.06 standard deviations, all other factors held constant. The apparently small size of this estimate, along with similar or smaller estimated coefficients for other teacher credentials, has led some people to argue that even if teacher credentials emerge as statistically significant determinants of student achievement, they may be inconsequential from a policy perspective. A coefficient of 0.06, for example, is tiny compared to black-white test score gaps that have historically ranged from 0.5 to 1.0 standard deviations; it is well below the 0.20 effect size often deemed small or moderate in the education literature, and even further below the mean effect size of 0.33 standard deviations that emerged from a study of 61 random-assignment studies of school interventions at the elementary level (Hill et al., 2007, cited by Boyd et al., in progress).

A new paper by Boyd, Grossman, Lankford, Loeb, and Wyckoff (in progress) argues that, if correctly interpreted, the effect sizes of teacher credentials are far larger than they first appear.


The authors argue first that the coefficients of teacher credentials should be interpreted relative to the standard deviation of gain scores, not the standard deviation of test-score levels. That argument follows directly from the cumulative nature of education. Given that estimates of teacher credentials in the context of a value-added or gains model are specifically designed to capture how the student’s fifth-grade teacher, for example, affects the student’s gain in achievement during that year, it would not be appropriate to compare the estimated effect of the one-year teacher intervention to dispersion in the level of test scores, which reflects the cumulative effects of teachers and other variables over a longer period of time.
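As a purely hypothetical illustration of the argument: suppose scores are normalized so that the standard deviation of test-score levels is 1.0, while the standard deviation of one-year gains is, say, 0.4 (a number chosen only for illustration). A credential coefficient of 0.06 then implies

$$\frac{0.06}{\sigma_{\text{levels}}} = \frac{0.06}{1.0} = 0.06
\qquad \text{but} \qquad
\frac{0.06}{\sigma_{\text{gains}}} = \frac{0.06}{0.4} = 0.15,$$

an effect two and a half times larger when expressed in the metric the authors argue is the appropriate one for a one-year intervention.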

In addition, the authors argue that any interpretation of the estimated coefficients should account for the measurement error in the reported test score or scores. In particular, the coefficients should be compared to the dispersion in true achievement gains rather than in measured achievement gains. This argument is consistent with the use of “reliable” variance discussed above in connection with research by Rowan, Correnti, and Miller (2002). For any student, true achievement gains differ from measured achievement gains both because of the measurement error associated with the test itself and because the student may have been particularly alert or inert on the day(s) of the test. Failure to account for these measurement errors is particularly problematic when the focus is on achievement gains because gains are based on two test scores, both of which are measured with error. Importantly, the measurement error does not complicate the estimation of the effects of teacher credentials; it only affects the resulting effect sizes. Eliminating measurement error generates standard deviations in “true” achievement gains that are far smaller than those in measured achievement gains.
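The logic can be made explicit. If each measured score equals the true score plus an independent error term, $y_t^{m} = y_t + e_t$, then the measured gain is $g^{m} = (y_2 + e_2) - (y_1 + e_1)$ and its variance is

$$\operatorname{Var}(g^{m}) = \operatorname{Var}(g) + \sigma_{e_1}^{2} + \sigma_{e_2}^{2},$$

so both errors inflate the dispersion of measured gains. The standard deviation of true gains, $\sqrt{\operatorname{Var}(g^{m}) - \sigma_{e_1}^{2} - \sigma_{e_2}^{2}}$, is therefore necessarily smaller than the measured one, and the same estimated coefficient translates into a larger effect size when divided by it.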