# «Draft, April 21, 2008 Teacher Effects: What Do We Know? Helen F. Ladd Edgar Thompson Professor of Public Policy Studies and professor of economics Duke ...»

Models of this form (but with additional explanatory variables as discussed below) are typically referred to as value-added models and are commonly used in the literature to estimate β, namely the effect of current teachers on current achievement. Their popularity comes largely from their simplicity and intuitive appeal. Logically, it makes sense that one would want to control statistically for the achievement, or knowledge, that the student brings to the classroom at the beginning of the year when estimating the effect of her current teacher. In addition, the value-added model is flexible in that it does not impose a specific assumption about the rate at which knowledge persists over time; instead it allows that rate to be estimated. Nonetheless, the model is valid only if the underlying assumptions about the constancy of effects are valid. Further, such models raise statistical concerns because the lagged achievement term on the right-hand side of the equation would, in the presence of serial correlation, be correlated with the error term.

Gains model

This last statistical problem can be avoided by assuming there is no decay in knowledge, so that the persistence parameter, α, equals 1, and moving the lagged achievement term to the left-hand side of the equation. This procedure generates the gains model:

$$A_{it} - A_{i,t-1} = \beta T_{it} + \varepsilon_{it} \tag{5}$$

In this case, the parameter, β, refers to the effect of teacher quality on the gains in achievement.

If the assumptions underlying the initial value-added model are correct, however, and the decay rate is not zero, the gains model is misspecified. The reason is that the term $(\alpha - 1)A_{i,t-1}$ is now missing from the right-hand side of the equation. Because decay implies α < 1, the coefficient on the omitted term is negative; to the extent that prior-year achievement is positively correlated with teacher effects, the estimated teacher effects would therefore be biased downward. Thus, within the framework of education as a cumulative process, the shift to the gains model solves one statistical problem but introduces a new one.
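The direction of this bias follows from the standard omitted-variable formula. The sketch below assumes, beyond what is stated above, that the error term is uncorrelated with $T_{it}$ and that β is estimated by ordinary least squares with a single teacher-quality regressor; $\hat{\beta}_{\text{gains}}$ is notation introduced here for the gains-model estimate.

$$
\begin{aligned}
A_{it} - A_{i,t-1} &= \beta T_{it} + (\alpha - 1)A_{i,t-1} + \varepsilon_{it} \\
\operatorname{plim}\hat{\beta}_{\text{gains}} &= \beta + (\alpha - 1)\,\frac{\operatorname{Cov}(T_{it}, A_{i,t-1})}{\operatorname{Var}(T_{it})}
\end{aligned}
$$

With α < 1 and a positive covariance, the second term is negative. For instance, with α = 0.7 and a covariance-to-variance ratio of 0.4 (illustrative numbers only), the gains-model estimate would converge to β − 0.12.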

Full value added (or gains) model with student fixed effects

In fact, most researchers estimate a richer form of the simple model in equation 4, one that includes time-varying student characteristics, classroom or school characteristics, and student fixed effects. This full model can be estimated only with longitudinal data on individual students and multiple cohorts of students. If data are available for only a single cohort of students, no classroom characteristics such as class size or the composition of the students can be included in the equation because teachers and their classrooms are indistinguishable.

The full model is:

$$A_{it} = \alpha A_{i,t-1} + \beta T_{it} + \gamma X_{it} + \delta C_{it} + \theta_i + \eta_{it} \tag{6}$$

where $A_{it}$, $A_{i,t-1}$ and $T_{it}$ are as defined above, $X_{it}$ are time-varying student variables,³ $C_{it}$ are classroom and school characteristics in year $t$, $\theta_i$ are student fixed effects, and $\eta_{it}$ is an error term.

For this model to be consistent with the cumulative model of the education process, the same assumptions that were needed to derive the simple value-added model in equation 4 are needed. In particular, each of the variables must exert a constant linear effect on student achievement in each year, and their effects on student achievement must all decay at the same rate, $1-\alpha$.

The student fixed effects are a crucial part of this enriched model. They control for the time-invariant characteristics of students, both those that are measurable and those that are not, and under certain assumptions address the fundamental problem highlighted above, namely that teachers are not randomly assigned to students. The inclusion of student fixed effects means that the teacher effects are derived from the within-student variation in student achievement. The key assumption needed for student fixed effects to address fully the concern about nonrandom sorting is that students are assigned to teachers based on their permanent or average characteristics rather than on any time-varying unmeasurable characteristics. Most value-added studies of teacher effects either implicitly or explicitly make this assumption. I return below to Jesse Rothstein’s recent test of the validity of this assumption.

In the context of these models, the teacher variables are typically entered as 0-1 indicator variables, either for each teacher or for each teacher by year. Thus teacher effects are estimated by the method of teacher fixed effects (in contrast to the method of random effects), an approach that seems reasonable given the goal of determining the effectiveness of a specific group of actual teachers.

³ Often included as time-varying student variables are indicators for whether a student has changed schools, either independently of other students or as part of a move with others from one level of schooling to another.

Two issues arise in the estimation and interpretation of such models. One is the technical challenge of using a program such as Stata to generate teacher effects in a model that also includes student fixed effects. Though Stata can easily handle one set of fixed effects through the process of demeaning (e.g., by subtracting the mean value for each student from all the variables in the model), it cannot use that procedure simultaneously for a second set of fixed effects. A natural solution to that technical problem is to create a new set of indicator variables that combine the student and teacher indicator variables into a single set of student-teacher indicator variables. Though that process works well for some purposes, it has the disadvantage of making it difficult to capture the individual teacher effects. New programs are becoming available to address this technical problem (Cornelißen, 2006). An alternative solution to this technical problem is to replace the student fixed effects with a vector of student characteristics. That approach, however, misses all the unmeasurable characteristics of students that could well be correlated with teacher quality.
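To make the demeaning step concrete, the following is a minimal Python sketch rather than a description of how any statistical package works. It assumes a stripped-down version of equation 6 with only student and teacher effects (no lagged achievement, no covariates, no error term). It absorbs the student fixed effects by demeaning and then enters the teacher indicators explicitly, which is practical only when the number of teachers is small. The data, the function names `solve` and `teacher_effects_within`, and the choice of teacher "A" as the reference category are all hypothetical.

```python
from collections import defaultdict


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def teacher_effects_within(rows, reference):
    """Teacher effects relative to `reference`, using within-student variation only."""
    teachers = sorted({r["teacher"] for r in rows} - {reference})
    by_student = defaultdict(list)
    for r in rows:
        by_student[r["student"]].append(r)

    xs, ys = [], []
    for obs in by_student.values():
        n = len(obs)
        mean_score = sum(o["score"] for o in obs) / n
        mean_dummy = [sum(o["teacher"] == t for o in obs) / n for t in teachers]
        for o in obs:
            # Subtract the student's own mean from the outcome and each teacher indicator.
            ys.append(o["score"] - mean_score)
            xs.append([(o["teacher"] == t) - mean_dummy[j] for j, t in enumerate(teachers)])

    k = len(teachers)
    # Ordinary least squares on the demeaned data via the normal equations.
    xtx = [[sum(x[a] * x[b] for x in xs) for b in range(k)] for a in range(k)]
    xty = [sum(x[a] * y for x, y in zip(xs, ys)) for a in range(k)]
    return dict(zip(teachers, solve(xtx, xty)))


# Hypothetical, noise-free data: score = student effect + teacher effect.
student_effect = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
teacher_effect = {"A": 0.0, "B": 2.0, "C": 5.0}
assignments = [(1, "A"), (1, "B"), (2, "A"), (2, "C"),
               (3, "B"), (3, "C"), (4, "B"), (4, "A")]
rows = [
    {"student": s, "teacher": t, "score": student_effect[s] + teacher_effect[t]}
    for s, t in assignments
]

effects = teacher_effects_within(rows, reference="A")
print(effects)  # effects of teachers B and C relative to teacher A
```

Because the student effects cancel when each student's mean is subtracted, the large differences in student levels in this example do not contaminate the teacher comparisons. Teacher effects are identified only because students are observed with more than one teacher.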

The second issue relates to measurement error. The coefficients of the teacher indicator variables are estimated with different degrees of precision. Had they been estimated by random effects rather than by fixed effects, estimates for individual teachers would have been shrunken toward the mean, with the amount of shrinkage greater for the teacher effects that are estimated with less precision. Letting $\beta_t^*$ represent the predicted teacher effect for teacher $t$ that emerges from a fixed effects specification, $\beta_t$ the true value and $\varepsilon$ a random error, we can express the predicted teacher effect as a function of the true effect plus an error term as follows:

$$\beta_t^* = \beta_t + \varepsilon \tag{7}$$

One might then calculate an adjusted teacher effect for any given teacher as a weighted average of the estimated teacher effect for that teacher and the mean teacher effect for the sample as a whole:

$$\lambda \beta_t^* + (1-\lambda)\,\overline{\beta^*} \tag{8}$$

where $\overline{\beta^*}$ is the sample mean of the estimated teacher effects and $\lambda = \operatorname{Var}(\beta_t) / (\operatorname{Var}(\beta_t) + \operatorname{Var}(\varepsilon))$.

Thus, the larger is the random error of the estimate, the smaller is λ and the greater the weight placed on the mean teacher effect. Though such an adjustment is conceptually straightforward, it can be difficult to implement in practice because it requires the standard errors for each of the estimated teacher effects, which can be difficult to estimate (Lockwood, McCaffrey and Sass, 2008). One implication of this shrinkage procedure is that teachers who teach small numbers of students are unlikely to be identified as either particularly effective or particularly ineffective teachers. Although the outcome on the low side may be appropriate since it would protect decent teachers with small classes from being unjustly sanctioned, the shrinkage procedure could also keep some very effective teachers from being recognized.
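The adjustment in equation 8 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation from any of the cited studies: the function name `shrink_teacher_effects` is hypothetical, the numbers are made up, and the variance of the true effects is simply passed in as `true_variance` rather than estimated, which sidesteps the practical difficulty noted above.

```python
def shrink_teacher_effects(estimates, error_variances, true_variance):
    """Apply equation 8 to each teacher: lambda * estimate + (1 - lambda) * sample mean."""
    sample_mean = sum(estimates) / len(estimates)
    adjusted = []
    for estimate, error_variance in zip(estimates, error_variances):
        # lambda = Var(true effect) / (Var(true effect) + Var(error)), as in the text.
        lam = true_variance / (true_variance + error_variance)
        adjusted.append(lam * estimate + (1 - lam) * sample_mean)
    return adjusted


# Hypothetical values: the first two teachers have estimates of equal size,
# but the second is estimated far less precisely (for example, fewer students).
estimates = [0.30, -0.30, 0.00]
error_variances = [0.01, 0.09, 0.03]
adjusted = shrink_teacher_effects(estimates, error_variances, true_variance=0.03)
print(adjusted)
```

In this example the precisely estimated teacher keeps three quarters of the distance from the mean, while the imprecisely estimated teacher keeps only one quarter, which is why teachers with few students rarely end up in either tail after shrinkage.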

Additional considerations

Though much more could be said about this standard value-added (or gains) model, I add here only two additional considerations. The first refers to the role of parents. As pointed out by Todd and Wolpin (2003), compensating behavior by parents could potentially mute the estimated differences in teacher effectiveness. That outcome would occur if parents spend more productive time working on school work with their children when their children have ineffective teachers than when they have effective teachers.

The second consideration is whether to include school fixed effects in the model. Often they are not included, particularly if student fixed effects are in the model, as in equation 6. In the absence of student fixed effects, the addition of school fixed effects can help mitigate the problem caused by the non-random assignment of teachers to students. Their inclusion in the model means that teacher effects are identified solely by differences in teacher quality within schools. As a result, the estimates of teacher effects are not contaminated by the fact that the more effective teachers are more likely to end up in schools with the more able and more motivated students. Including school rather than student fixed effects, however, does not account for the possibility that the more able students within a school may be assigned to the higher quality teachers.⁴ At the same time, their inclusion means that a teacher’s effectiveness is measured relative to other teachers in the school rather than to a broader set of teachers. As shown in Table 1 above, the overall estimated variation in teacher effectiveness will be smaller when school fixed effects are included than when they are excluded.

How stable are teacher effects?

In most cases one would expect that a teacher who is very effective (or ineffective) in one year would be similarly effective (or ineffective) in the following year. Hence, one way to evaluate the validity of the teacher effects that emerge from value-added models is to examine their stability from one year to the next. The more unstable they are, the less useful they are likely to be for making high-stakes decisions about teachers.

Only a few studies have explored the stability of teacher effects (Ballou, 2005; Aaronson et al., 2007; Koedel and Betts, 2007). Such studies find that teacher effects are quite unstable. For example, consider the findings of Koedel and Betts (2007) for teachers in San Diego. After ranking the teachers by their estimated fixed effects for two years in a row, the researchers find that among those who are ranked in the lowest quintile in the first year, only 30 percent stay in that quintile in the next year and another 31 percent move up to one of the top two quintiles. A similar pattern emerges at the top of the distribution. While 35 percent of teachers who are initially ranked in the top quintile remain in that quintile in the second year, 30 percent of them fall to the first or second quintile (cited in Lockwood, McCaffrey and Sass, 2008, p. 3).
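The quintile comparison described above can be sketched as follows. This is a minimal Python illustration, not the procedure used in any of the cited studies: the teacher identifiers and effect values are invented, the function names are hypothetical, and the rank-based rule `i * 5 // n` is one assumed way of forming quintiles that ignores ties.

```python
def quintiles(effects):
    """Map each teacher to a quintile, 0 (lowest) to 4 (highest), by rank of estimated effect."""
    ranked = sorted(effects, key=effects.get)
    n = len(ranked)
    return {teacher: i * 5 // n for i, teacher in enumerate(ranked)}


def transition_shares(year1, year2, start):
    """Among teachers in quintile `start` in year 1, the share in each quintile in year 2."""
    q1, q2 = quintiles(year1), quintiles(year2)
    group = [t for t in q1 if q1[t] == start and t in q2]
    shares = [0.0] * 5
    for teacher in group:
        shares[q2[teacher]] += 1 / len(group)
    return shares


# Hypothetical estimated effects for ten teachers in two consecutive years.
year1 = {"t0": -0.9, "t1": -0.7, "t2": -0.5, "t3": -0.3, "t4": -0.1,
         "t5": 0.1, "t6": 0.3, "t7": 0.5, "t8": 0.7, "t9": 0.9}
year2 = {"t0": -0.8, "t1": 0.4, "t2": -0.6, "t3": -0.2, "t4": 0.0,
         "t5": -0.4, "t6": 0.2, "t7": 0.6, "t8": 0.8, "t9": -0.1}

bottom = transition_shares(year1, year2, start=0)
top = transition_shares(year1, year2, start=4)
print(bottom)  # year-2 distribution of teachers who started in the lowest quintile
print(top)     # year-2 distribution of teachers who started in the highest quintile
```

The first element of `bottom` and the last element of `top` correspond to the persistence rates quoted above, and the remaining elements show where the other teachers moved.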

The most complete study of the stability of teacher effects is by Lockwood, McCaffrey and Sass (2008). This study is based on middle school math teachers in six large Florida districts from 2000/01 to 2004/05. The authors focus on middle school teachers because the fact that they teach multiple sections of students means the teacher effects estimated for them are likely to be more stable than those for elementary school teachers, and on math teachers because teacher effects are generally larger for math than for reading. The authors start with a very simple gains model, one with student fixed effects and teacher-by-year fixed effects, and then examine how modifying the model changes the results. They estimate all models at the district level and do not include school fixed effects. Thus, the teacher effects are measured relative to the average of all teachers in the district in the relevant subject and grade range, not relative to the average teacher at a given school.

The findings are quite clear. The correlations of teacher fixed effects across adjacent years in each district are moderately low, typically in the range of 0.3 to 0.5, and do not change much as the model is modified with additional covariates or in other ways. In addition, like Koedel and Betts (2007), the authors find substantial movement of teachers from one part of the effectiveness distribution to another in successive years. For example, among teachers who were in the top quintile (not quartiles as in Koedel and Betts) in 2003/04, the percentages of teachers who remained in that quintile in the next year ranged from a high of 46 percent in Broward County to a low of 23 percent in Palm Beach (Lockwood, McCaffrey and Sass, 2008, Table 3).

⁴ At the elementary level, the nonrandom matching of students to teachers across schools appears to be a far larger problem than the nonrandom matching of students to teachers across classrooms within schools in North Carolina (Clotfelter, Ladd and Vigdor, 2006).

The authors tried to determine the causes of the instability by examining the effects of class size, whether or not the test scores are normalized, the extent to which teachers have some students in common, and the addition of covariates to the value-added model. With a few minor exceptions, the instability of the effectiveness rankings was not very sensitive to the various changes.⁵ The authors conclude that their findings suggest the need for caution in using value-added estimates of individual teacher productivity for high-stakes personnel decisions.

The Rothstein challenge

Another challenge to the validity of the value-added approach to estimating teacher effects appears in a recent paper by Jesse Rothstein (2007). As emphasized above, one of the advantages of longitudinal data sets for estimating teacher effects is that they permit the researcher to use student fixed effects to control for the time-invariant student-level characteristics, both measured and unmeasured, that may be correlated with the teacher measures. The inclusion of fixed effects for students solves the problem of the non-random matching of students to teachers, however, only when such matching is based on the time-invariant characteristics of the students, such as their basic ability or motivation. Rothstein refers to such matching as “static tracking” and contrasts it to the “dynamic tracking” that occurs when school administrators sort students into classrooms and teachers in a non-random way that is based in part on the student’s current performance.

He correctly emphasizes the importance of testing the assumption of static tracking and does so by introducing a placebo. In particular, using data for one cohort of elementary school students in North Carolina, he estimates a value-added model that includes not only the student’s current teacher (e.g., her fourth grade teacher) but also the student’s subsequent teacher in the following grade (e.g., her fifth grade teacher). If the basic value-added model is correct, the fifth grade teacher should have no impact on the student’s fourth grade test scores (or more precisely, in the context of a model with student fixed effects, on the extent to which the student’s fourth grade test score deviates from her average test scores). In fact, however, he finds that the student’s fifth grade teacher has almost as big an impact on her fourth grade scores (in reading) as does her fourth grade teacher. He argues that this outcome occurs because the student’s fourth grade test score is used to determine her fifth grade teacher.
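The logic of the placebo can be illustrated with a small simulation. The Python sketch below is a stylized example and not Rothstein's specification: it has no student fixed effects, only two hypothetical fifth grade teachers labeled "H" and "L", standard normal scores, and an assumed assignment rule under dynamic tracking (students with above-average fourth grade scores go to teacher "H"). The function name `placebo_gap` and all parameter values are invented for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def placebo_gap(dynamic_tracking, n_students=10000, seed=0):
    """Difference in mean fourth grade gains between students of two fifth grade teachers."""
    rng = random.Random(seed)
    gains_by_teacher = {"H": [], "L": []}
    for _ in range(n_students):
        third_grade_score = rng.gauss(0, 1)
        # The fourth grade gain is generated before any fifth grade teacher is
        # assigned, so the fifth grade teacher cannot have caused it.
        fourth_grade_gain = rng.gauss(0, 1)
        fourth_grade_score = third_grade_score + fourth_grade_gain
        if dynamic_tracking:
            # Assignment depends on the student's current performance.
            fifth_grade_teacher = "H" if fourth_grade_score > 0 else "L"
        else:
            # Assignment is unrelated to achievement.
            fifth_grade_teacher = rng.choice("HL")
        gains_by_teacher[fifth_grade_teacher].append(fourth_grade_gain)
    return mean(gains_by_teacher["H"]) - mean(gains_by_teacher["L"])


gap_dynamic = placebo_gap(dynamic_tracking=True)
gap_random = placebo_gap(dynamic_tracking=False)
print(gap_dynamic, gap_random)
```

Under the assumed dynamic tracking rule, the future teacher appears to have a large "effect" on past gains even though none exists by construction, whereas under random assignment the gap is close to zero. A nonzero placebo effect is therefore evidence about how students are sorted, not about the future teacher.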