there is nothing particularly special about the years covered by our data panel. For example, some of the teachers in group (A) surely left the district in the year after our data panel ended or didn't teach in the year before it started, and some of the teachers in group (B) surely taught in three or more contiguous years outside of the data panel (for example, if a teacher taught in the year prior to the first year of our data panel, and then the first two years of our data panel but not the third, we would assign the teacher to group (B)).

The second explanation for the observed variance gap is that it occurs by chance. To evaluate this possibility, we use a bootstrap to derive empirically the distribution from which the variance-gap estimate would be drawn if the sample were split at random. We randomly assign the teachers from our sample into two groups that are equivalent in size to groups (A) and (B) above, and calculate the adjusted-variance gap between these randomly assigned groups. We repeat this procedure 500 times and use the 500 variance-gap estimates to define the variance-gap distribution based on randomly splitting the teacher sample. The variance gap is calculated as the adjusted variance of teacher effects in the smaller group minus the adjusted variance of teacher effects in the larger group, all divided by the adjusted variance of teacher effects in the larger group. In other words, we calculate:

[var(GroupA) - var(GroupB)] / var(GroupB).
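The random-split procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the adjusted variance is assumed here to be the raw variance of the estimated teacher effects minus the mean squared standard error, which may differ from the adjustment defined in the text. As in the description, teachers are split into two groups without replacement.

```python
import numpy as np


def adjusted_variance(effects, std_errors):
    """Raw variance of estimated effects minus the mean squared standard error.

    Assumption: this noise correction is a common stand-in; the paper's own
    adjustment is defined elsewhere in the text and may differ.
    """
    effects = np.asarray(effects, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    return np.var(effects, ddof=1) - np.mean(std_errors ** 2)


def variance_gap(eff_small, se_small, eff_large, se_large):
    """[var(smaller group) - var(larger group)] / var(larger group)."""
    v_small = adjusted_variance(eff_small, se_small)
    v_large = adjusted_variance(eff_large, se_large)
    return (v_small - v_large) / v_large


def random_split_gaps(effects, std_errors, n_small, n_reps=500, seed=0):
    """Variance gaps from n_reps random splits of the teacher sample.

    Each replication shuffles the teachers, assigns the first n_small to the
    smaller group and the rest to the larger group, and records the gap.
    """
    effects = np.asarray(effects, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    rng = np.random.default_rng(seed)
    gaps = np.empty(n_reps)
    for r in range(n_reps):
        order = rng.permutation(len(effects))
        small, large = order[:n_small], order[n_small:]
        gaps[r] = variance_gap(effects[small], std_errors[small],
                               effects[large], std_errors[large])
    return gaps
```

Calling `random_split_gaps(effects, std_errors, n_small)` with `n_small` set to the size of the smaller of groups (A) and (B) would return 500 gaps whose mean, standard deviation, and percentiles describe the distribution expected under a purely random split.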

The average variance gap generated by the bootstrap analysis is +1 percent. The standard deviation of this variance gap is large, however, at 24 percent. Thus, the variance gap estimated between the teachers in groups (A) and (B), -24 percent as shown in Table 7, is just over one standard deviation below the average of the empirical variance-gap distribution (at approximately the 13th percentile of the bootstrapped estimates). Although the empirical variance-gap distribution is wide (the 90-percent confidence interval ranges from -35 to +45 percent), which limits our ability to detect statistical significance even when the observed variance gap is large, the gap estimated between groups (A) and (B) is suggestive of a transitory-sorting-bias effect.
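To make the comparison concrete: with the reported mean of +1 percent and standard deviation of 24 percent, the observed gap of -24 percent lies (-0.24 - 0.01) / 0.24, or about -1.04, standard deviations from the mean. The sketch below shows one way the observed gap could be located within a set of randomly generated gaps. The function name and the returned keys are hypothetical, and the 5th and 95th percentiles are assumed to be how the 90-percent interval is formed.

```python
import numpy as np


def summarize_observed_gap(gaps, observed):
    """Locate an observed variance gap within a distribution of random-split gaps."""
    gaps = np.asarray(gaps, dtype=float)
    mean = gaps.mean()
    sd = gaps.std(ddof=1)
    # Assumption: the 90-percent interval is taken as the 5th-95th percentiles.
    lower, upper = np.percentile(gaps, [5, 95])
    return {
        "mean": mean,
        "sd": sd,
        "z": (observed - mean) / sd,
        # Share of random-split gaps at or below the observed gap, in percent.
        "percentile": 100.0 * np.mean(gaps <= observed),
        "interval_90": (lower, upper),
    }
```

An observed gap falling outside `interval_90` would be unlikely to arise from a random split alone; one inside it, as with the -24 percent gap here, cannot be distinguished from chance at that level.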

VI. Conclusion

On the one hand, our results corroborate Rothstein's key finding that value-added models of student achievement can produce biased estimates of teacher effects. In fact, we show that even detailed value-added models that estimate teacher effects across multiple cohorts of students can still produce biased estimates, as evidenced by the future-teacher “effects” documented in Tables 5 and 6.

On the other hand, our results are encouraging because they indicate that sorting bias in value-added estimation need not be as large as is implied by Rothstein's work. A key finding here is that using multiple years of classroom observations for teachers will reduce sorting bias in value-added estimates. This result raises concerns about using single-year measures of teacher value-added to evaluate teacher effectiveness. For example, one may not want to use the achievement gains of the students of first-year teachers to decide which novice teachers should be retained.

In our setting in San Diego, using a student-fixed-effects model and evaluating teachers who teach students in three consecutive years mitigates the contribution of sorting bias to the teacher-effect estimates. Although this result may not generalize universally, and depends on the degree of student-teacher sorting in our data, it suggests that under some circumstances value-added modeling can continue to be a powerful tool in the analysis of teacher effectiveness.

However, to the extent that our results corroborate Rothstein's findings, they highlight an important issue with incorporating value-added measures of teacher effectiveness into high-stakes teacher evaluations. Namely, value-added is manipulable by administrators who determine students' classroom assignments. Our entire analysis is based on a low-stakes measure of teacher effectiveness. If high stakes were assigned to value-added measures of teacher effectiveness, sufficient safeguards would need to be put in place to ensure that the system could not be gamed through purposeful sorting of students to teachers for the benefit of altering value-added measures of teacher effectiveness.


Table 1. Standard Deviations of Teacher Effects from a Model with Controls for Past, Current and Future Teachers.

Dependent Variable: Fourth-Grade Gain in Test Score

–  –  –

Grade 5 Teachers    610 (253)    0.01    0.30    0.15

Notes: The Wald statistics and p-values refer to tests that all teachers in the given grade have identical effects on student gains in grade 4. The standard deviations refer to the standard deviations of estimated teacher effects, both raw and adjusted as explained in the text.

–  –  –

Standard Deviations of Lagged Scores    0.81    0.90    0.32    0.99    0.01

Note: In the “Perfect Sorting” columns students are sorted by period (t-1) test-score levels in math. For the randomized assignments, students are assigned to teachers based on randomly generated numbers from a uniform distribution. The random assignments are repeated 25 times and estimates are averaged across all random assignments and all teachers. The estimates from the simulated random assignments are very stable across simulations.

Table 3. Controls from Value-Added Models

–  –  –

*The share of days missed by students is sometimes considered endogenous. Fourth-grade students, however, are not likely to have much influence over their attendance decisions.

Table 4. Extension of Rothstein's Analysis Using the Value-Added Models from Section IV

–  –  –

* Note that these estimates are for the composite effects documented in equation (7).

Table 5. Extension of Rothstein's Analysis Using the Value-Added Models from Section IV and Only Modeling Future Teachers Who Taught Students in Each Year of the Data Panel

–  –  –

Notes: For the basic and within-schools models this analysis includes fifth-grade teachers who teach in all four years of our data panel. In the within-students model we evaluate just three year-cohorts of students and therefore we include fifth-grade teachers who teach in three consecutive years.

* Note that these estimates are for the composite effects documented in equation (7).

** Adjusted-variance estimate was marginally negative.

Table 6. Within-Students Model Using Only Future Teachers Who Taught Students in Each Year of the Data Panel, With Each of the Three Year Cohorts Individually Omitted from the Dataset.

–  –  –

* The number of grade-4 teachers included in the model changes across rows because some of the grade-4 teachers only taught in a single year. Note that the 2001-2002 cohort of students was somewhat larger than the other two cohorts, which explains why there are fewer grade-4 teachers in the model when this cohort is dropped.

** These estimates are for the composite effects documented in equation (7).

