«Does Student Sorting Invalidate Value-Added Models of Teacher Effectiveness? An Extended Analysis of the Rothstein Critique Cory Koedel University of ...»
there is nothing particularly special about the years covered by our data panel. For example, some of the teachers in group (A) surely left the district in the year after our data panel ended or didn't teach in the year before it started, and some of the teachers in group (B) surely taught in three or more contiguous years outside of the data panel (for example, if a teacher taught in the year prior to the first year of our data panel, and then the first two years of our data panel but not the third, we would assign the teacher to group (B)).
The second explanation for the observed variance gap is that it occurs by chance. To evaluate this possibility, we use a bootstrap to derive empirically the distribution from which the variance-gap estimate would be drawn if the sample were split at random. We randomly assign the teachers from our sample into two groups that are equivalent in size to groups (A) and (B) above, and calculate the adjusted-variance gap between these randomly assigned groups. We repeat this procedure 500 times and use the 500 variance-gap estimates to define the variance-gap distribution based on randomly splitting the teacher sample. The variance gap is calculated as the adjusted variance of teacher effects in the smaller group minus the adjusted variance of teacher effects in the larger group, all divided by the adjusted variance of teacher effects in the larger group. In other words, we calculate:
$[\mathrm{var}(\text{Group A}) - \mathrm{var}(\text{Group B})] \,/\, \mathrm{var}(\text{Group B})$.
The average variance gap generated by the bootstrap analysis is +1 percent. The standard deviation of this variance gap is quite large, however, at 24 percent. Thus, the variance gap estimated between the teachers in groups (A) and (B), -24 percent as shown in Table 7, is just over one standard deviation below the average of the empirical variance-gap distribution (at approximately the 13th percentile of the bootstrapped estimates). Although the empirical variance-gap distribution is wide (the 90-percent confidence interval ranges from -35 to +45 percent), which limits our ability to detect statistical significance even when the observed variance gap is large, the gap estimated between groups (A) and (B) is suggestive of a transitory-sorting-bias effect.
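The random-split procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the analysis: the function names (`variance_gap`, `random_split_gaps`, `percentile_rank`), the group sizes, and the synthetic teacher effects are all hypothetical, and the plain sample variance (`np.var`) stands in for the adjusted variance defined elsewhere in the paper, which can be supplied through the `variance_fn` argument.

```python
import numpy as np


def variance_gap(smaller, larger, variance_fn=np.var):
    """Return [var(smaller) - var(larger)] / var(larger).

    `variance_fn` is a placeholder for the paper's adjusted variance;
    the default (plain sample variance) is an assumption of this sketch.
    """
    v_small = variance_fn(smaller)
    v_large = variance_fn(larger)
    return (v_small - v_large) / v_large


def random_split_gaps(effects, n_smaller, n_reps=500, variance_fn=np.var, seed=0):
    """Variance gaps from repeatedly splitting `effects` at random.

    Each repetition shuffles the teacher effects, places the first
    `n_smaller` in the smaller group and the rest in the larger group,
    and records the resulting variance gap.
    """
    effects = np.asarray(effects, dtype=float)
    rng = np.random.default_rng(seed)
    gaps = np.empty(n_reps)
    for r in range(n_reps):
        shuffled = rng.permutation(effects)
        gaps[r] = variance_gap(shuffled[:n_smaller], shuffled[n_smaller:], variance_fn)
    return gaps


def percentile_rank(gaps, observed_gap):
    """Share of simulated gaps at or below the observed gap, in percent."""
    return 100.0 * np.mean(np.asarray(gaps) <= observed_gap)


# Illustration on synthetic data only (not the San Diego sample).
rng = np.random.default_rng(42)
fake_effects = rng.normal(loc=0.0, scale=0.2, size=300)  # hypothetical teacher effects
gaps = random_split_gaps(fake_effects, n_smaller=100)     # hypothetical group size

print("mean gap:", gaps.mean())
print("sd of gap:", gaps.std(ddof=1))
print("5th and 95th percentiles:", np.percentile(gaps, [5, 95]))
print("percentile rank of a -0.24 gap:", percentile_rank(gaps, -0.24))
```

With real estimates, the observed gap between groups (A) and (B) would be passed to `percentile_rank` to locate it within the simulated distribution, which is the comparison the text reports as roughly the 13th percentile.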
VI. Conclusion

On the one hand, our results corroborate Rothstein's key finding that value-added models of student achievement can produce biased estimates of teacher effects. In fact, we show that even detailed value-added models that estimate teacher effects across multiple cohorts of students can still produce biased estimates, as evidenced by the future-teacher “effects” documented in Tables 5 and 6.
On the other hand, our results are encouraging because they indicate that sorting bias in value-added estimation need not be as large as is implied by Rothstein's work. A key finding here is that using multiple years of classroom observations for teachers will reduce sorting bias in value-added estimates. This result raises concerns about using single-year measures of teacher value-added to evaluate teacher effectiveness. For example, one may not want to use the achievement gains of students taught by first-year teachers to decide which novice teachers should be retained.
In our setting in San Diego, using a student-fixed-effects model and evaluating teachers who teach students in three consecutive years mitigates the contribution of sorting bias to the teacher-effect estimates. Although this result may not generalize universally, and depends on the degree of student-teacher sorting in our data, it suggests that under some circumstances value-added modeling can continue to be a powerful tool in the analysis of teacher effectiveness.
However, to the extent that our results corroborate Rothstein's findings, they highlight an important issue with incorporating value-added measures of teacher effectiveness into high-stakes teacher evaluations. Namely, value-added is manipulable by administrators who determine students' classroom assignments. Our entire analysis is based on a low-stakes measure of teacher effectiveness. If high stakes were attached to value-added measures of teacher effectiveness, sufficient safeguards would need to be put in place to ensure that the system could not be gamed by purposefully sorting students to teachers in order to alter those measures.
References

Aaronson, Daniel, Lisa Barrow and William Sander. 2007. Teachers and Student Achievement in the Chicago Public High Schools. Journal of Labor Economics 25:95-135.
Anderson, T.W. and Cheng Hsiao. 1981. Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association 76:598-609.
Betts, Julian R., Andrew Zau and Lorien Rice. 2003. Determinants of Student Achievement: New Evidence from San Diego. San Francisco: Public Policy Institute of California. Downloadable from www.ppic.org.
Clotfelter, Charles T., Helen F. Ladd and Jacob L. Vigdor. 2007. Teacher-Student Matching and the Assessment of Teacher Effectiveness. Journal of Human Resources 41:778-820.
Goldhaber, Dan and Michael Hansen. 2008. Is It Just a Bad Class? Assessing the Stability of Measured Teacher Performance. CRPE Working Paper #2008-5.
Hanushek, Eric, John Kain, Daniel O'Brien, and Steven Rivkin. 2005. The Market for Teacher Quality. Working Paper no. 11154, National Bureau of Economic Research, Cambridge, MA.
Hanushek, Eric. 1996. Measuring Investment in Education. The Journal of Economic Perspectives 10:9-30.
Harris, Douglas and Tim R. Sass. 2006. Value-Added Models and the Measurement of Teacher Quality. Unpublished manuscript, Department of Economics, Florida State University, Tallahassee.
-- 2007. What Makes for a Good Teacher and Who Can Tell? Unpublished manuscript, Department of Economics, Florida State University, Tallahassee.
Jacob, Brian and Lars Lefgren. 2007. Principals as Agents: Subjective Performance Assessment in Education. Working Paper no. 11463, National Bureau of Economic Research, Cambridge, MA.
Kane, Thomas and Douglas Staiger. 2002. The Promise and Pitfalls of Using Imprecise School Accountability Measures. Journal of Economic Perspectives 16:91-114.
-- 2008. Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation. Working Paper no. 14607, National Bureau of Economic Research, Cambridge, MA.
Koedel, Cory (forthcoming). An Empirical Analysis of Teacher Spillover Effects in Secondary School. Economics of Education Review.
Koedel, Cory and Julian R. Betts. 2007. Re-Examining the Role of Teacher Quality in the Educational Production Function. Working Paper 07-08, University of Missouri, Columbia.
-- (forthcoming). Value-Added to What? How a Ceiling in the Testing Instrument Influences Value-Added Estimation. Education Finance and Policy.
McCaffrey, Daniel F., Tim R. Sass, J.R. Lockwood and Kata Mihaly. 2009. The Inter-Temporal Variability of Teacher Effect Estimates. Unpublished manuscript, Department of Economics, Florida State University, Tallahassee.
Murnane, Richard J., Judith D. Singer, John B. Willett, James J. Kemple and Randall J. Olson. 1991. Who Will Teach? Policies That Matter. Cambridge, MA: Harvard University Press.
Nye, Barbara, Spyros Konstantopoulos and Larry V. Hedges. 2004. How Large are Teacher Effects? Educational Evaluation and Policy Analysis 26:237-257.
Podgursky, Michael J. and Mathew G. Springer. 2007. Teacher Performance Pay: A Survey. Journal of Policy Analysis and Management 26:909-950.
Rockoff, Jonah. 2004. The Impact of Individual Teachers on Student Achievement: Evidence from Panel Data. American Economic Review, Papers and Proceedings.
Rothstein, Jesse. 2009. Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement. Unpublished Manuscript, Princeton University.
Rothstein, Jesse (forthcoming). Student Sorting and Bias in Value-Added Estimation: Selection on Observables and Unobservables. Education Finance and Policy.
Table 1. Standard Deviations of Teacher Effects from a Model with Controls for Past, Current and Future Teachers.
Dependent Variable: Fourth-Grade Gain in Test Score
Grade 5 Teachers    610 (253)    0.01    0.30    0.15
Notes: The Wald statistics and p-values refer to tests that all teachers in the given grade have identical effects on student gains in grade 4. The standard deviations refer to the standard deviations of estimated teacher effects, both raw and adjusted as explained in the text.
Standard Deviations of Lagged Scores    0.81    0.90    0.32    0.99    0.01
Note: In the “Perfect Sorting” columns students are sorted by period (t-1) test-score levels in math. For the randomized assignments, students are assigned to teachers based on randomly generated numbers from a uniform distribution. The random assignments are repeated 25 times and estimates are averaged across all random assignments and all teachers. The estimates from the simulated random assignments are very stable across simulations.
Table 3. Controls from Value-Added Models
*The share of days missed by students is sometimes considered endogenous. Fourth-grade students, however, are not likely to have much influence over their attendance decisions.
Table 4. Extension of Rothstein's Analysis Using the Value-Added Models from Section IV
* Note that these estimates are for the composite effects documented in equation (7).
Table 5. Extension of Rothstein's Analysis Using the Value-Added Models from Section IV and Only Modeling Future Teachers Who Taught Students in Each Year of the Data Panel
Notes: For the basic and within-schools models this analysis includes fifth-grade teachers who teach in all four years of our data panel. In the within-students model we evaluate just three year-cohorts of students and therefore we include fifth-grade teachers who teach in three consecutive years.
* Note that these estimates are for the composite effects documented in equation (7).
** Adjusted-variance estimate was marginally negative.
Table 6. Within-Students Model Using Only Future Teachers Who Taught Students in Each Year of the Data Panel, With Each of the Three Year Cohorts Individually Omitted from the Dataset.
* The number of grade-4 teachers included in the model changes across rows because some of the grade-4 teachers only taught in a single year. Note that the 2001-2002 cohort of students was somewhat larger than the other two cohorts, which explains why there are fewer grade-4 teachers in the model when this cohort is dropped.
** These estimates are for the composite effects documented in equation (7).