Douglas Harris, Dept. of Educational Leadership & Policy Studies, Florida State University
Tim R. Sass, Dept. of Economics, Florida State University
B. Measurable Student Inputs

Many studies in the literature on teacher quality estimate achievement models using observed time-invariant student and family characteristics, rather than student-specific effects, to control for student ability and family inputs. Examples include Aaronson, Barrow and Sander (2003), Clotfelter, Ladd and Vigdor (2005) and Goldhaber and Brewer (1997). As with teacher covariates, the use of time-invariant student characteristics like free-lunch eligibility, race/ethnicity and disability status is potentially problematic. Any time-invariant student/family heterogeneity that is not captured by observed student characteristics becomes part of the error term. If the remaining unobserved student heterogeneity is correlated with the observed time-varying student and school-based independent variables in the model (i.e., X_it, P_-ijmt, T_kt), estimates of the model parameters will be biased. To minimize this problem, Clotfelter, Ladd and Vigdor (2005) analyze North Carolina classrooms with “apparent” random assignment of students (based on observed characteristics) to study the impact of teachers on student performance.
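The bias mechanism can be illustrated with a minimal simulation (invented numbers, not the paper's data): an unobserved student input mu_i is correlated with an observed time-varying input x_it, so pooled OLS that omits mu_i is biased, while the within (student fixed effects) transformation removes the bias.

```python
import numpy as np

# Simulated panel: unobserved student heterogeneity mu_i is correlated
# with the observed time-varying input x_it (all parameters invented).
rng = np.random.default_rng(0)
n_students, n_years = 2000, 3
mu = rng.normal(size=n_students)                       # unobserved student/family input
x = 0.7 * mu[:, None] + rng.normal(size=(n_students, n_years))
y = 1.0 * x + mu[:, None] + rng.normal(scale=0.5, size=x.shape)  # true slope = 1

# Pooled OLS ignoring mu_i: the slope absorbs part of cov(x, mu) and is biased upward
xf, yf = x.ravel(), y.ravel()
b_ols = np.cov(xf, yf)[0, 1] / np.var(xf, ddof=1)

# Student fixed effects via demeaning within student: the bias disappears
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = (xd @ yd) / (xd @ xd)

print(f"pooled OLS: {b_ols:.2f}   student FE: {b_fe:.2f}")
```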
V. Data

In order to test alternative model specifications we utilize data from the Florida Department of Education's K-20 Education Data Warehouse (EDW), an integrated longitudinal database covering all Florida public school students and school employees from pre-school through college. The EDW currently contains data for the 1995/1996 through 2003/2004 school years. Unlike most state-level administrative databases, the EDW includes not only test scores and demographic and programmatic information for individual students, but information on student enrollment, attendance and disciplinary actions as well. In addition, Florida’s Education Data Warehouse incorporates employment records of all school personnel. Both the student and employee information can be linked to specific classrooms.
Although student records are available since the 1995/1996 school year, statewide standardized testing in consecutive grade levels did not begin in Florida until the 1999/2000 school year. The state currently administers two sets of reading and math tests to all third through tenth graders in Florida. The “Sunshine State Standards” Florida Comprehensive Achievement Test (FCAT-SSS) is a criterion-based exam designed to test for the skills that students are expected to master at each grade level. The second test is the FCAT Norm-Referenced Test (FCAT-NRT), a version of the Stanford-9 achievement test. The Stanford-9 is a vertical or developmental-scale exam: scores typically increase with grade level, and a one-point increase in the score at one place on the scale is equivalent to a one-point increase anywhere else on the scale. We use FCAT-NRT scale scores in all of the analysis. The use of the FCAT-NRT minimizes potential biases associated with "teaching to the test," since all school accountability standards, as well as promotion and graduation criteria in Florida, are based on the FCAT-SSS rather than the FCAT-NRT.
Although achievement test scores are available for both math and reading in grades 3-10, we limit our initial analysis to mathematics achievement in middle school, grades 6-8. We select middle-school mathematics classes for a number of reasons. First, it is easier to identify the relevant teacher and peer group for middle-school students than for elementary students. The overwhelming majority of middle-school students in Florida move between specific classrooms for each subject, whereas elementary-school students typically receive the majority of their core academic instruction in a “self-contained” classroom. Even among elementary-school students enrolled in self-contained classrooms, however, five percent are also enrolled in a separate math course and nearly 13 percent are enrolled in either special-education or gifted courses, making it difficult to attribute their instruction to a single teacher and classroom.
Second, parent “lobbying” and allocation of students to classrooms based on principals’ information about unmeasured student characteristics are more likely to lead to non-random classroom assignment in elementary school than in middle school. Since middle-school teachers often teach multiple sections of the same course, parents are likely to exert less pressure to have their child enrolled in a particular classroom (though they may still seek to have their child taught by a particular teacher). Also, because middle schools are larger, have more students per grade, and enroll students who generally attended a different school for the preceding elementary grades, middle-school principals are less likely to possess information on unmeasured characteristics that can be used to make classroom assignments.
Third, because middle-school teachers often teach multiple sections of a course during an academic year, it is easier to clearly identify the effects of individual teachers on student achievement. In elementary school, teachers typically stay with the same group of students all day long, and thus teacher effects can only be identified by observing multiple cohorts of students taught by a given teacher over time. In middle school, by contrast, both variation in class composition across sections at a point in time and variation across cohorts over time help to distinguish teacher effects from other classroom-level factors affecting student achievement.
We initially focus on math achievement rather than reading because it is easier to clearly identify the class and teacher most relevant to the material being tested. While some mathematics-related material might be presented in science courses, direct mathematics instruction almost always occurs in math classes. In contrast, middle school students in Florida may be simultaneously enrolled in “language arts” and reading courses, both of which may cover material relevant to reading achievement tests.
In addition to selecting middle-school math courses for analysis, we have limited our sample in other ways in an attempt to get the cleanest possible measures of classroom peers and teachers. First, we restrict our analysis of student achievement to students who are enrolled in only a single mathematics course (though all other students enrolled in the course are included in the measurement of peer-group characteristics). Second, to avoid atypical classroom settings and jointly taught classes we consider only courses in which 10-50 students are enrolled. Third, we eliminate any courses in which there is more than one “primary instructor” of record for the class. Finally, we eliminate charter schools from the analysis since they may have differing curricular emphases and student-peer and student-teacher interactions may differ in fundamental ways from traditional public schools.
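The course-level screens above amount to a simple filter. The sketch below illustrates them on hypothetical records; the field names are invented for illustration (the EDW extract is not public):

```python
# Hypothetical course records; field names are invented for illustration.
courses = [
    {"id": 1, "enrollment": 25, "primary_instructors": 1, "charter": False},
    {"id": 2, "enrollment": 8,  "primary_instructors": 1, "charter": False},  # fewer than 10 students
    {"id": 3, "enrollment": 30, "primary_instructors": 2, "charter": False},  # multiple primary instructors
    {"id": 4, "enrollment": 40, "primary_instructors": 1, "charter": True},   # charter school
]

# Keep only courses that satisfy all three screens described above.
valid = [c["id"] for c in courses
         if 10 <= c["enrollment"] <= 50
         and c["primary_instructors"] == 1
         and not c["charter"]]
print(valid)  # [1]
```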
Estimation of the achievement models with lagged test scores and individual fixed effects requires at least three consecutive years of student achievement data. Given that statewide testing began in 1999/2000, our analysis is limited to Florida traditional public school students in grades 6-8 over the years 1999/2000 through 2003/2004 who took the FCAT-NRT for at least three consecutive years. This includes four cohorts of students, with over 120,000 students in each cohort. Unfortunately, it is not computationally tractable for us to consistently estimate models with lagged dependent variables using the entire sample. We therefore randomly select 100 middle schools (from all those operating in the 2002/2003 school year) for analysis, which yields approximately a twelve percent sample of the relevant population. Because we track students across all schools attended in the state, the number of schools appearing in the sample exceeds 100.
VI. Results

A. The Value-Added Model and Persistence of Prior School Inputs

Recall from section II that lagged schooling inputs enter directly into the cumulative achievement function, but are captured by the lagged level of achievement in the value-added formulation. Further, the restricted form of the value-added specification, or “gain score” model, assumes that the persistence of lagged schooling inputs is one. We present tests of these two assumptions in Table 2. In the first column of Table 2 we present estimates of the unrestricted value-added model (equation (10)), obtained using the Arellano and Bond (1991) dynamic panel estimator. The coefficient on the lagged achievement score is not significantly different from zero, suggesting that past educational inputs do not affect current scores. However, this runs counter to previous work by Sass (2006), which estimates the coefficient on lagged achievement at 0.1 for mathematics and 0.2 for reading using data spanning grades 3-10.
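The logic of the Arellano-Bond approach can be sketched with the simpler Anderson-Hsiao variant: first-differencing removes the student effect, and the twice-lagged level instruments the (endogenous) lagged gain. The simulation below is a minimal sketch under invented parameters, with persistence set to 0.1 to match the Sass (2006) mathematics estimate; it is not the paper's estimator or data.

```python
import numpy as np

# Simulated achievement with a student effect mu_i and persistence lam = 0.1
rng = np.random.default_rng(1)
n, lam = 8000, 0.1
mu = rng.normal(size=n)
y = np.zeros((n, 5))
y[:, 0] = mu + rng.normal(size=n)
for t in range(1, 5):
    y[:, t] = lam * y[:, t - 1] + mu + rng.normal(size=n)

dy = y[:, 1:] - y[:, :-1]          # first differences remove mu_i
dep   = dy[:, 2:].ravel()          # dy_3, dy_4
endog = dy[:, 1:3].ravel()         # dy_2, dy_3 (correlated with the differenced error)
inst  = y[:, 1:3].ravel()          # y_1, y_2: valid instruments for the lagged difference
lam_iv = (inst @ dep) / (inst @ endog)   # just-identified IV estimate of lam
print(round(lam_iv, 2))
```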
We test the assumption that lagged achievement serves as a sufficient statistic for all past schooling inputs by adding twice-lagged measured inputs to the model (coefficients not shown in the table). If lagged achievement did not capture the effects of prior inputs, then past inputs should have statistically significant effects on achievement when added to the value-added model. Estimates of the unrestricted value-added model with twice-lagged schooling inputs are presented in the second column of Table 2. A Wald test fails to reject the null hypothesis that the twice-lagged inputs have no joint effect on achievement, suggesting that lagged achievement serves as a sufficient statistic for historical schooling inputs.
The twice-lagged inputs included in the model were student mobility (number of schools attended, “structural” move, “non-structural” move), peer composition (proportion female, proportion black), class size and teacher experience (0 years, 1 year, 2-4 years).
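The structure of such a Wald test can be sketched as follows on simulated data (classical OLS standard errors; all numbers invented): two "twice-lagged input" regressors that are, by construction, irrelevant are added to the regression, and the test statistic for their joint significance is compared with the chi-squared critical value.

```python
import numpy as np

# Simulated regression in which the twice-lagged inputs have no true effect
rng = np.random.default_rng(4)
n = 4000
x = rng.normal(size=n)            # current-period input
z = rng.normal(size=(n, 2))       # twice-lagged inputs (no effect in this simulation)
y = 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
V = sigma2 * np.linalg.inv(X.T @ X)              # classical OLS covariance matrix

R = np.zeros((2, 4)); R[0, 2] = R[1, 3] = 1.0    # select the two z coefficients
w = R @ beta
wald = w @ np.linalg.inv(R @ V @ R.T) @ w        # ~ chi-squared with 2 df under the null
print(f"Wald statistic: {wald:.2f} (5% critical value for 2 df is 5.99)")
```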
To check the sensitivity of measures of teacher quality to assumptions about the persistence of lagged schooling inputs, we parametrically vary the persistence parameter, λ, from 0 to 1 in discrete increments. A value of 1 corresponds to the gain-score model and a value of 0 to the contemporaneous model. Estimates are presented in Tables 3A and 3B. The results in Table 3A indicate that the estimated impacts of time-varying teacher characteristics are similar across value-added specifications, but are qualitatively different for the contemporaneous specification.
This is consistent with a recent study of charter schools by Sass (2006), which obtains qualitatively similar results for the restricted value-added (gain score) model and the unrestricted value-added model.
Table 3B presents correlations among the teacher effects estimated in the models displayed in Table 3A. Once again, the value-added models with different persistence levels all produce strikingly similar estimates. While the correlations decrease as the persistence assumptions diverge, for values of λ from 0.2 to 1.0 the estimated teacher fixed effects are correlated at 0.88 or higher. Only for the contemporaneous model are the teacher effects much different from those of the gain-score model; in that case the correlation of teacher effects is only 0.76. This is consistent with the results of McCaffrey et al. (2004), who utilize two small-scale datasets to compare models: a simulation of 200 students and a sample of 678 students from a single large suburban school district. They find a high correlation between estimated teacher effects from models that impose the restriction λ=1 and models that leave λ unrestricted.
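The exercise of varying λ can be sketched in a stylized simulation (invented scale and parameters, not the FCAT data): a teacher "effect" is recovered from the quasi-differenced score A_t − λA_{t−1} for several persistence values, and each effect vector is correlated with the gain-score (λ = 1) estimates. Because classes differ in incoming achievement here, the contemporaneous model (λ = 0) conflates teacher effects with student sorting, so its correlation with the gain-score effects is lowest.

```python
import numpy as np

# Simulated classes with sorting on prior achievement (all numbers invented)
rng = np.random.default_rng(3)
n_teachers, n_per_class = 100, 40
teacher_fx = rng.normal(scale=0.3, size=n_teachers)      # true teacher effects
class_mean = rng.normal(scale=0.3, size=n_teachers)      # class-level prior-score sorting
teacher = np.repeat(np.arange(n_teachers), n_per_class)
a_prev = class_mean[teacher] + rng.normal(size=teacher.size)
a_now = 0.9 * a_prev + teacher_fx[teacher] + rng.normal(scale=0.4, size=teacher.size)

def effects(lam):
    gain = a_now - lam * a_prev      # lam = 1: gain score; lam = 0: contemporaneous
    return np.array([gain[teacher == j].mean() for j in range(n_teachers)])

gain_score = effects(1.0)
corrs = {lam: float(np.corrcoef(effects(lam), gain_score)[0, 1])
         for lam in (0.0, 0.2, 0.6, 1.0)}
for lam, r in corrs.items():
    print(f"lam = {lam:.1f}: corr with gain-score effects = {r:.2f}")
```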
Taking the results from Tables 3A and 3B together, it appears that use of the gain-score model to estimate teacher quality, rather than the unrestricted value-added model, should produce similar estimates. In the following analyses we utilize the gain score model and investigate how alternative specifications of the gain-score model impact estimates of teacher quality.
B. Alternative Measures of Time-Invariant Teacher Characteristics

Table 4 presents estimates of the impact of time-varying teacher characteristics on student achievement using alternative methods of controlling for time-invariant teacher attributes. (Unlike the previous set of tables, there are no correlations of teacher effects to report, since only one of these models includes teacher fixed effects.) In the model presented in the first column of Table 4, teacher demographic characteristics (race, ethnicity, gender) are included as regressors, while in the second column these variables are replaced with a set of teacher fixed effects. Ideally, one would also want to include measures of pre-service ability and training, such as college entrance exam scores and college coursework, in the vector of time-invariant teacher characteristics; unfortunately, we currently possess this information for only a small fraction of Florida teachers. Interestingly, with only teacher demographic characteristics, both teacher experience and possession of an advanced degree are found to boost student achievement, while these effects are statistically insignificant when unobserved teacher heterogeneity is taken into account with fixed effects. (In contrast, Hanushek (1992) obtains similar results when comparing teacher covariates with teacher fixed effects.) These results suggest that the use of teacher fixed effects is important to adequately control for unmeasured aspects of teacher quality.

C. Differing Controls for Classroom and School Characteristics

Table 5A presents estimates of the impact of time-varying teacher characteristics on student achievement from models with differing controls for peer influences, class size and school characteristics. Inclusion/exclusion of variables to control for peer quality, class size and school quality has virtually no effect on the estimated impacts of teacher experience, professional development and advanced degrees. At first blush this might seem surprising. However, it is important to recognize that all of the estimated models include teacher fixed effects and thus control for unobserved teacher quality that might be correlated with student and school characteristics. If the models had excluded teacher fixed effects, we would expect significant variation in the estimated coefficients across the various models.
In Table 5B, the correlations in the estimated teacher effects from alternative models clearly indicate that inclusion/exclusion of school effects greatly impacts estimated teacher effects, while controls for class size and peer characteristics have only minor impacts. This is consistent with the notion that there is significant sorting of teacher quality across schools. If unobserved teacher quality is correlated with school quality, then removing school fixed effects would greatly alter the estimated teacher effects, which is what we observe. Put differently, if teachers are not randomly assigned across schools, then the performance of a teacher relative to her average colleague within a school (i.e., the teacher fixed effect when school effects are included) will differ from her performance relative to the average teacher in the school system (i.e., the teacher fixed effect when school effects are excluded).
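A toy illustration of this point (all numbers invented): when teacher quality is sorted across schools, ranking teachers against their own school's average (as with school fixed effects) and ranking them against the system-wide average produce noticeably different effect estimates.

```python
import numpy as np

# True teacher quality = school-level component (sorting) + within-school component
rng = np.random.default_rng(5)
n_schools, per_school = 50, 8
school_avg = rng.normal(scale=0.4, size=n_schools)             # between-school sorting
within = rng.normal(scale=0.2, size=(n_schools, per_school))   # within-school differences
quality = school_avg[:, None] + within

fx_system = quality - quality.mean()                           # vs. average teacher in system
fx_school = quality - quality.mean(axis=1, keepdims=True)      # vs. own-school colleagues

r = float(np.corrcoef(fx_system.ravel(), fx_school.ravel())[0, 1])
print(f"correlation of the two sets of teacher effects: {r:.2f}")
```

With strong between-school sorting relative to within-school variation, the correlation is well below one; without sorting it approaches one.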
Our findings regarding the importance of school effects are consistent with the results of Aaronson, Barrow and Sander (2003), who find that the exclusion of school fixed effects can be easily rejected by an F-test. Similarly, McCaffrey et al. (2004) find that when school effects are excluded, estimated teacher effects are negatively correlated with the proportion of students receiving free and reduced-price lunch; when school fixed effects are included, this correlation is essentially eliminated. Further, with school fixed effects the within-school variance in teacher effects is much smaller than the variance in a model without school-specific controls.