# «Consortium for Educational Research and Evaluation– North Carolina Comparing Value-Added Models for Estimating Individual Teacher Effects on a Statewide ...»

Consortium for Educational Research and Evaluation–North Carolina, Comparing Value-Added Models, August 2012

Because each VAM handles the specification of these covariates differently, the weaker form of the assumption will affect the comparability of the models with the true teacher effect. The strong form is violated under any association with measured or unmeasured covariates; the weak form is violated when omitted variables (unmeasured inputs to learning) are associated with both (1) each student’s potential outcomes and (2) each student’s assignment to teachers (Reardon & Raudenbush, 2009). This is a common problem because large-scale administrative data files do not contain measures of all of the inputs to learning (including such predispositions as motivation, for example, or the availability of reading material in the home) and because students are not usually randomly assigned to teachers. These problems manifest as associations between the average characteristics of students assigned to a teacher and the teacher effect. The ignorability assumption makes it explicit that unmeasured inputs to student learning are not confounders of the teacher effect unless these unmeasured inputs are also associated with treatment (student-teacher) assignment. While still a demanding assumption, this nevertheless substantially reduces the burden for satisfying causality.

**Violations of Assumptions**

Non-ignorable assignment of students to teachers is widely viewed as the key problem that VAMs must address. The most widely accepted form of ignorable assignment is randomization.

The quantitative evidence suggests that observed student and teacher characteristics are associated, and thus regardless of what the assignment process is (such as a tracking mechanism or a compensatory assignment policy), it may not be effective at equalizing over students’ propensity to perform on end-of-grade tests or teachers’ propensity to promote student achievement (Rothstein, 2010; Clotfelter, Ladd, & Vigdor, 2006). That is, it is not equivalent to a random process. An alternative to randomization is to adjust the models for the factors that are associated with both assignment and student outcomes, an approach that is dependent upon model specification. The information needed to sufficiently adjust the value-added models for these confounding effects in the presence of non-random assignment may not be known or available in the form of quantified measures. Longitudinal data files spanning multiple years on both the students and teachers, and linking students to teachers, are required in order to estimate these models. In many states these datasets are available, but certain information, like students’ home and family background characteristics, may not be measured, though proxies for these characteristics (e.g., subsidized lunch for family income) may be.
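The consequence of non-random assignment can be illustrated with a small simulation. The sketch below is not drawn from the report: it assumes two teachers with identical (zero) true effects and a single hypothetical unmeasured input, labeled `motivation`, that raises scores. The function name `mean_gap` and all parameter values are illustrative assumptions.

```python
import random
from statistics import mean

def mean_gap(n_students=2000, sorted_assignment=False, seed=0):
    """Difference in mean scores between two equally effective teachers."""
    rng = random.Random(seed)
    # Hypothetical unmeasured input (e.g., motivation) that raises scores.
    motivation = [rng.gauss(0, 1) for _ in range(n_students)]
    if sorted_assignment:
        # Non-random assignment: the higher-motivation half goes to teacher A.
        motivation.sort(reverse=True)
    else:
        # Random assignment: class membership is unrelated to motivation.
        rng.shuffle(motivation)
    half = n_students // 2
    # Both teachers have a true effect of zero: score = motivation + noise.
    scores = [m + rng.gauss(0, 1) for m in motivation]
    return mean(scores[:half]) - mean(scores[half:])

random_gap = mean_gap(sorted_assignment=False)
sorted_gap = mean_gap(sorted_assignment=True)
```

Under random assignment the gap between the two class means stays close to zero, while under sorted assignment a large gap appears even though neither teacher contributes anything. A model that cannot adjust for the omitted input would attribute that gap to the teachers.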

Interactions among peers in attenuating or magnifying a teacher’s effectiveness, a manifestation of the violation of SUTVA, may also confound estimation of a teacher’s true level of effectiveness. As a consequence of this violation, the risks are high that the estimate of the teacher effect is actually some combination of the teacher effect and these peer learning effects.

The problems manifesting under violations of this assumption should not be confused with those related to assignment. SUTVA violations occur after assignment and regardless of the satisfaction of the ignorability of assignment. Just as in the case of ignorable assignment, the data typically used to estimate teacher effects do not contain the information needed to directly measure these interactions. Instead, the data may contain only proxies for these interactions, such as the average prior performance of peers in a class. Proxies for manifestations of SUTVA violations, like average performance, may also be evidence of assignment non-randomness.

Under recent federal mandates, such as eligibility for Race to the Top funding, state education agencies may be required to use VAMs to rank and compare teachers across an entire state (Henry, Kershaw, Zulli, & Smith, 2012). In many cases, these agencies may also impose requirements on principals that they incorporate VAM estimates into their teacher evaluation criteria with high-stakes consequences for teachers. These consequences may include being placed under a performance review or being dismissed. Regardless of the particular model selected for generating value-added teacher estimates, therefore, the intended uses of these models imply that VAM estimates are causal effects of teachers on student achievement growth.

Data limitations and the failure of assumptions for causality to hold in practice raise important questions about teacher effect estimation and whether the estimates can in fact be interpreted as causal effects for which teachers should be held accountable. Most empirical analyses, including VAMs, are based on theoretical assumptions such as those of the potential outcome model that are rarely met in practice. The effects of these deviations from theoretical assumptions should be studied to determine how they affect conclusions about teacher performance. Further, with several models available from the literature on value-added modeling, comparisons between models on the differential effects of deviations should be undertaken. The present study, like several other studies that precede it, represents such an attempt. Before discussing the studies that this effort is intended to build upon, we formally present and describe seven typical value-added model specifications.

**Typical Value-Added Models**

In order to facilitate a discussion of the existing body of research comparing value-added models with each other or with some absolute standard, we present seven value-added models from a broad selection of statistical and econometric specifications that are most typically found in the literature. We provide the model specification, define the teacher effect in each model, and state the practical implications within each specification of satisfying the key assumptions from the potential outcome framework. For consistency, we use common notation for each model, despite variations in the source material. There are three large classes of VAMs being widely used: (1) nested random effects models, (2) econometric fixed effects models, and (3) hybrid fixed and random effects models, such as the educational value-added assessment system (EVAAS; Ballou, Sanders, & Wright, 2004). (It is important to bear in mind that “fixed effects” refers to using unit-specific dummies or demeaning to control for unmeasured confounders and not to the coefficients on student covariates.) We use this organizing framework in the sections that follow.

**Nested Random Effects Models**

Nested random effects models treat the multiple levels of data (student, classroom, teacher, and school) as hierarchically nested units, where each lower-level unit (e.g., a student) can be nested within only one higher-level unit. As such, these are generally single-year cross-sectional models, though the effects of previous years’ teachers on individual students’ current performance are typically accounted for by pretest covariates that are included in the model. This class includes hierarchical linear models and multilevel models (Raudenbush & Bryk, 2002). These models are based on the following general specification:

$$Y_{ijst} = X_i\beta + Z_s\gamma + \lambda_1 Y_{i,t-1} + \lambda_2 Y_{i,t-2} + \delta_s + \tau_j + \varepsilon_{ijst} \tag{1}$$

Subscripts indicate the student ($i$), teacher ($j$), school ($s$), and period ($t$). The variable $Y$ is a test score on a standardized end-of-grade exam in a selected subject area; $X_i$ is a vector of time-invariant student characteristics or predispositions to learning that are associated with the accumulation of knowledge, as well as a constant term; $Z_s$ is a vector of school characteristics; $t$ is the period for which the teacher effect is being estimated; $t-1$ is the prior period; and $t-2$ is two periods prior. Therefore, $Y$ appears both as the dependent variable in the current period and as a predictor of student achievement in the current period, as both a one-year and a two-year lag or pretest. The $\delta_s$, $\tau_j$, and $\varepsilon_{ijst}$ terms are errors for (in order) the school, the teacher, and the student.

The teacher effect is estimated from the empirical Bayes residual or empirical best linear unbiased predictor estimate of $\tau_j$ (the teacher random effect). Variations on this model exist, including those that ignore the nesting of students and teachers within schools or use fewer prior test scores. All of the models of this type use a random effects model to estimate the teacher effect and may use covariates at the student and school levels to control for correlates of nonrandom assignment. To satisfy the ignorability requirement of the potential outcomes model, the teacher random effect cannot be associated with excluded covariates that are also associated with the outcome. Because the school, teacher, and student terms are errors, all factors associated with both assignment and the outcome must be measured and included in the model. This is a strong and potentially impractical requirement.
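The empirical Bayes logic can be sketched in a few lines. The example below is a simplified illustration, not the report's estimation procedure: it assumes a one-way random effects setup in which the teacher-level and student-level variance components are already known (in practice they are estimated by the mixed model), and the function name, teacher labels, and numbers are hypothetical.

```python
from statistics import mean

def shrink_teacher_means(residuals_by_teacher, var_teacher, var_student):
    """Shrink each teacher's mean residual toward zero by its reliability.

    reliability_j = var_teacher / (var_teacher + var_student / n_j)
    """
    effects = {}
    for teacher, residuals in residuals_by_teacher.items():
        n = len(residuals)
        reliability = var_teacher / (var_teacher + var_student / n)
        effects[teacher] = reliability * mean(residuals)
    return effects

# Hypothetical covariate-adjusted residuals, grouped by teacher.
example = {
    "teacher_a": [0.4, 0.6],                # 2 students: heavy shrinkage
    "teacher_b": [0.4, 0.6, 0.5, 0.5] * 5,  # 20 students: lighter shrinkage
}
effects = shrink_teacher_means(example, var_teacher=0.04, var_student=0.64)
```

Both hypothetical teachers have the same raw mean residual (0.5), but the teacher with only two students is pulled much closer to zero because that sample average is less reliable.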

**Fixed Effects Models**

The second major type of VAM consists of a variety of fixed effects specifications. Fixed effects are within-group variance estimators frequently used in econometric models in which students or teachers (or both) are used to constrain variance in the context of panel data. The ignorability assumption is generally satisfied by way of indirect controls for confoundedness (or in econometric terms, endogeneity) rather than via specification of the confounding factors in the model. Student fixed effects models use students as their own control and aggregate within-student variation only. Teacher fixed effects models are similar, but rather than use teachers as their own controls, each teacher effect is explicitly estimated. In both cases, the models are assumed to be adjusted for confounders that do not vary “within” (over time for student fixed effects, or over students for teacher fixed effects). Alternative specifications, developed from Arellano and Bond (1991) and used in previous VAM research (Guarino et al., 2012), add instrumental variables (IV) in the form of twice-lagged measures of the dependent variable (the outcome two periods removed) to circumvent endogeneity between time-varying inputs, manifesting on the one-year lagged outcome, and the teacher effect. There are four major variations on fixed effects models: (1) a student fixed effects model; (2) a teacher fixed effects model; (3) a student fixed effects instrumental variable model; and (4) a teacher fixed effects instrumental variable model. Numerous models appear throughout the econometric evaluation literature but are largely variations on these major types.

The student fixed effect model (SFE) uses a multi-year panel with demeaning of all characteristics:

$$Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \tag{2}$$

The terms with bars (e.g., $\bar{Y}_i$) are the within-student means of each variable. The term $X_{it}$ represents time-varying predictors of student achievement; the student fixed effect $\mu_i$, which absorbs the fixed effects of time-invariant unmeasured confounders, such as predispositions to learning, is accordingly eliminated by demeaning. Time-varying effects of these predispositions, if they exist, are not eliminated. The teacher effect is estimated as the mean of the composite residuals within each teacher, $\varepsilon_{it} - \bar{\varepsilon}_i$. To satisfy the ignorability requirement of the potential outcomes model, this complex error must not be associated with any of the terms in the model, including serially.
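The within-student transformation can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the report's implementation: it uses a single time-varying predictor `x`, fits its slope by least squares on the demeaned data, and averages the residuals by teacher. The function name, field names, and the four-row dataset are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def sfe_teacher_effects(rows):
    """rows: dicts with keys student, teacher, y, x (one per student-period)."""
    by_student = defaultdict(list)
    for r in rows:
        by_student[r["student"]].append(r)

    # Within-student demeaning removes anything constant within a student.
    demeaned = []
    for recs in by_student.values():
        y_bar = mean(r["y"] for r in recs)
        x_bar = mean(r["x"] for r in recs)
        for r in recs:
            demeaned.append((r["teacher"], r["y"] - y_bar, r["x"] - x_bar))

    # Least squares slope for one demeaned regressor (no intercept needed).
    sxx = sum(x * x for _, _, x in demeaned)
    sxy = sum(x * y for _, y, x in demeaned)
    beta = sxy / sxx

    # Teacher effect: mean of the composite residuals within each teacher.
    residuals = defaultdict(list)
    for teacher, y, x in demeaned:
        residuals[teacher].append(y - beta * x)
    return beta, {t: mean(v) for t, v in residuals.items()}

# Hypothetical data generated as y = student level + 2*x + teacher effect,
# with student levels 10 and 20 and teacher effects A = +1, B = -1.
rows = [
    {"student": 1, "teacher": "A", "y": 11, "x": 0},
    {"student": 1, "teacher": "B", "y": 11, "x": 1},
    {"student": 2, "teacher": "A", "y": 23, "x": 1},
    {"student": 2, "teacher": "B", "y": 19, "x": 0},
]
beta, effects = sfe_teacher_effects(rows)
```

In this constructed example the student levels (10 and 20) drop out entirely, and the teacher means of the residuals recover the effects built into the data. The recovery is exact only because `x` was chosen to be unrelated to teacher assignment within students; if it were related, the slope and therefore the residual means would be distorted.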

The teacher fixed effects model (TFE) is a cross-sectional model much like the random effects models, focusing on the nesting of students in teachers but estimated using teacher dummy variables. Unlike the SFE, the teacher effect parameters in this case are estimated:

$$Y_{ijt} = X_i\beta + \lambda_1 Y_{i,t-1} + \lambda_2 Y_{i,t-2} + \tau_j + \varepsilon_{ijt} \tag{3}$$

The teacher fixed effect model bears a resemblance to some nested random effects models; instead of estimating the teacher effect with the random effect, the teacher effect is estimated using the dummy variables represented by $\tau_j$, and the difference between the random and fixed effects estimates of the teacher effect is due to Bayesian shrinkage in accordance with the reliability of each teacher’s sample average (Raudenbush & Willms, 1995). To satisfy the ignorability requirement of the potential outcomes model, $\tau_j$ must not be associated with excluded covariates that are also associated with the outcome.
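A teacher dummy-variable regression can be sketched without matrix algebra when there is a single covariate. The example below assumes one prior score as the only covariate and no other controls, which is simpler than the specification described above. It relies on the standard result that least squares with a full set of group dummies gives the same slope as a regression on within-group deviations, with each dummy coefficient equal to the group mean outcome minus the slope times the group mean covariate. The function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def tfe_teacher_effects(rows):
    """rows: (teacher, score, prior_score) tuples for one cross section."""
    by_teacher = defaultdict(list)
    for teacher, score, prior in rows:
        by_teacher[teacher].append((score, prior))

    # Teacher-level means of the outcome and the prior score.
    means = {t: (mean(s for s, _ in v), mean(p for _, p in v))
             for t, v in by_teacher.items()}

    # Slope on the prior score from within-teacher deviations.
    sxy = sxx = 0.0
    for t, v in by_teacher.items():
        s_bar, p_bar = means[t]
        for s, p in v:
            sxy += (p - p_bar) * (s - s_bar)
            sxx += (p - p_bar) ** 2
    lam = sxy / sxx

    # Dummy coefficient for each teacher (model fitted without an intercept).
    effects = {t: s_bar - lam * p_bar for t, (s_bar, p_bar) in means.items()}
    return lam, effects

# Hypothetical classes with identical mean scores but different prior scores.
rows = [("A", 4, 2), ("A", 5, 4), ("B", 4, 6), ("B", 5, 8)]
lam, effects = tfe_teacher_effects(rows)
```

Both hypothetical classes have the same average score (4.5), yet teacher A receives the larger estimated effect because A's students started from lower prior scores. These unshrunken estimates are the fixed effects counterpart of the random effects estimates; applying a reliability weight to them would move them toward the random effects values.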

Student and teacher fixed effects control for, respectively, between-student variation and between-teacher variation. The student fixed effects model does not address within-student variation, which is controlled for via covariates. The potential outcomes model informs us that within-student factors that are associated with both assignment to treatment and the outcome are confounders, and they are not accounted for unless they are explicitly included in the model. If unobserved, they manifest in the error term. The same applies to the teacher fixed effects model, though in that case, it is between-student variation at any point in time that represents the source of confounding.

Other econometric models use both fixed effects and instrumental variables in a two-stage least squares framework in order to eliminate endogeneity not eliminated by the fixed effects (i.e., serial or time-varying effects in the student fixed effects model, such as the student’s previous year test score). Rather than demeaning over the entire panel, the fixed effect component is estimated using a “first difference.” With the two-stage framework, a first difference of the lag or pretest period is estimated with the outcome two periods removed (twice lagged) as the instrument. Subsequently, the predicted value of this first differenced pretest is entered as a covariate into the model for the current period, with first differencing:

$$\Delta Y_{i,t-1} = \pi Y_{i,t-2} + \Delta u_{i,t-1} \tag{4.1}$$

$$\Delta Y_{it} = \lambda\,\widehat{\Delta Y}_{i,t-1} + \tau_j + \Delta\varepsilon_{it} \tag{4.2}$$

First differencing, much like demeaning, eliminates the fixed effects of time-invariant characteristics. In the student fixed effect IV model (SFEIV), the teacher term $\tau_j$ is not directly estimated; the teacher effects are calculated from the mean of the residuals within each teacher.

In the teacher fixed effect IV model (TFEIV), the teacher effect is estimated as the teacher term $\tau_j$. As in the SFE specification, the SFEIV teacher effects, derived from the residual, must not be associated with any other terms in the model. For the TFEIV, the teacher term must not be contemporaneously associated with the error. In both cases, serial correlation in the error is addressed by the instrument.
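The two-stage procedure for the residual-based (SFEIV-style) teacher effect can be sketched as follows. This is an illustration under stated assumptions rather than the report's code: each stage includes an intercept, there are no covariates beyond the differenced pretest, and the residuals are formed with the observed differenced pretest (the usual two-stage least squares convention; the text does not say which residual is used). Function names and the six-student dataset are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def ols(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fd_iv_teacher_effects(rows):
    """rows: (teacher, y_t2, y_t1, y_t), scores from two periods back to now."""
    d_prior = [y1 - y2 for _, y2, y1, _ in rows]   # differenced pretest
    d_now = [y0 - y1 for _, _, y1, y0 in rows]     # differenced outcome
    instrument = [y2 for _, y2, _, _ in rows]      # twice-lagged score

    # Stage 1: predict the differenced pretest from the twice-lagged score.
    a1, pi = ols(instrument, d_prior)
    predicted = [a1 + pi * z for z in instrument]

    # Stage 2: regress the differenced outcome on the predicted pretest.
    a2, lam = ols(predicted, d_now)

    # Teacher effect: mean residual within each teacher.
    residuals = defaultdict(list)
    for (teacher, *_), dy, dx in zip(rows, d_now, d_prior):
        residuals[teacher].append(dy - a2 - lam * dx)
    return lam, {t: mean(v) for t, v in residuals.items()}

rows = [
    ("A", 10, 12, 15), ("A", 14, 15, 17), ("A", 9, 12, 16),
    ("B", 11, 12, 12), ("B", 16, 16, 17), ("B", 12, 14, 15),
]
lam, effects = fd_iv_teacher_effects(rows)
```

With one instrument and one endogenous regressor, the second-stage slope equals the ratio of the instrument's covariance with the differenced outcome to its covariance with the differenced pretest, which gives a quick check on the calculation.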

Note also that the IV specification assumes, in contrast to the random effects models and other fixed effects models, that the outcome two periods removed satisfies the exclusion restriction: that its effect on the outcome operates strictly through the endogenous variable and does not have a net direct effect on the outcome used to estimate the teacher effect. This implies that the relationship between 3rd grade performance and 5th grade performance must be completely mediated by 4th grade performance.

A final model is a simple pooled regression model (labeled by Guarino, Reckase, & Wooldridge [2012] as dynamic ordinary least squares, or DOLS) that uses the panel of data but ignores the nesting of time within students, treating each observation as independent:

$$Y_{ijt} = X_i\beta + \lambda Y_{i,t-1} + \tau_j + \varepsilon_{ijt} \tag{5}$$

The DOLS bears a resemblance to both the HLM2 and TFE models, but uses the panel of data over multiple years instead of treating each grade level as a separate cross section. The teacher effects are estimated for all grade levels from the teacher term $\tau_j$. To satisfy the ignorability requirement of the potential outcomes model, the teacher effect must not be associated with excluded covariates that are also associated with the outcome.

**Hybrid Fixed and Random Effects Models**