# «University of Nebraska - Lincoln DigitalCommons of Nebraska - Lincoln Dissertations and Theses in Statistics Statistics, Department of 8-2010 ...»

Consequently, several limitations exist when defining what teacher effects really describe. Defining teacher effects requires identifying the reference group against which a particular teacher’s impact on a student’s growth in achievement will be compared, such as other teachers in the school, district, or entire state. The definition also depends on the outcomes used to measure achievement; the scope and purpose of the instruments can limit what is measured and, consequently, restrict the part of a teacher’s total impact on a student that can be estimated (McCaffrey et al., 2003). Other factors affecting students’ growth in achievement, such as characteristics of classrooms and schools, can be confounded with teacher effect estimates, so the purpose for obtaining such estimates needs to be clearly defined and should dictate how precisely the effects need to be estimated. Typically, teacher effects merely account for unexplained classroom-level heterogeneity (Lockwood, McCaffrey, Mariano, & Setodji, 2007).

Studies investigating value-added teacher effects provide evidence that teachers have differing effects on student learning (Rivkin, Hanushek, & Kain, 2005; Rowan, Correnti, & Miller, 2002; Wright, Horn, & Sanders, 1997) and that these effects persist over time (Sanders & Rivers, 1996), but these studies have shortcomings (McCaffrey et al., 2003). Section 2.2 describes proposed value-added models for estimating teacher effects and discusses their respective advantages and disadvantages. Section 2.3 covers the impact of different modeling decisions on the estimation of teacher effects, and Section 2.4 highlights various statistical and psychometric issues associated with estimating such effects. The chapter concludes with a summary of the current state of value-added modeling research and recommendations for future work.

2.2 Value-Added Models

Multiple authors have championed the use of value-added models to analyze longitudinal student achievement data (Doran, 2003; Drury & Doran, 2003; Hershberg, Simon, & Lea-Kruger, 2004; Lissitz, 2005; Sanders, Saxton, & Horn, 1997). These methods fall into three categories: covariate adjustment models, gain score models and multivariate models (McCaffrey et al., 2003).

2.2.1 Covariate Adjustment Models

Covariate adjustment models, for example, regress each student’s current achievement score, $y_{ig}$, on his or her prior score, $y_{i,g-1}$, for the year of data collection, $g = 1, 2, 3, \ldots, m$. The student-specific mean, $\mu_{ig}$, adjusts for factors affecting a student’s level of achievement, such as free-and-reduced lunch and English Language Learner (ELL) identifiers. It can also account for many other factors, including characteristics of schools. The teacher effect, $T_g$, estimates the current year [...] scores and teacher effects.
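One form consistent with the definitions above is sketched here. The coefficient on the prior score, written $\beta_g$, is a symbol assumed for illustration because the surrounding text does not name it, and $e_{ig}$ denotes a residual error term.

$$
y_{ig} = \mu_{ig} + \beta_g\, y_{i,g-1} + T_g + e_{ig}
$$

Under this form the model is fit separately for each year $g$, with the prior score entering as an ordinary covariate.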

Covariate adjustment models are easy to specify and fit, and they do not require performance on measures used in successive years to be placed on a single developmental scale so growth can be measured across grades or ages. This is particularly beneficial for school systems using a mixture of norm-referenced and/or criterion-referenced tests, where reported student scores from the two types of instruments reflect different measurements: either relative academic performance or proficiency on predetermined criteria, respectively. Teacher effects from prior years are embedded in the previous year’s score, so the effects persist indefinitely even though they are not explicitly specified in subsequent years’ models. However, information is lost about student performance by estimating models separately for each year, so critics argue these methods are not really measuring student growth. Covariate adjustment methods also require complete student records, so missing student outcomes must either be imputed or removed from the analysis. When data are not missing completely at random, list-wise deletion can lead to biased estimates of all effects. In general, covariate adjustment models are easy to work with, but potentially over-simplify the complexity of student growth over time.
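The ease of fitting mentioned above can be illustrated with a minimal sketch. It assumes a single year of data, a common intercept in place of the student-specific mean, and teacher effects treated as fixed and constrained to sum to zero; none of these simplifications come from the source, and the function name `fit_covariate_adjustment` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_covariate_adjustment(prior, current, teacher):
    """Least-squares fit of current = a + b * prior + T[teacher] + error.

    Teacher effects are treated as fixed and constrained to sum to zero;
    both choices are simplifications made for this sketch.
    """
    groups = defaultdict(list)
    for x, y, t in zip(prior, current, teacher):
        groups[t].append((x, y))

    # Centering within teacher removes the teacher terms, leaving the common slope.
    sxy = sxx = 0.0
    centers = {}
    for t, rows in groups.items():
        x_bar = mean(x for x, _ in rows)
        y_bar = mean(y for _, y in rows)
        centers[t] = (x_bar, y_bar)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in rows)
        sxx += sum((x - x_bar) ** 2 for x, _ in rows)
    slope = sxy / sxx

    # Teacher-specific intercepts, split into a common intercept and deviations.
    intercepts = {t: y_bar - slope * x_bar for t, (x_bar, y_bar) in centers.items()}
    intercept = mean(intercepts.values())
    effects = {t: a - intercept for t, a in intercepts.items()}
    return {"intercept": intercept, "slope": slope, "teacher_effects": effects}
```

Only students with both a prior and a current score can enter this fit, which is the complete-records requirement noted above.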

Gain score models, for example, treat the difference between two successive scores, $d_{ig}$, for student $i$ as the response for the $g$th year of data collection. The student-specific mean, $\mu_{ig}$, adjusts for factors affecting a student’s growth from one year to the next. It can account for many factors, including characteristics of schools. The teacher effect, $T_g$, estimates the current year teacher’s impact on a student’s growth. The residual errors, $e_{ig}$, are assumed to be normally distributed with mean zero and variance $\sigma^2_{eg}$ and independent of the teacher effects.
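Written out, a model matching this description would take the following form. This is a sketch assembled from the definitions above, and the variance symbol $\sigma^2_{eg}$ is an assumed label.

$$
d_{ig} = y_{ig} - y_{i,g-1} = \mu_{ig} + T_g + e_{ig}, \qquad e_{ig} \sim N\!\left(0, \sigma^2_{eg}\right)
$$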

Gain score models are also easy to specify and fit. These methods model students’ gains in scores, so time-invariant factors, such as gender, race and poverty level, affecting a student’s level of achievement need not be estimated. Prior year teacher effects are not typically specified in the model, which assumes they persist undiminished over time.

Although “covariate” methods do not require tests to be on a single developmental scale, “gain” methods do, so changes in performance are not confounded with changes in tests (McCaffrey et al., 2003). In addition, gain score models require complete student records and lose information about student growth by assuming pairs of gains for the same student are independent. Overall, gain score models are easy to work with and explicitly model student growth in achievement, but they have stringent scale requirements and potentially over-simplify the complexity of student growth over time.
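A minimal sketch of the gain score idea follows. It assumes one pair of successive scores per student on a common scale and summarizes each teacher by the mean gain of his or her students, centered on the average of the teacher means. This is a deliberately crude stand-in for a fitted model, and the function name `teacher_mean_gains` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def teacher_mean_gains(prior, current, teacher):
    """Return each teacher's mean gain minus the mean of the teacher means."""
    gains = defaultdict(list)
    for y_prev, y_now, t in zip(prior, current, teacher):
        if y_prev is None or y_now is None:
            continue  # listwise deletion: an incomplete record cannot form a gain
        gains[t].append(y_now - y_prev)  # d_ig = y_ig - y_{i,g-1}
    by_teacher = {t: mean(d) for t, d in gains.items()}
    centre = mean(by_teacher.values())
    return {t: m - centre for t, m in by_teacher.items()}
```

The `continue` branch makes the complete-records requirement explicit: a student missing either score contributes nothing, which can bias the estimates when scores are not missing completely at random.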

Multivariate methods jointly model all student scores, including relationships between each student’s set of outcomes. These approaches also accommodate missing data, making efficient use of all available information. Specifying a multivariate model provides more flexibility, allowing the exploration of several assumptions, such as the persistence of teacher effects and the residual covariance structure of student outcomes.

In some instances, these models are robust to omitted covariates, but they are computationally intensive and require far more computing resources than either the gain score or covariate adjustment methods. Even though multivariate methods are often recommended, they are not widely adopted because the required resources and high-quality longitudinal data are not readily available. Three common multivariate approaches include the cross-classified model (Raudenbush & Bryk, 2002), the Education Value-Added Assessment System (EVAAS) teacher model (Sanders et al., 1997), and the variable persistence model (McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004).

Hierarchical linear models (HLMs) can model students’ growth over time, but when assessing teachers’ influence on rates of learning, the models require each outcome to belong to only one student, who in turn remains in a single teacher’s classroom over the course of the study (Raudenbush & Bryk, 2002). The nested structure required by HLMs is a limitation when studies want to model students’ growth over the course of several years and, subsequently, several teachers. In practice, students who share the same teacher(s) in one year do not share the same teacher(s) in the next year, resulting in a cross-classified structure (Rasbash & Browne, 2008; Raudenbush & Bryk, 2002); HLMs do not model this additional complexity.

Cross-classified models, for example, model scores, $y_{i,(g-1)}$, for student $i$ at year $g = 1, 2, 3, 4$ of data collection as a function of the average achievement, $\mu$, and the average learning rate, $\beta$. The student-specific intercepts, $m_i$, and slopes, $b_i$, are assumed to be independent across students and normally distributed with mean zero, variances $\tau_0$ and $\tau_1$, respectively, and covariance $\tau_{01}$. The random teacher effects, $T_k$, are the expected deflections to students’ growth curves when encountering teacher $k$. These effects are assumed to be independently, normally distributed with mean zero and constant variance across years. Teacher effects are also assumed to be independent of all other effects in the model. The random errors, $e_{ig}$, are assumed to be normally distributed and independent of each other, both within and between students, because the individual growth curves are assumed to “capture all the student-related influences on scores” (McCaffrey et al., 2003, p. 58).

More generally, the cross-classified model from above can be specified as,

where $a_{i,(g-1)}$ assumes the value $(g-1)$ at year $g$, and the term $D_{hik} = 1$ if student $i$ encounters teacher $k$ at time $h$; $D_{hik} = 0$ otherwise. The teacher effects, $T_k$, are summed over time, so a student’s score is attributed to all previous and current teachers the student had for a particular subject. These types of models can also be extended to include other factors, such as student- and teacher-level covariates (Raudenbush & Bryk, 2002), as well as higher-order polynomials in $g$ (Raudenbush, 2004).
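The summed indicators can be made concrete with a short sketch. It assumes each student has exactly one teacher per year for the subject and that the sum over time runs from the first year through the current year $g$; the function name `cumulative_teacher_indicators` and the input layout are hypothetical.

```python
def cumulative_teacher_indicators(assignments, teachers):
    """Build rows of summed D_hik indicators under complete persistence.

    assignments[i][h] is the teacher student i had in year h + 1 (0-based list
    index). Row (i, g) holds, for each teacher k, the number of years h <= g
    in which student i encountered teacher k.
    """
    column = {k: j for j, k in enumerate(teachers)}
    rows = {}
    for i, history in enumerate(assignments):
        counts = [0] * len(teachers)
        for g, k in enumerate(history, start=1):
            counts[column[k]] += 1       # D_gik = 1 for this year's teacher
            rows[(i, g)] = list(counts)  # sum over h = 1..g of D_hik
    return rows
```

For two students who had different teachers in year 1 and then shared teacher C in year 2, the year 2 rows carry a 1 in both the year 1 teacher's column and C's column. Students sharing a row pattern in one year need not share it in the next, which is the cross-classified structure described above.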

Cross-classified models explicitly model individual growth curves, often using a linear trend instead of separate means for each year. The linear trend used to model student growth places restrictions on the residual error covariance matrix. Consequently, whenever the covariance between $m_i$ and $b_i$, $\tau_{01}$, is greater than zero, the variability of scores increases over time (McCaffrey et al., 2004). Raudenbush (2004) acknowledged cross-classified approaches have stronger variance assumptions than models with unstructured variance-covariance matrices, but he argued this additional assumption potentially makes more appropriate and efficient use of student achievement data.

Because cross-classified models can become complex quickly, other constraints may also need to be imposed to simplify a model.

In the cross-classified model described, teacher effects persist undiminished into the future, so contributions of past as well as current teachers are accounted for in a student’s set of scores. Consequently, the variability due to teachers inherently increases with each additional year of data collection (McCaffrey et al., 2004). Scores must also be on a single developmental scale (McCaffrey et al., 2003).
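The variance statements above can be traced through a short calculation. The sketch below assumes the general model form shown in its first line, a different teacher in each year, mutually independent student effects, teacher effects and errors, and constant teacher and residual variances written $\sigma^2_T$ and $\sigma^2_e$. The symbols $\mu$, $\beta$, $\tau_0$, $\tau_1$, $\tau_{01}$, $\sigma^2_T$ and $\sigma^2_e$ are labels chosen for this illustration.

$$
\begin{aligned}
y_{i,(g-1)} &= (\mu + m_i) + (\beta + b_i)\,a_{i,(g-1)} + \sum_{h=1}^{g}\sum_{k} D_{hik}\,T_k + e_{ig},\\
\operatorname{Var}\!\bigl(y_{i,(g-1)}\bigr) &= \tau_0 + 2\,(g-1)\,\tau_{01} + (g-1)^2\,\tau_1 + g\,\sigma^2_T + \sigma^2_e .
\end{aligned}
$$

The teacher term $g\,\sigma^2_T$ grows by one teacher variance with each additional year, and when $\tau_{01} > 0$ the student terms also increase with $g$, so the total variance of scores rises over time under these assumptions.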

One prominent multivariate longitudinal linear mixed model is the Education Value-Added Assessment System (EVAAS) layered model (Sanders et al., 1997). This approach assumes teacher effects are independent and persist undiminished over time and subject. For a single track of students within a school system, a simplified version of the EVAAS model for a particular subject, such as math or reading,

models scores, $y_{ig}$, for student $i$ at year $g = 1, 2, 3$ as a function of year-specific means, $\mu_g$. Random teacher effects are included for all teachers a student encounters for the subject during the course of the study. The teacher effects are assumed to be normally distributed with zero mean and year-specific variances; these effects are assumed independent both within and across years. The random errors, $e_{ig}$, are assumed to be normally distributed and independent across students. Within-student correlations are assumed to follow an unstructured covariance structure with time-specific variances.
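For three years of data, a layered specification matching this description can be sketched as follows. Here $T_g$ is an assumed label for the effect of the teacher student $i$ encountered in year $g$, and the display is a reconstruction from the definitions above rather than a quotation.

$$
\begin{aligned}
y_{i1} &= \mu_1 + T_1 + e_{i1},\\
y_{i2} &= \mu_2 + T_1 + T_2 + e_{i2},\\
y_{i3} &= \mu_3 + T_1 + T_2 + T_3 + e_{i3}.
\end{aligned}
$$

Each earlier teacher effect enters every later score with coefficient one, which is what complete persistence means in this setting.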

EVAAS jointly models more than one subject per grade for multiple cohorts of students across several school systems. The EVAAS teacher model (Sanders et al., 1997),

is much more complex, where $y_{ijklmn}$ is the measurement on the $n$th student in the $m$th subject and the $l$th grade who encountered the $j$th teacher in the $i$th school system during the $k$th year. Separate fixed means, $\mu_{iklm}$, are estimated for each grade, year, subject and school system combination. The random effect of the $j$th teacher who taught in the $i$th school system during the $k$th year, $l$th grade and $m$th subject is $t_{ijklm}$, and $c_{ijklm}$ is the fractional contribution of teacher $j$ to the student’s score, accounting for instances when a student has multiple teachers for a subject in the same year. Finally, $e_{ijklmn}$ is the random deviation of the $n$th student’s measurement from the fixed mean.

The $c_{m(ijkl)p}\, t_{m(ijkl)p}$ terms are summed so a student’s score is attributed to all previous and current teachers the student had for subject $m$, creating a layered model. The teacher effects are summed over the index $p$, which tracks the student across years and allows for multiple teachers in the same year. The total number of teachers a student had through year $k$ in subject $m$ is $N_{mk}$.
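Putting the pieces defined above together gives a model of roughly the following shape. The index placement is a reconstruction from the symbol definitions in the text and should be checked against Sanders et al. (1997) before being relied upon.

$$
y_{ijklmn} = \mu_{iklm} + \sum_{p=1}^{N_{mk}} c_{m(ijkl)p}\; t_{m(ijkl)p} + e_{ijklmn}
$$

When a student has a single teacher per year for the subject, each fractional contribution equals one and the sum reduces to a running total of teacher effects across years.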

The random teacher effects are assumed independent across teacher, subject, grade and time, even when the same teacher teaches multiple subjects, grades and/or years. Separate teacher variance components are estimated for each year, subject and grade combination, creating a heterogeneous, diagonal variance-covariance matrix for the random teacher effects. The EVAAS teacher model also uses an unstructured variance-covariance matrix to account for relationships between each student’s set of scores across subjects and grades, but assumes different students’ scores are independent. Teachers’ impact is analyzed based on at least three years of student data (Sanders et al., 1997).

In both the cross-classified and the EVAAS teacher models, teacher effects persist undiminished into the future, so contributions of both current teachers and past teachers are accounted for in a student’s set of scores. Consequently, the total teacher contribution to the variability of scores increases over time, even though the total variance may not, depending on the testing instrument used (McCaffrey et al., 2004). However, the EVAAS model, unlike the cross-classified model, does not place restrictions on the overall grade-specific means or the covariance structure of repeated measurements on the same student (McCaffrey et al., 2004; Wright, Sanders, & Rivers, 2006). The unstructured within-student covariance matrix allows each student to serve as his or her own control, making it unnecessary to account for factors affecting a student’s level of achievement (Sanders et al., 1997). Yet, both of these models are computationally intensive and require scores be on a single developmental scale (McCaffrey et al., 2003).

McCaffrey et al. (2004) proposed the variable persistence model, a generalized version of multivariate longitudinal models for student outcomes. This approach is similar to the EVAAS teacher model, but it allows prior teachers to have variable contributions to current scores rather than assuming complete persistence of these effects.

For a single track of students within a school system, a simplified version of the variable persistence model for a particular subject,

models scores, $y_{ig}$, for student $i$ at year $g = 1, 2, 3$ as a function of year-specific means, $\mu_g$. Random teacher effects are included for all teachers a student encounters for the subject during the course of the study. The teacher effects are assumed to be normally distributed with zero mean and year-specific variances; these effects are assumed independent both within and across years. The persistence of prior teacher $t$ on subsequent scores at year $g$ is $\alpha_{gt}$, which is estimated. The random errors, $e_{ig}$, are assumed to be normally distributed and independent across students. Within-student correlations are estimated using an unstructured covariance structure with time-specific variances.
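One way to read this description is that the year $g$ score takes the form $y_{ig} = \mu_g + \sum_{t<g} \alpha_{gt} T_t + T_g + e_{ig}$, where $T_t$ is the effect of the teacher encountered in year $t$. The symbol $\alpha_{gt}$ and this display are assumptions made for illustration. The sketch below computes only the teacher part of that sum for one student; the function name `teacher_contribution` and its argument layout are hypothetical.

```python
def teacher_contribution(effects, persistence=None):
    """Sum of teacher terms in each year's score for one student.

    effects[g] is the effect of the teacher encountered in year g + 1.
    persistence[g][t] is the weight on the year t + 1 teacher in year g + 1,
    for t < g; None means complete persistence (all weights equal to 1).
    """
    totals = []
    for g, current in enumerate(effects):
        carried = sum(
            (1.0 if persistence is None else persistence[g][t]) * effects[t]
            for t in range(g)
        )
        totals.append(carried + current)
    return totals
```

With all weights equal to one the result is a running total of teacher effects, as in a layered model with complete persistence. With all weights equal to zero only the current teacher contributes. Estimated weights between these extremes let prior teachers' influence fade.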

The variable persistence model can be extended to include student- and school-level covariates (McCaffrey et al., 2003). A special case of the variable persistence model (Lockwood, McCaffrey, Mariano, et al., 2007),