University of Nebraska - Lincoln
DigitalCommons@University of Nebraska - Lincoln
Dissertations and Theses in Statistics, Department of Statistics, 8-2010

Issues associated with the construction and scaling of instruments used to measure student achievement also exist. Typically, measures of student achievement are assumed to be on an interval scale, where any difference in scores has the same meaning at any point on the scale. For example, a student with test scores of 40 and 60 in consecutive years is assumed to have made the same amount of growth as a student with scores of 20 and 40. However, linking scores from different tests to a single scale for comparisons across grades may require nonlinear transformations, where rates of growth no longer have the same meaning across all ability levels. As illustrated in the previous example, students with different levels of ability may have the same rates of achievement on the original test scales, but after transforming the scores nonlinearly to a single developmental scale, the students could potentially have different rates of change; the change from 20 to 40 may be more or less significant than the change from 40 to 60 and may subsequently impact teacher effect estimates (McCaffrey et al., 2003). Teacher effect estimates may also be sensitive to the alignment of test content with curricula, and the weighting of test content to obtain one overall achievement score (Lockwood, McCaffrey, Hamilton, et al., 2007; Martineau, 2006).
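The effect of a nonlinear rescaling on equal raw gains can be made concrete with a small numerical sketch. Here a square-root transformation stands in as a hypothetical linking function (the actual linking functions used by test vendors vary); the two students' identical 20-point gains on the original scale become unequal gains on the transformed scale.

```python
import numpy as np

# Hypothetical scores for two students in consecutive years.
# On the original scale, both students gain 20 points.
student_a = (20.0, 40.0)
student_b = (40.0, 60.0)

def link(score):
    """A nonlinear transformation standing in for a vertical-scale
    linking function (square root chosen purely for illustration)."""
    return np.sqrt(score)

# Gains on the transformed (developmental) scale.
gain_a = link(student_a[1]) - link(student_a[0])
gain_b = link(student_b[1]) - link(student_b[0])

# Equal raw gains, but gain_a > gain_b after the transformation,
# so the lower-scoring student now appears to have grown more.
```

Any concave linking function compresses differences at the high end of the scale, so equal raw gains translate into larger transformed gains for lower-scoring students; a convex function would do the reverse.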

2.5 Summary and Future Work

Value-added modeling techniques estimate the contribution of educational factors, such as teachers, to growth in student achievement, while also allowing the effects of non-educational factors to be controlled for. Several value-added models for estimating teacher effects have been proposed as alternatives to current test-based accountability procedures, such as AYP, but each has its respective advantages and disadvantages. Although these methods have the potential to identify highly effective teachers, teacher effect estimates are sensitive to different modeling specifications, including the persistence of teacher effects. Several statistical and psychometric issues exist, and the sensitivity of teacher effects to such issues needs to be explored. Consequently, care should be taken when defining what teacher effects really describe, and teacher effect estimates should be linked to other valid measures of teacher effectiveness.

Although value-added teacher effect estimates should not be used in isolation to make high-stakes decisions, value-added methodology can help researchers identify what characteristics highly effective teachers possess and motivate informed improvements in education (McCaffrey et al., 2003).

Chapter 3 Estimating the Impact of a Professional Development Program on Student Learning

3.1 Introduction

Professional development programs focus on preparing teachers to meet recent initiatives for improving the quality of mathematics instruction, but rigorous evaluations are needed to determine whether these programs are actually effective (Carey, 2004; Guskey, 1994; Hill, 2007a; Loucks-Horsley, Stiles, & Hewson, 1996; National Mathematics Advisory Panel [NMAP], 2008; Shaha, Lewis, O’Donnell, & Brown, 2004). According to Hill (2007a),

    Almost no local professional development—and even most efforts offered by respected university faculty, nonprofit, and commercial professional developers—is rigorously evaluated, in the sense of researchers looking for changes in teacher knowledge and instructional practice. Even more seldom do researchers investigate the effect on student learning. More often, evaluations simply ask participants to report whether and how the program affected their own teaching. (p. 122)

However, teacher self-reports tend to be unreliable and subjective measures of program effectiveness (Borko, 2004; Wilson & Berne, 1999).

Past research has examined the relationship between teaching quality and student learning, investigating the effects that teachers’ degrees (Ackerman, Heafner, & Bartz, 2006; Rowan, Correnti, & Miller, 2002; Wayne & Youngs, 2003), coursework (Hill, Rowan, & Ball, 2005; Wayne & Youngs, 2003), certification status (Ackerman et al., 2006; Hill et al., 2005; Rowan et al., 2002; Wayne & Youngs, 2003), teaching experience (Ackerman et al., 2006; Hill et al., 2005; Rowan et al., 2002; Wayne & Youngs, 2003), licensure examination scores (Wayne & Youngs, 2003), pedagogical practices (Ackerman et al., 2006; Rowan et al., 2002), and amount of professional development (Ackerman et al., 2006) have on student achievement. Recent efforts measure not only teachers’ mathematical content knowledge but also their mathematical knowledge for teaching (Ball, Hill, & Bass, 2005; Hill, 2007b; Hill, Schilling, & Ball, 2004), examining its relationship to student achievement (Hill et al., 2005). Various covariate adjustment models and gain score models have been used in conjunction with statistical models, such as analysis of variance (ANOVA), analysis of covariance (ANCOVA) (Sanders, 2006), and hierarchical linear models (HLM) (Raudenbush & Bryk, 2002; Rowan et al., 2002; Wright, Sanders, & Rivers, 2006), to estimate teacher effects. However, researchers criticize these procedures for biasing teacher effects and/or modeling students’ achievement status instead of changes in achievement (Rowan et al., 2002), as well as for their approaches to handling missing data (Sanders, 2006). Unfortunately, “assertions about the magnitude of teacher effects on student achievement depend…on the methods used to estimate these effects and on how the findings are interpreted” (Rowan et al., 2002, p. 1536).

Cross-classified models (Raudenbush & Bryk, 2002) and the Educational Value-Added Assessment System (EVAAS) model (Sanders, Saxton, & Horn, 1997) are currently recommended over other models to provide estimates of teacher effectiveness.

The EVAAS model is a longitudinal linear mixed effects model in which each student serves as his or her own control, similar to the cross-classified model, which models individual growth curves. Using the EVAAS model, Sanders et al. (1997) have been able “to produce estimates of school and teacher effects that are free of socioeconomic confoundings and do not require direct measures of these concomitant variables” (Wright, Horn, & Sanders, 1997, p. 58). In fact, Sanders (2000) claims, “Our research work…clearly indicates that differences in teacher effectiveness is the single largest factor affecting [students’] academic growth” (p. 334); teachers are the dominant factor impacting student progress (Sanders, 2004; Wright et al., 1997). Darling-Hammond (2000) adds, “[E]ffects of well-prepared teachers on student achievement can be stronger than the influences of student background factors, such as poverty, language background, and minority status” (Conclusions and Implications, 6).

With teacher effectiveness linked to student achievement, questions remain about what factors influence the quality of teaching (Carey, 2004). Professional development is one factor thought to influence teaching quality. Desimone, Porter, Garet, Yoon, and Birman (2002) write, “Professional development is considered an essential mechanism for deepening teachers’ content knowledge and developing their teaching practices” (p. 81). Various authors list characteristics of effective professional development programs (Desimone et al., 2002; Garet, Porter, Desimone, Birman, & Yoon, 2001; Guskey, 1994; Loucks-Horsley et al., 1996), but rigorous evaluations are needed to determine whether these programs actually affect teaching quality (Carey, 2004; Hill, 2007a; Loucks-Horsley et al., 1996; NMAP, 2008; Shaha et al., 2004).

Previous research has examined the relationship between teacher quality and student learning (Ackerman et al., 2006; Darling-Hammond, 2000; Hill et al., 2005; Presley, White, & Gong, 2005; Rowan et al., 2002; Wayne & Youngs, 2003), while other research has investigated the value-added effects of teachers on student learning (McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004; Rowan et al., 2002; Sanders & Rivers, 1996; Sanders et al., 1997; Wright et al., 1997). Yet the relationship between teacher development and teacher practices, as well as student learning, still needs to be explored (Fishman, Marx, Best, & Tal, 2003; Frome, Lasater, & Cooney, 2005; NMAP, 2008).

Existing research tries to estimate the effect of professional development programs with a dichotomous variable (Shaha et al., 2004; Stroup, 2007; Stroup & Fang, 2006). Typically, teachers are assigned a value of one to indicate participation in a program or a value of zero to indicate non-participation. However, the discrete nature of this approach neglects the interactive nature of teachers and the possibility of creeping excellence, where program participants share their newly acquired knowledge and ideas with non-participating teachers. This approach also disregards teachers’ varying degrees of participation and changes in practice. Hill (2007a) writes, “Although teachers might be required to engage in professional development, they are not required to learn from it” (p. 123). A teacher’s participation in professional development opportunities does not guarantee actual learning or changes in teacher beliefs and practices. Likewise, a teacher’s non-participation in a professional development program does not imply a lack of teaching quality.

Instead, estimating the change in a teacher’s effect on student achievement after participating in a professional development program can be an alternative approach for estimating the impact of such a program; this approach allows each teacher to serve as his or her own control and helps address the complexities ignored by merely comparing the effects of participating teachers on student achievement to those of non-participating teachers. In this chapter, alternative methodology for using less-than-ideal longitudinal student achievement data to estimate the impact of a professional development program is proposed and applied to data collected from a mathematics professional development program, the Math in the Middle Institute Partnership (M2). The chapter concludes with a summary of the results and recommendations for future work.

3.2 Methods for Estimating the Impact of a Professional Development Program on Student Learning

In recent years, education systems, in theory, have held students to higher academic standards (No Child Left Behind, 2001) by holding states accountable for assessing measurable student outcomes. Research efforts have addressed issues associated with analyzing student achievement data (McCaffrey, Lockwood, Koretz, & Hamilton, 2003), but many of the recommended approaches have not been widely adopted because the required resources and high-quality longitudinal data are not readily available. Most value-added modeling (VAM) approaches require student achievement data to be vertically scaled, or at least linearly related, over time (McCaffrey et al., 2003).

Such requirements limit analyses that can be conducted on available assessment data, which often are not on a single developmental scale. Few studies have addressed how to use value-added models to analyze achievement data not on a single developmental scale (Rivkin, Hanushek, & Kain, 2005), and even fewer have discussed how to use these data to estimate the impact of professional development on students. The purpose of this section is to investigate how to use a value-added model for analyzing longitudinal student achievement data collected from a mixture of norm- and criterion-referenced assessments to estimate the impact of a professional development program on student learning.

Multiple authors have championed the use of value-added models to analyze longitudinal student achievement data (Doran, 2003; Drury & Doran, 2003; Hershberg, Simon, & Lea-Kruger, 2004; Lissitz, 2005; Sanders et al., 1997). These methods fall into three categories: covariate adjustment models, gain score models, and multivariate models (McCaffrey et al., 2003). Covariate adjustment models regress each student’s current achievement score on his or her prior year score, while gain score models treat the difference between two successive years’ scores as the response. Both methods require complete student records and lose information about a student’s performance over time by estimating models separately for each year. Although covariate adjustment methods do not require tests to be on a single developmental scale, gain score methods do, so that changes in performance are not confounded with changes in tests (McCaffrey et al., 2003).
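The distinction between the two single-year approaches can be sketched with simulated data. This is a minimal illustration, not any published study's analysis: the scores, sample size, and coefficients below are all hypothetical, and ordinary least squares stands in for whatever estimator a given application would use.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical number of students

# Simulated prior- and current-year scores for n students.
prior = rng.normal(50, 10, n)
current = 5 + 0.9 * prior + rng.normal(0, 5, n)

# Covariate adjustment model: regress the current-year score on the
# prior-year score (intercept plus slope, via least squares).
X = np.column_stack([np.ones(n), prior])
(intercept, slope), *_ = np.linalg.lstsq(X, current, rcond=None)

# Gain score model: treat the year-to-year difference as the response;
# here the "model" is just the mean gain across students.
gain = current - prior
mean_gain = gain.mean()
```

Note how the covariate adjustment model recovers a slope near the generating value of 0.9 rather than forcing it to 1; the gain score model implicitly fixes that slope at 1, which is one reason it requires the two years' tests to share a scale.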

Multivariate models jointly model all student scores, including relationships between each student’s set of outcomes. These approaches also accommodate missing data, making efficient use of all available information. One prominent multivariate longitudinal linear mixed model is the Education Value-Added Assessment System (EVAAS) layered model (Sanders et al., 1997). This approach assumes teacher effects are independent and persist undiminished over time and subject. McCaffrey et al. (2004) proposed a more general version in which prior-year teachers have variable contributions to current-year scores. The variable persistence model only requires that scores be on linearly related scales, but the EVAAS model requires that scores be on a single developmental scale (McCaffrey et al., 2003). Although computationally intensive, the layered modeling approaches have advantages over other methods (Sanders, 2006; Wright & Sanders, 2008).

Studies investigating VAM teacher effects provide evidence that teachers have differing effects on student learning (Rivkin et al., 2005; Rowan et al., 2002; Wright et al., 1997) that persist over time (Sanders & Rivers, 1996), but these studies have shortcomings. Statistical and psychometric issues arise when estimating teacher effects using longitudinal student achievement data (McCaffrey et al., 2003). Lockwood, McCaffrey, Hamilton, et al. (2007) showed that estimated VAM teacher effects are sensitive to the way student achievement is measured. This is particularly problematic when scores are not on a single developmental scale. Rivkin et al. (2005) standardized criterion-referenced test scores to have a mean of zero and a standard deviation of one for each cohort of students within a given academic year to address differences between tests. However, students could not be matched to specific teachers, so only subject- and grade-level means were used to model gains. Although a Z-score approach has limitations (McCaffrey et al., 2003), it is more appropriate than raw test scores for modeling gains in achievement when instruments are not on the same scale. It is important to investigate how a Z-score approach can be used on less-than-ideal data to estimate not only teacher effects but also the impact of a professional development program on student learning.
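The within-year standardization described above amounts to a column-wise Z-transformation of the score matrix. A minimal sketch, with entirely hypothetical scores from three tests on different scales:

```python
import numpy as np

# Hypothetical raw scores from tests that are not on a common scale:
# rows index students, columns index academic years (different tests).
scores = np.array([
    [180.0, 325.0, 410.0],
    [150.0, 300.0, 430.0],
    [165.0, 340.0, 395.0],
    [200.0, 310.0, 450.0],
])

# Standardize within each year (column): subtract the cohort mean and
# divide by the cohort standard deviation, following the approach of
# Rivkin et al. (2005).
z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=0)

# Year-to-year differences in Z-scores now track changes in a student's
# relative standing within the cohort, not raw-scale point gains.
z_gains = np.diff(z, axis=1)
```

The limitation noted by McCaffrey et al. (2003) is visible here: a Z-score gain of zero means the student held position relative to the cohort, which is not the same as a fixed amount of absolute growth.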

3.2.2 EVAAS Layered Teacher Model

For a single subject, such as mathematics, a simplified version of the EVAAS model,

    y = μ + Zt + e,

is a special case of the linear mixed model (Laird & Ware, 1982), where y is a vector of test scores, μ is a vector of means, Z is the coefficient matrix for t, the vector of random teacher effects, and e is the vector of residuals, assumed to be normally distributed with E(e) = 0 and Var(e) = R. Residuals from different students are assumed to be independent, but residuals on the same student are assumed to be correlated and are modeled using an unstructured within-student covariance structure. This complex covariance structure accounts for variables affecting students’ levels of achievement and is used instead of non-instructional student-level covariates (Wright & Sanders, 2008).

Table 3.1: Comparison of Z Matrix in Non-Layered and Layered Models

Wright and Sanders (2008) distinguish between the layered and non-layered models in the construction of the Z matrix (Table 3.1).
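The contrast in Z-matrix construction can be sketched for a single hypothetical student taught by teachers T1, T2, and T3 in three successive years. This illustrates the general layering idea, not the full EVAAS specification (which stacks such blocks across all students and handles team teaching and missing links).

```python
import numpy as np

# Columns correspond to teachers T1, T2, T3; rows to one student's
# year-1, year-2, and year-3 scores (a hypothetical three-year record).

# Non-layered model: each score reflects only the current year's teacher.
Z_nonlayered = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# Layered (EVAAS) model: teacher effects persist undiminished, so each
# row accumulates the indicators of all current and prior teachers.
Z_layered = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
])

# The variable persistence model of McCaffrey et al. (2004) generalizes
# this by replacing the prior-year 1s with estimated persistence
# parameters alpha, rather than fixing them at 1.
```

With these blocks, the year-3 expected score under the layered model includes the effects of all three teachers, whereas the non-layered model credits only T3, which is the structural difference Table 3.1 summarizes.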