# «University of Nebraska - Lincoln DigitalCommons of Nebraska - Lincoln Dissertations and Theses in Statistics Statistics, Department of 8-2010 ...»

1). In the non-layered model, each student’s outcome in a given year is linked only to the current teacher. In contrast, the layered model links a student’s achievement to current and previous teachers within a given time span. Therefore, the Z matrix for the layered model can have several “1”s in a row, connecting past teachers with subsequent student outcomes.
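The contrast between the two Z matrices can be sketched as follows. This is a minimal illustration, assuming a hypothetical student taught by teachers A, B and C in three successive years; the names and layout are illustrative and are not taken from Table 3.1. Each row is one student-year outcome and each column is one teacher.

```python
teachers = ["A", "B", "C"]
# Hypothetical history for one student: the teacher in years 1, 2 and 3.
history = ["A", "B", "C"]


def build_z(history, teachers, layered):
    """Return Z as a list of rows, one row per year and one column per teacher."""
    z = [[0] * len(teachers) for _ in history]
    for year, current in enumerate(history):
        # Layered: link the outcome to the current and all previous teachers.
        # Non-layered: link the outcome to the current teacher only.
        linked = history[: year + 1] if layered else [current]
        for teacher in linked:
            z[year][teachers.index(teacher)] = 1
    return z


z_non_layered = build_z(history, teachers, layered=False)
z_layered = build_z(history, teachers, layered=True)

for row in z_non_layered:
    print(row)
for row in z_layered:
    print(row)
```

In the layered version the year-3 row carries a 1 for every teacher the student has had so far, which is the "several 1s in a row" pattern described above.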

matrices for $t_b$, $t_a$ and $t_c$, respectively. The covariate, $\beta_a$, accounts for the impact of a professional development program, and $X_a$ is the coefficient matrix to track participation. The intercept, $\mu$, can be estimated to be the same value for all students.

The $Z_b$, $Z_a$ and $Z_c$ matrices are constructed so that for each teacher, the teacher effect before participating in a professional development program is $p_b = t_b + t_c$,

$p_a \sim N(X_a \beta_a, \sigma_a^2 I + \sigma_c^2 I)$; and their covariance is $\mathrm{Cov}(p_b, p_a) = \sigma_c^2$. In this model, the teacher effects before and after program participation are assumed to have different variances, but not to be independent. This is a reasonable assumption, because teachers’ effects before and after participating in a professional development program should be related.

The difference between each participating teacher’s effects, $p_a - p_b = \beta_a + t_a - t_b$, is specified as the impact of the professional development program on his or her students’ achievement. While Sanders et al. (1997) and others (McCaffrey et al., 2004) estimate separate teacher effects for each combination of grade and year in a set of longitudinal data, this model estimates only two overall teacher effects (one before and one after participation) for each teacher over the time period spanned by the data.
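To make the covariance structure explicit, the algebra can be sketched as below. The sketch assumes that $t_b$, $t_a$ and $t_c$ are mutually independent with variances $\sigma_b^2$, $\sigma_a^2$ and $\sigma_c^2$, and that a single teacher's after-participation effect is $p_a = \beta_a + t_a + t_c$. These assumptions are inferred from the stated difference and covariance; they are not quoted from the source.

```latex
\begin{align*}
p_b &= t_b + t_c, \qquad p_a = \beta_a + t_a + t_c \\
p_a - p_b &= \beta_a + t_a - t_b \\
\operatorname{Var}(p_b) &= \sigma_b^2 + \sigma_c^2, \qquad
\operatorname{Var}(p_a) = \sigma_a^2 + \sigma_c^2 \\
\operatorname{Cov}(p_b, p_a) &= \operatorname{Var}(t_c) = \sigma_c^2 \\
\operatorname{Corr}(p_b, p_a) &=
  \frac{\sigma_c^2}{\sqrt{(\sigma_b^2 + \sigma_c^2)(\sigma_a^2 + \sigma_c^2)}}
\end{align*}
```

Under these assumptions, the shared component $t_c$ is what induces the correlation between a teacher's two effects, and it cancels in the difference.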

Table 3.2 illustrates how the Z matrices change in the modified model (Equation 3.2). A new variable, “PD,” is added, indicating whether a teacher had participated in the professional development program (1) or not (0). The corresponding $Z_b$ and $Z_a$ matrices displayed link each teacher to the student outcomes, while also distinguishing whether or

teacher A. However, student 01 had teacher A before the teacher had participated in the program, while student 02 had teacher A after participation. Therefore, the $Z_b$ matrix links student 01 to teacher A, while the $Z_a$ matrix links student 02 to teacher A. The $Z_c$ matrix is the sum of the $Z_b$ and $Z_a$ matrices, resulting in the same Z matrix as for the layered model (Table 3.1).
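The split by the PD indicator can be sketched as follows. This is a minimal illustration with hypothetical records; the third student and teacher B are added for illustration, and a single year is used so that layering does not enter.

```python
# Hypothetical records: (student, teacher, PD), where PD = 1 if the teacher
# had already participated in the program when teaching that student.
records = [("01", "A", 0), ("02", "A", 1), ("03", "B", 0)]
teachers = ["A", "B"]

z_b = [[0] * len(teachers) for _ in records]  # links before participation
z_a = [[0] * len(teachers) for _ in records]  # links after participation

for row, (_student, teacher, pd_flag) in enumerate(records):
    col = teachers.index(teacher)
    if pd_flag == 1:
        z_a[row][col] = 1
    else:
        z_b[row][col] = 1

# Z_c is the element-wise sum of Z_b and Z_a, so it ignores the PD flag.
z_c = [[b + a for b, a in zip(row_b, row_a)] for row_b, row_a in zip(z_b, z_a)]

print(z_b)
print(z_a)
print(z_c)
```

Every outcome is linked exactly once across the two matrices, so their sum recovers the ordinary teacher-link matrix.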

With the EVAAS model, changes in raw scores are not meaningful when test scores in successive years are not on a single developmental scale. To compensate for this problem, standard Z-scores can be used. More specifically, for grade i and student j in a given academic year, the original test score is $S_{ij}$. The corresponding Z-score is $Z_{ij} = (S_{ij} - \bar{S}_i)/s_i$, where $\bar{S}_i$ and $s_i$ are the mean and standard deviation of scores in grade i.

In a given academic year, $Z_{ij}$ indicates how many standard deviations the original score $S_{ij}$ is away from the average score for a grade. Changes in Z-scores reflect changes in relative position across years for a group of students, but not necessarily changes in academic achievement, when measures are on different developmental scales (McCaffrey et al., 2003). The standardized scores allow for within-group comparisons across academic years.
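As a small sketch of this standardization, the snippet below converts scores to Z-scores separately within each grade. The scores are invented, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def z_scores(scores):
    """Standardize one grade's scores to mean 0 and standard deviation 1."""
    m = mean(scores)
    s = stdev(scores)  # ASSUMPTION: sample standard deviation
    return [(x - m) / s for x in scores]


# Hypothetical raw scores on two unrelated scales, keyed by grade.
grade_scores = {
    5: [40.0, 50.0, 60.0],
    6: [310.0, 330.0, 350.0],
}

standardized = {grade: z_scores(vals) for grade, vals in grade_scores.items()}
print(standardized)
```

Although the two grades use different raw scales, students in the same relative position receive the same Z-score, which is why the standardized values describe relative standing, not absolute growth.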

where g is the number of grades for which data have been collected on a student through that academic year. In the EVAAS model, $c_k = 1$ for all k, so the variance of an outcome inherently increases with g. When the standard Z-score is used as the response variable in Equation 3.1, its variance is restricted to one. Thus, the layered Z matrix was

proposed method of equally weighting each previous and current teacher’s contribution to a student’s score in a given academic year was adapted to take into account the constraint imposed by the fact that Z-scores, by definition, must have a variance of one.

The same method was also applied to the standardization of the $Z_c$ matrix when using Z-scores as the response for Equation 3.2. The nonzero elements in the $Z_a$ and $Z_b$ matrices were then assigned the corresponding standardized weights from $Z_c$ (Table 3.4).

Each element of the coefficient matrix, $X_a$, was defined as the sum of the standardized weights in the corresponding row of the $Z_a$ matrix. It therefore tracks the weighted frequency of previous and current teachers who had participated in the professional development program and are attributed to each student outcome.
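The bookkeeping described above can be sketched as follows. The actual standardized weights appear in the source's tables, which are not reproduced here, so the weight of 1/sqrt(g) per linked teacher used below is purely an illustrative assumption (it makes the squared weights in each row sum to one). The student history and PD flags are also hypothetical.

```python
import math

# Hypothetical layered rows for one student over three years.
# Columns are teachers A, B, C; a 1 links the teacher to that outcome.
z_c = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
# Hypothetical PD flags per link: 1 if the teacher had already participated.
pd_flag = [[0, 0, 0], [0, 1, 0], [0, 1, 1]]


def standardize_row(row):
    """Give each linked teacher an equal weight (ASSUMED to be 1/sqrt(g))."""
    g = sum(row)  # number of teachers linked to this outcome
    weight = 1 / math.sqrt(g)
    return [weight * link for link in row]


z_c_std = [standardize_row(row) for row in z_c]

# Nonzero elements of Z_a and Z_b take the corresponding weights from Z_c.
z_a_std = [[w * f for w, f in zip(ws, fs)] for ws, fs in zip(z_c_std, pd_flag)]
z_b_std = [[w * (1 - f) for w, f in zip(ws, fs)] for ws, fs in zip(z_c_std, pd_flag)]

# Each element of X_a is the row sum of the standardized Z_a weights.
x_a = [sum(row) for row in z_a_std]
print(x_a)
```

With these made-up inputs, the year-1 outcome has no participating teachers attributed to it, so its $X_a$ entry is zero, while later outcomes accumulate weight as participating teachers enter the student's history.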

3.3 Example: Math in the Middle Institute Partnership

The Math in the Middle Institute Partnership (M2) is an NSF-funded mathematics professional development program at the University of Nebraska-Lincoln aiming to “create a University/Educational Service Unit (ESU)/Local School District partnership with the capacity to educate and support teams of outstanding middle level (Grades 5-8) mathematics teachers to become intellectual leaders in their school, districts and ESUs” (Lewis, Heaton, McGowan, & Jacobsen, 2004, p. 1). One major component of M2 is the M2 Institute, a multi-year institute offering participants a coherent program of study to deepen their mathematical knowledge for teaching and develop their pedagogical and leadership skills. The second M2 project component is a research initiative to understand how changes in teachers’ mathematics teaching practice translate into measurable improvement in student performance. The program consists of six cohorts, or groups, of mathematics teachers in Nebraska whose entrances were staggered yearly, with the first cohort beginning in October 2004.

As part of the research initiative, this example focuses only on the analysis of student achievement data from 2003-04 to 2007-08. During this time, each school district in Nebraska was free to choose whatever student achievement measure it deemed appropriate; a variety of criterion- and norm-referenced tests were administered to various grade levels at various points during the school year across districts.

Consequently, student achievement scores from different districts are not directly comparable. Considering these issues, attention is restricted to one of the larger participating school districts, Middleview Public Schools (MPS; names are pseudonyms). Typically, MPS middle schools have roughly 27 students in a mathematics class. The district’s mobility rate has ranged between 14.29% and 17.82% since the 2003-2004 academic year. Data were collected from 317 MPS 5th-8th grade teachers who taught mathematics between the 2003-04 and 2007-08 academic years, 37 of whom were M2 participants. Student achievement data were collected for 5th-8th grade students in the district (Table 3.5), as were survey data from the teachers. Yet, no existing statistical models were adequate to address the MPS assessment practice of administering a mix of norm- and criterion-referenced tests to various grade levels each year.

Table 3.5: MPS Middle School Assessments between 2003-04 and 2007-08

MPS has shared annual middle school student achievement data with M2 researchers since the 2003-2004 academic year. This example examines the data from 2003-04 to 2007-08. These data include several variables, such as each student’s grade level, mathematics teachers, mathematics courses and mathematics achievement scores.

Within each grade level and year combination, students took a criterion-referenced test (CRT) and/or the Metropolitan Achievement Test (MAT). The MAT is a norm-referenced test whose purpose is to test “a broad range of students with real world content,” and contains questions to measure basic skills and knowledge, as well as “critical thinking processes and strategies” (Pearson, 2008, 1). The CRTs were developed by MPS mathematics teachers. Questions were written according to specific, predetermined MPS mathematics standards, exceeding the state mathematics content standards. Students’ CRT scores reflect proficiency on the criteria, rather than relative academic performance.

The EVAAS model used by Sanders et al. (1997) and the variable persistence model proposed by McCaffrey et al. (2004) require, respectively, that scores from year to year be on a single developmental scale or be linearly related. Each of the two types of tests used by MPS has a different purpose and potentially measures different mathematical abilities. Scores are not provided on a single developmental scale, so changes in student achievement are not necessarily directly comparable from one grade to the next.

3.3.1 Modified Layered Model Implementation

The modified layered model (Equation 3.2) using standardized layered $Z_b$, $Z_a$ and $Z_c$ matrices was applied to the MPS middle schools mathematics achievement scores, standardized within each grade and academic year combination. When a student had scores available for both tests in a given academic year and grade combination, the CRT score was used in the analysis. This decision was made in conjunction with MPS senior personnel. They reported that students tended to take the CRT more seriously than the MAT, because performance on the CRT is a component of students’ math course grades, and performance on the MAT is not. Additionally, curricula and assessment analyses performed by Smith (2004) revealed the CRT to be a better match to M2 goals than the MAT. Thus, CRT scores were used in years when students had both CRT and MAT scores.
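The score-selection rule can be sketched as a small helper. The function name and the use of `None` for a missing score are hypothetical choices made for illustration; they are not taken from the original analysis.

```python
def select_score(crt, mat):
    """Return (test_name, score), preferring the CRT when both are available."""
    if crt is not None:
        return ("CRT", crt)
    if mat is not None:
        return ("MAT", mat)
    return (None, None)  # no score recorded for this grade and year


# Hypothetical student-year records: (CRT score, MAT score).
both = select_score(82.0, 640.0)
mat_only = select_score(None, 655.0)
neither = select_score(None, None)
print(both, mat_only, neither)
```

The selected score would then be standardized within its grade and academic year combination, as described above.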

The majority of fifth grade scores were missing teacher links, because fifth grade student achievement data are part of the district’s elementary, rather than middle, school data set; with the exception of fifth grade scores linked to M2 participants, none of the elementary data are linked to specific teachers. This was the lowest grade level for which data were collected, and removal of the scores would eliminate information about student performance in an entire grade level. Consequently, information key to establishing a baseline for student performance would be lost. Hence, missing fifth grade teacher effects were assigned the value zero. In the few other instances where teacher links were missing, the student scores were still included in the analysis to better track each student’s changes in achievement across time.

In some cases, a student had more than one math teacher in a given school year. This is due to the MPS practice of assigning students performing below grade level in mathematics to a Mathematics Intervention class in addition to the regular grade level mathematics course; thus, students are enrolled in a second math course, in which they receive additional instruction in an effort to bring their achievement up to meet grade level standards. Prior to eighth grade, such intervention courses meet every other day (while mathematics classes meet daily); in eighth grade students attend both courses daily. While ideally students would have the same teacher for both the regular mathematics course and the intervention course, scheduling difficulties meant some students had two different teachers. When this happened, students were linked to the regular mathematics course teacher rather than the intervention course teacher.

Using standard mixed models methodology (Littell, Milliken, Stroup, Wolfinger, & Schabenberger, 2006) with $Z = [Z_b \mid Z_a \mid Z_c]$ and $t' = [t'_b \mid t'_a \mid t'_c]$, the effect of a

GLIMMIX procedure in SAS, Version 9.2 (SAS Institute Inc., 2008).

3.3.2 Results

Table 3.6 shows the estimated variance components and fixed effects. The estimated variances of the before and after participation teacher effects are 0.179 and 0.141, respectively, indicating that the teacher effects before participating in M2 are estimated to be slightly more variable than the teacher effects after participation.

Table 3.6: Variance Components and Fixed Effects Estimates

The covariance between a teacher’s before and after participation effects is estimated to be 0.018, and the resulting estimated correlation is 0.112. These variance components do not seem reasonable, because a teacher’s effect on student learning before participating in a professional development program is expected to be at least moderately correlated with his or her effect after participation. The large number of fifth grade scores with missing teacher links appears to introduce considerable noise, as evidenced by the relatively large estimated fifth grade residual error variance, 0.945. Yet, subsequent residual errors on the same student are strongly related, with estimated correlations ranging between 0.710 and 0.825.
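As a quick arithmetic check, the reported correlation can be approximately recovered from the rounded variance components. The sketch assumes that 0.179 and 0.141 are the total variances of the before and after effects, as the text describes them.

```python
import math

var_before = 0.179   # estimated variance of before-participation effects
var_after = 0.141    # estimated variance of after-participation effects
cov_before_after = 0.018

# Correlation = covariance divided by the product of the standard deviations.
corr = cov_before_after / math.sqrt(var_before * var_after)
print(round(corr, 2))
```

The result is close to the reported 0.112. A small discrepancy in the third decimal place is expected, because the inputs here are rounded to three decimals.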

The estimated intercept, -0.065, is the baseline against which subsequent teacher impacts are measured. Although the estimated coefficient for program participation, 0.019, is not significantly different from zero, it allows the center of the distribution of teacher effects after participating in M2 to shift from the assumed mean of zero.

Figure 3.1 compares the estimated before participation teacher effects for M2 participants and non-participants, illustrating that the mean teacher effect for program participants (mean = 0.053, standard deviation = 0.372, n = 37) is estimated to be slightly higher than, but not statistically different from (p = .1671), that of non-participants (mean = -0.038, standard deviation = 0.359, n = 280). From this, there do not appear to be striking differences between non-participants and participants prior to entering the program.

Note. The horizontal axis is in terms of standardized units.

Figure 3.1: Comparison of Before Participation Effects between Teachers Participating (n = 37) and Not Participating (n = 280) in M2

in participants’ effects on student learning (Figure 3.2). The average difference between each M2 participant’s estimated after and before participation teacher effects is -0.030 with a standard error of 0.080, while the median difference, -0.007, is slightly higher.

This indicates that the average change in each teacher’s predicted teacher effect after program participation is slightly, though not statistically significantly (p = .7099, df = 36), different from zero.
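The reported test can be approximately reproduced from the rounded summary statistics. The sketch assumes a one-sample t-test of the mean difference, which is consistent with df = 36 for n = 37 teachers but is not stated explicitly in the text. The p-value is obtained by numerically integrating the t density with Simpson's rule, so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_area


mean_diff = -0.030  # average after-minus-before difference (rounded)
std_err = 0.080     # standard error of that average (rounded)
df = 36

t_stat = mean_diff / std_err
p_value = two_sided_p(t_stat, df)
print(t_stat, p_value)
```

Because the inputs are rounded, the computed p-value should be read as approximately matching the reported value and not as an exact replication.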

While these results do not suggest a participation effect, the potential for ceiling effects with criterion-referenced tests exists. The tests are constructed to determine whether students meet pre-determined proficiency levels on specific mathematics criteria, and it is possible for students to consistently answer most or all questions on these assessments correctly. In such instances, the tests are unable to detect changes in a student’s achievement across time and, consequently, limit discernment of value-added teacher effects. Thus, some teachers’ value-added effects could be underestimated because many students in their classrooms reach the ceiling of the criterion-referenced tests, which is a limiting feature of the instrument, but not necessarily of the model.

Note. The horizontal axis is in terms of standardized units.

Figure 3.2: Comparison of Differences between Before and After Participation Effects for MPS Teachers (n = 37) and a Subset of MPS Teachers (n = 22) Participating in M2