# University of Nebraska - Lincoln DigitalCommons of Nebraska - Lincoln Dissertations and Theses in Statistics Statistics, Department of 8-2010 ...

Student achievement data from Middleview Public Schools² (MPS) exhibit such complexities. Over time, a variety of criterion- and norm-referenced tests have been administered across the middle-level grades within MPS schools. As shown in Table 4.2, student achievement data were collected for 5th-8th grade students in the district between the academic years 2003-2004 and 2007-2008. Green et al. (under review) proposed a Z-score methodology to address the MPS practice of administering a mix of norm- and criterion-referenced tests to different grade levels each year. However, the proposed method does not use information from multiple instruments administered in a given year: in each year, scores from only one instrument are used, so when multiple assessments are given in a year, the researcher must choose which test to use. A simulation study was conducted to compare a Z-score model with a curve-of-factors model when data were simulated under a curve-of-factors model structure. Of particular interest was how estimated teacher effects and percentiles change when the data are analyzed with this alternative approach instead of the correct model. Model parameters for the simulation were based on summary statistics obtained from the MPS student achievement data. Models were compared in both a complete-tests case and a missing-tests case to investigate the impact of having only one, instead of two, indicators of a construct in one or more years.

² Names are pseudonyms.

**Table 4.2: MPS Middle School Assessments between 2003-04 and 2007-08**

4.3.1 Simulation Study Description

Student achievement data were simulated for 2,000 students over the course of four years. Each year, data were simulated from two different instrument scales assuming a curve-of-factors model structure (Equation 4.3), for a total of eight observations per student. The MA-indicators and the CA-indicators, representing the MAT and CRT assessments in the MPS data set, were simulated assuming the following mean structures:

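The total variance for each indicator was 225 units squared. Random measurement errors, $\mathbf{e}$, were assumed to be normally distributed with $E(\mathbf{e}) = \mathbf{0}$ and $\mathrm{Var}(\mathbf{e}) = \mathbf{R} = \sigma_e^2\,\mathbf{I}_{2000} \otimes \mathbf{I}_8$.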

Measurement error variances were simulated to be equal for both instruments across all four years, with 20 percent ($\sigma_e^2 = 45$) of the total indicator variance representing measurement error variance. Factor-level disturbances were simulated assuming an unstructured within-student covariance structure. The vector of random, factor-level

student were allowed to covary, but disturbances for two different students were assumed to be independent. Because a layered model with complete persistence was assumed, the

which data were collected on a student through that academic year. Consequently, the variance of an outcome inherently increases with g unless other adjustments are made.

For this simulation, the disturbances were simulated to have decreasing variance over time to offset the assumed complete persistence of teacher effects, as specified by the layered $\mathbf{Z}_t$ coefficient matrix, and the resulting increase over time in the variability accounted for by teachers. An alternative approach would be to allow indicator variances to increase over time, but the MPS student achievement data motivating the simulation did not support this strategy. Within each year, 20 different teachers were each randomly assigned 100 students. The vector of random teacher effects, $\mathbf{t}$, was assumed to be distributed $N(\mathbf{0}, \sigma_t^2\,\mathbf{I}_{80})$ with constant variance $\sigma_t^2 = 22.5$.
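The random assignment of students to teachers and the layered (complete persistence) teacher design can be sketched in plain Python. This is a minimal illustration, not the author's SAS code; the function and constant names (`assign_teachers`, `layered_row`) are hypothetical. It assumes the 80 teacher effects are ordered year by year (20 per year), so a student's score in year $g$ loads on the teachers from years 1 through $g$.

```python
import random

N_STUDENTS = 2000
N_YEARS = 4
N_TEACHERS_PER_YEAR = 20
CLASS_SIZE = 100  # 20 teachers x 100 students = 2,000 students per year


def assign_teachers(rng):
    """Randomly assign students to teachers separately in each year.

    Returns assignment[s][y] = index (0..19) of student s's teacher in year y.
    Hypothetical helper; every teacher receives exactly CLASS_SIZE students.
    """
    assignment = [[0] * N_YEARS for _ in range(N_STUDENTS)]
    for y in range(N_YEARS):
        slots = [k for k in range(N_TEACHERS_PER_YEAR) for _ in range(CLASS_SIZE)]
        rng.shuffle(slots)
        for s in range(N_STUDENTS):
            assignment[s][y] = slots[s]
    return assignment


def layered_row(student_teachers, year):
    """Row of a layered (complete persistence) coefficient matrix for one
    student-year: a 1 for the student's teacher in every year up to and
    including `year` (0-based), and 0 for all other teacher effects."""
    row = [0] * (N_YEARS * N_TEACHERS_PER_YEAR)  # 80 teacher effects in total
    for y in range(year + 1):
        row[y * N_TEACHERS_PER_YEAR + student_teachers[y]] = 1
    return row
```

Because a year-four row contains four ones, the teacher contribution to that score's variance is $4\sigma_t^2$ under this layering, which is the growth the decreasing disturbance variances are meant to offset.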

Each of the 1000 simulated data sets was analyzed using both the curve-of-factors model from which the data originated (Equation 4.3) and an alternative approach, referred to in this chapter as the Z-score model. The Z-score model (Equation 4.4)

is a modified version of the EVAAS model (Sanders, Saxton, & Horn, 1997). In this model, $\mathbf{Z}_t$ is a standardized layered coefficient matrix (Green et al., under review) and $\mathbf{z}$ is a vector of simulated achievement scores, standardized within each year for each of the two types of instruments. The vector of standardized test scores for an instrument, $\mathbf{z}$, is modeled by an overall intercept, $\mu$, and a vector of random teacher effects, $\mathbf{t}$, assumed to be distributed $N(\mathbf{0}, \sigma_t^2\,\mathbf{I}_{80})$. Random errors, $\mathbf{e}$, are also assumed to be normally distributed with $E(\mathbf{e}) = \mathbf{0}$ and $\mathrm{Var}(\mathbf{e}) = \mathbf{R}$. Residuals from different students are assumed to be independent, but residuals on the same student are assumed to be correlated and are modeled using an unstructured within-student covariance structure.

Because each student had scores on both tests in all four years, the Z-score method was applied twice: once using MA scores in all four years (MA Z-score model) and once using CA scores (CA Z-score model).

The 1000 simulated data sets were also modified to reflect one pattern of missingness occurring in the MPS data. The modified data sets were identical to those previously described, except that the CA1 and MA4 assessments were assumed not to have been given; those simulated test scores were therefore removed from the original data sets. Each of the 1000 modified data sets was analyzed using both a curve-of-factors model (Equation 4.3), in which the random measurement errors, $\mathbf{e}$, were assumed to be distributed $N(\mathbf{0}, \sigma_e^2\,\mathbf{I}_{2000} \otimes \mathbf{I}_6)$, and the Z-score model (Equation 4.4). As shown in Table 4.3, three different Z-score models were specified in the missing-tests case. The CA Z-score model used the standardized CA scores as its response whenever both instruments were administered in the same year. Conversely, the MA Z-score model used the standardized MA scores whenever they were available. In the third Z-score model, the MA/CA model, the standardized MA2 assessment scores were specified as the response in year two, and the standardized CA3 assessment scores were specified as the response in year three.

**Table 4.3: Assessments Used as Responses for Simulation Analysis with Missing Tests**

This third Z-score model differed from the other two because it used scores from the same type of instrument in only two consecutive years rather than three.
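The response-selection rules described above can be written out as a small sketch that recovers, from the text alone, which test each missing-tests Z-score model uses in each year, and checks the "two versus three consecutive years" distinction. The names `AVAILABLE`, `MODELS`, `responses` and `longest_same_instrument_run` are hypothetical, and the sketch is not a reproduction of Table 4.3 itself.

```python
# Tests available in the missing-tests case: CA1 and MA4 were removed.
AVAILABLE = {
    1: {"MA"},
    2: {"MA", "CA"},
    3: {"MA", "CA"},
    4: {"CA"},
}

# Preferred instrument in the years where both tests were given.
MODELS = {
    "CA Z-score": {2: "CA", 3: "CA"},
    "MA Z-score": {2: "MA", 3: "MA"},
    "MA/CA Z-score": {2: "MA", 3: "CA"},
}


def responses(preference):
    """Pick one test per year: the only available test, or the preferred one."""
    chosen = []
    for year in sorted(AVAILABLE):
        tests = AVAILABLE[year]
        if len(tests) == 1:
            (test,) = tests  # unpack the single available instrument
        else:
            test = preference[year]
        chosen.append(f"{test}{year}")
    return chosen


def longest_same_instrument_run(chosen):
    """Longest run of consecutive years using the same instrument type."""
    best = run = 1
    for prev, cur in zip(chosen, chosen[1:]):
        run = run + 1 if prev[:2] == cur[:2] else 1
        best = max(best, run)
    return best
```

For example, `responses(MODELS["MA/CA Z-score"])` selects MA1, MA2, CA3 and CA4, so its longest same-instrument run is two years, whereas the CA and MA Z-score models each reach three.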

Data were simulated in SAS, Version 9.2 (SAS Institute Inc., 2008), and teacher effect estimates were obtained for each set of analyses using REML implemented in ASReml, Version 3.0 (Gilmour, Gogel, Cullis, & Thompson, 2009). For each simulation and model combination, the teacher effect estimates were ranked within year. Among the

largest. In instances where multiple teachers had the same predicted teacher effect, the teachers were assigned the mean of those corresponding ranks. The true teacher effects for each simulation were similarly ranked and denoted $R_k(i)$. The corresponding

from model m for teacher k whose true percentile is p in year i and simulation q (Lockwood, Louis, & McCaffrey, 2002). Taking into account scaling considerations, the

For the Z-score models, the means and standard errors of the estimated teacher effect variance components are almost identical in the complete and missing tests scenarios. The same is true for the curve-of-factors model. In both the complete and the missing tests cases, the means and standard errors of the estimated teacher effect variances for the Z-score models are all 0.27 and 0.04, respectively. This finding is expected, because the only difference between a student's mean-centered MA and CA scores for a given year is the random measurement error, which is assumed to have constant variance across tests and years. The curve-of-factors models in the complete and missing cases also have similar means and standard errors of the estimated teacher effect variances: the mean is 22.59 in both cases, and the standard errors are 3.72 and 3.73 for the complete and missing tests scenarios, respectively. These means are close to the true teacher effect variance, 22.5, which is to be expected because the data are generated assuming a curve-of-factors structure with complete data for all eight tests. Because the standardized scores have unit variance, the mean of the estimated teacher effect variances for the Z-score models, 0.27, indicates that teachers are estimated to account for approximately 27% of the total variability present in the data, or 60.75 units squared (as opposed to the true teacher effect variance, 22.5). This result appears contrary to what the curve-of-factors models suggest, but in the Z-score models the increasing proportion of total variability accounted for by teachers each year is modified through the standardized layered $\mathbf{Z}_t$ coefficient matrix. Consequently, the percentage of total variability accounted for by teachers in the Z-score models is a weighted average of the increasing proportions of total variability attributed to teachers in the simulation across years.

Figure 4.3 shows the estimated RMSE for the different models with complete and missing tests for specific percentiles across the four years.

The estimated RMSE is largest at the middle percentiles for all models, indicating that accurately predicting a teacher's true rank becomes more difficult as the teacher's true percentile moves toward the average. In year one, the estimated RMSE across percentiles is larger than in the other three years, regardless of whether all test scores are available or which model is used. Across percentiles, the estimated RMSE does not change drastically from the complete to the missing case for the Z-score models. However, the estimated RMSE for the curve-of-factors model, which is noticeably smaller in the complete tests case, increases toward that of the Z-score models in the missing tests case.

The estimated bias at specific percentiles for the various models ranges from approximately two percentiles above to three percentiles below the true percentiles across the four years. As depicted in Figure 4.4, the bias at the extreme percentiles is largest in year one for all of the models, irrespective of whether data for all eight tests are available.

Similar to the pattern evidenced in the RMSE plots, the estimated bias across percentiles does not change drastically from the complete to the missing case for the Z-score models.
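The bias and RMSE summaries can be illustrated with the usual Monte Carlo definitions. The exact estimator formulas in this section were lost in extraction, so this is a sketch under the assumption that, for a teacher at true percentile $p$, bias and RMSE are averaged over the $Q$ simulated data sets; `mc_bias` and `mc_rmse` are hypothetical names.

```python
from math import sqrt


def mc_bias(estimates, true_value):
    """Monte Carlo bias: mean of (estimate - truth) over simulations."""
    return sum(e - true_value for e in estimates) / len(estimates)


def mc_rmse(estimates, true_value):
    """Monte Carlo root mean squared error over simulations."""
    return sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


# Toy example: estimated percentiles for one teacher truly at the 48th
# percentile across four (not 1000) simulations.
est = [45, 50, 55, 52]
bias = mc_bias(est, 48)   # 2.5 percentiles above the truth
rmse = mc_rmse(est, 48)   # sqrt(19.5), about 4.42
```

Since the squared RMSE equals the squared bias plus the variance of the estimates, the small biases reported here imply that most of the RMSE comes from the sampling variability of the estimated percentiles.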

**Figure 4.4: Comparison of Bias for the Curve-of-Factors (solid), CA Z-score (dash) and MA Z-score (dotted) Models with Complete and Missing Tests, and MA/CA Z-score (dot-dash) Model with Missing Tests for Specific Percentiles and Years**

curve-of-factors model in the complete tests case increases toward that of the Z-score models in the missing tests case. Because the magnitude of the bias is relatively small,

teachers truly at the 48th percentile and teachers truly at the 76th percentile. Because the simulation has only 20 teachers each year, the 48th and 76th percentiles are closest to the percentiles of interest, the 50th and 75th percentiles. These distributions are compared across all years for the curve-of-factors and MA Z-score models with complete and missing tests. For clarity, the estimated distributions for the other Z-score models are not shown because they are similar to the estimated sampling distributions for the MA Z-score models. In instances where multiple teachers have the same predicted teacher effect, the teachers are assigned the mean of the corresponding ranks. When creating these graphs, teachers originally assigned a non-integer rank are reassigned the next higher integer rank; for example, teachers assigned a rank of 14.5 are reassigned a rank of 15. This strategy errs in favor of the teachers, because it ranks them higher, rather than lower, than their non-integer rank. Because ties in rankings are infrequent, this strategy eliminated only low relative frequencies representing rare occurrences, rather than meaningful information, from the plots. Comparing the estimated sampling distributions across true percentiles, years and models, there do not appear to be noteworthy differences between those for complete tests and those for missing tests. The estimated distributions at the 48th percentile in years one and four are slightly more

**under the Curve-of-Factors (gray) and MA Z-score (black) Models with Complete and Missing Tests for Teachers Truly at the 48th (solid) and 76th (dotted) Percentiles**

associated with accurately predicting a teacher's true rank when he or she is close to the average percentile. Although slight differences between the curve-of-factors model and the MA Z-score model appear in the complete tests case, these differences almost disappear in the missing tests case.
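The tie handling and rounding rule used for these sampling distributions can be sketched as follows. This is an illustration, not the author's code: it assumes ranks run from 1 (smallest effect) to 20 (largest), consistent with rounding 14.5 up to 15 favoring the teacher, and the names `mid_ranks` and `rank_distribution` are hypothetical.

```python
from collections import Counter
from math import ceil


def mid_ranks(effects):
    """Rank values in ascending order (1 = smallest, n = largest).
    Tied values receive the mean of the ranks they jointly occupy."""
    order = sorted(range(len(effects)), key=lambda k: effects[k])
    ranks = [0.0] * len(effects)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position whose value ties with position i.
        while j + 1 < len(order) and effects[order[j + 1]] == effects[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # mean of 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def rank_distribution(ranks_over_sims):
    """Relative frequency of each integer rank across simulations, after
    rounding non-integer (tied) ranks up to the next integer, e.g. 14.5 -> 15."""
    counts = Counter(ceil(r) for r in ranks_over_sims)
    total = len(ranks_over_sims)
    return {rank: counts[rank] / total for rank in sorted(counts)}
```

For instance, `mid_ranks([0.3, -1.2, 0.3, 2.0])` gives `[2.5, 1.0, 2.5, 4.0]`, and a teacher ranked 14.5, 15, 16 and 15 in four simulations has an estimated distribution of 0.75 at rank 15 and 0.25 at rank 16.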

Across all four years, the estimated probability of classifying teachers in the upper quartile for each of the models with complete and missing tests increases nonlinearly as the true percentile increases (Figure 4.6). Although there do not appear to be noticeable differences between the estimated probabilities across models and availability of tests, the slopes of the curves in year one differ from those in subsequent years due to the

true percentiles. The range of estimated probabilities of correctly ranking a teacher at the 76th percentile is not centered at 0.5, as might be expected. Instead, it ranges between approximately 0.5 and 0.7 because of the bias associated with the estimator at the 76th percentile.
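The estimated probability of upper-quartile classification can be sketched as the share of simulations in which a teacher's estimated rank lands in the top quarter. The classification rule used in the source is not shown in this section, so this assumes that "upper quartile" means an estimated rank above $0.75 \times 20 = 15$, that is, ranks 16 through 20; `prob_upper_quartile` is a hypothetical name.

```python
def prob_upper_quartile(estimated_ranks, n_teachers=20):
    """Share of simulations in which a teacher's estimated rank falls in the
    top quarter of the n_teachers ranks (ranks 16-20 when n_teachers = 20).

    Assumption: the upper quartile is defined on ranks as rank > 0.75 * n.
    """
    cutoff = 0.75 * n_teachers
    hits = sum(1 for r in estimated_ranks if r > cutoff)
    return hits / len(estimated_ranks)


# Toy example: estimated ranks for one teacher across five simulations.
p_hat = prob_upper_quartile([16, 15, 20, 12, 15.5])  # 3 of 5 -> 0.6
```

Plotting this proportion against each teacher's true percentile traces out curves of the kind summarized in Figure 4.6.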

4.4 Summary and Future Work

Curve-of-factors methodology applied in a value-added context extends the analysis of student achievement data to situations in which multiple tests with potentially different scales are given each year in a particular subject. Instead of estimating a teacher's effect on changes in a student's score over time, curve-of-factors models allow estimation of a teacher's effect on changes in some common, latent trait measured by the
