Research Report

Comparison of Value-Added Models and Outcomes

Hua Wei, Tracey Hembry, Daniel L. Murphy, Yuanyuan McBride

May 2012
Model 5: Layered Mixed Effects Model. This model was different from Model 3 and Model 4 in that it did not estimate teacher effectiveness from residualized gains of students’ scores. Rather, it focused on a student’s growth trajectory across years and measured teacher effectiveness as the deviation from the student’s average trajectory. The Layered Mixed Effects
Model can be simply specified as:

y_ijst = μ_st + Σ (k = 1 to t) θ_jsk + ε_ijst

where y_ijst is the observed score of the ith student in the classroom of the jth teacher in the sth subject area at time t (where t = 1,...,T), μ_st is the estimated mean score in the sth subject area at time t, θ_jsk is the random effect of teacher j in the sth subject area at time k, and ε_ijst is the random error for the ith student in the classroom of the jth teacher in the sth subject area at time t. Because the summation accumulates teacher effects from year 1 through year t, each teacher's effect remains layered into the student's subsequent scores.
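The layering can be illustrated with a small sketch (the student history below is invented): in the design matrix for the teacher random effects, a student's score at time t carries an indicator for every teacher the student had in years 1 through t, so each earlier teacher's effect persists in later scores.

```python
import numpy as np

# Hypothetical data: one student observed over three years (t = 1, 2, 3),
# taught by teachers A, B, C in successive years. Columns of Z index
# teachers; rows index the student's yearly scores.
teachers = ["A", "B", "C"]
history = {1: "A", 2: "B", 3: "C"}  # teacher of record in each year

# Layered design: the score at time t carries indicators for the
# teachers from ALL years up to and including t.
Z = np.zeros((3, 3))
for t in range(1, 4):
    for k in range(1, t + 1):
        Z[t - 1, teachers.index(history[k])] = 1.0

print(Z)
# The row for t = 1 flags only teacher A; the row for t = 3 flags A, B,
# and C, so each teacher's random effect persists in all later scores.
```

This cumulative structure is what lets each student serve as his or her own control: a teacher's estimated effect is the deviation it induces from the student's own trajectory, not from a demographic reference group.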
Unlike the other models described above, this model made full use of students’ TAKS scores in mathematics or ELA in the current and two prior grades and predicted the average rate of growth that the students were expected to achieve in the current grade. The degree to which students had attained or failed to attain their predicted average was taken as the measure of teacher effectiveness. In addition, within this modeling methodology, each student served as his or her own control and there was no need to control for student demographic variables, such as ethnicity and socioeconomic status. Therefore, no student-level background variables were included in the model.
This model has several advantages over all the previous models. First, this model allocates credit for a student’s score gains to each teacher who taught the student at each grade.
Second, this model incorporates multiple years of test scores even if they are not on the same scale. Disadvantages of this model are that it involves highly complex statistical techniques and requires extensive data that link student test scores across years.
The advantages and disadvantages of the five models that were applied in this study are summarized in Table 2.
The five value-added models were applied to the mathematics and ELA data separately, and generated teacher measures in each content area. As mentioned above, the five models used data from different numbers of teachers and students in the analyses. For example, the Layered Mixed Effects Model, which has the most stringent data requirements among the five models, required that student test scores for three years be available. In the area of mathematics, the model utilized student data from a total of 3,623 students and generated value-added measures for 63 teachers, after excluding students with missing records. In contrast, the Hierarchical Linear Regression Model, which is less sophisticated than the Layered Mixed Effects Model, incorporated only students with both mathematics and ELA test scores from grade 4 and grade 5.
After exclusions of students with missing data, this model used data from 4,212 students and produced measures for 70 teachers.
Within each model, teachers were rank ordered by the teacher measure, with a rank of 1 representing the greatest teacher effect; the larger the rank, the smaller the estimated teacher impact. It is also worth noting that all of the models except Model 1 excluded some teachers from the analysis, and a rank of "NA" was assigned to those teachers for whom the required data were not available. Table 3 provides the rankings for the 73 teachers based on the five value-added models in the area of mathematics, and Table 4 provides the corresponding rankings in ELA.
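The ranking step can be sketched as follows; the teacher labels and value-added measures below are invented, with a missing value standing in for a teacher excluded by a model's data requirements.

```python
import pandas as pd

# Hypothetical value-added measures for five teachers; NaN marks a
# teacher excluded for missing data (labels and values are invented).
measures = pd.Series(
    {"T1": 0.82, "T2": -0.10, "T3": 0.45, "T4": float("nan"), "T5": 0.45}
)

# Rank 1 = greatest estimated teacher effect; tied teachers share the
# lowest rank in the tied group, and excluded teachers receive "NA".
ranks = measures.rank(ascending=False, method="min")
ranks = ranks.astype("object").where(ranks.notna(), "NA")
print(ranks)
```

Note that how ties and exclusions are handled is itself a reporting choice; the "min" tie rule here is one common convention, not the only one.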
The results in Tables 3 and 4 show a great deal of variability in the rankings for each teacher. For example, in the content area of mathematics, Teacher 56 received a ranking of 1 from Models 4 and 5 but received one of the worst rankings from Model 1. Depending on the value-added model used, it can be concluded that either Teacher 56 has a great impact on student scores or that Teacher 56 has a relatively small impact. Similarly, in the content area of ELA, Teacher 50 received a ranking of 1 from Model 1 but received much worse rankings from Models 2 and 3. The great discrepancies among the rankings from the different models result in vastly different conclusions about the teacher. Moreover, a closer look at the two tables reveals that, for each teacher, the rankings from Models 2 and 3 are more consistent with each other than the rankings from the other models. This can be explained by the fact that Models 2 and 3 bear the most similarities among all pairs of models.
To further investigate the relationship between the outcomes of each pair of models, Spearman correlation coefficients were computed on the rank orders obtained from each pair of the five value-added models. Tables 5 and 6 present the correlation coefficients for all pairs of models in the two content areas.
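The pairwise computation can be sketched with scipy.stats.spearmanr; the rank orders below are invented and merely stand in for the rankings a set of models might produce.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical rank orders of the same six teachers under three models
# (all values invented; teachers ranked "NA" would be dropped pairwise).
ranks = pd.DataFrame({
    "Model 1": [1, 2, 3, 4, 5, 6],
    "Model 2": [2, 1, 3, 5, 4, 6],
    "Model 4": [6, 5, 4, 2, 3, 1],
})

# Spearman correlation between every two models' teacher rankings.
for a in ranks.columns:
    for b in ranks.columns:
        if a < b:
            rho, p = spearmanr(ranks[a], ranks[b])
            print(f"{a} vs {b}: rho = {rho:.2f}")
```

A strongly negative rho, as between the first and third columns here, means the two models nearly reverse each other's ordering of the same teachers.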
Table 5: Spearman Correlation Coefficients between Rank Orders in Mathematics

Table 6: Spearman Correlation Coefficients between Rank Orders in ELA
As shown in Tables 5 and 6, the correlations of teacher rankings from each pair of models range from low to medium. Negative correlations are obtained for the rank orders between Model 1 and Model 4 and between Model 1 and Model 5 in both mathematics and ELA. The patterns of correlation coefficients in the two content areas indicate that the outcomes from the five value-added models are only moderately related at best and, in some cases, weakly or even negatively correlated. The largest correlation is observed between Models 2 and 3, consistent with the pattern noted in the rankings above.
This study compared and contrasted five value-added models and illustrated the impact of model choice on estimates of teacher effectiveness by applying the five models to the same data. The five models differed from one another in many respects, including how student growth was defined, whether and how a student's prior performance and background characteristics were adjusted for, and whether grouping effects were modeled. Differences in the underlying assumptions largely accounted for the large discrepancies among the models' results. In addition, the five models did not use the same amount of student data, which also helps explain why the teacher rankings varied substantially across models. Models that were more similar in their assumptions and data use produced teacher rankings that bear a higher degree of resemblance to one another than other pairs of models. On the other hand, models that shared fewer similarities, such as the Percent Passing Change Model and the Hierarchical Linear Regression Model, resulted in negative correlations in teacher rankings, which could raise concerns when such rankings are compared in teacher evaluations.
Choosing a value-added model for the purpose of estimating teacher effectiveness is not purely a technical issue. As summarized in Table 2, each of the five models has its own strengths and limitations. Proponents of value-added modeling typically will not recommend the Percent Passing Change Model or the Average Score Change Model because these models make no efforts to isolate teacher effects from the effects of confounding factors such as student’s background characteristics or prior performance. However, these models have been or are still being implemented in some schools and districts simply because of their transparency and low data requirements. In contrast, the Multiple Regression Model, the Hierarchical Linear Regression Model, and the Layered Mixed Effects Model are statistically more sophisticated, but their applications have been limited due to the lack of transparency or understanding. Moreover, the application of these models requires longitudinal data for both teachers and students.
Currently, only a few states have created data systems that support longitudinal tracking of teachers and students. For schools or teachers that do not meet the data requirements, these models cannot produce indicators of teacher effectiveness, which partly explains why they are not broadly implemented. Therefore, the choice of a particular value-added model should be informed not only by an evaluation of its technical properties but also by policy and practical considerations, such as whether the required data are available and whether the model can be understood by those it affects. It is clear that, in addition to these practical considerations, policymakers must keep in mind that the results of value-added analyses are not definitive and will depend to a significant degree on the model that is specified.
In addition to considering the tradeoffs when implementing a value-added model, effort should also be devoted to communicating the model and interpreting its results to stakeholders and the public, especially when the chosen model involves complex statistical methodology.
As James Mahoney, executive director of Battelle for Kids, stated at the Roundtable Discussion on Value-Added Analysis of Student Achievement, “We don’t need to take a complex model and make it simplistic; we need to make it simply understood” (The Working Group on Teacher Quality, 2007, p. 6).
Value-added modeling has been widely accepted as a more objective approach to estimating the value of teachers than other methods because it expresses a teacher's unique contribution to student learning in precise, quantitative terms. However, most researchers and policymakers agree that results from value-added analyses should not be used alone to make high-stakes decisions about teachers, because the fundamental methodological issues and technical limitations of value-added models often lead to "noisy" measures of teacher effectiveness (McCaffrey, Lockwood, Koretz, & Hamilton, 2003; Braun, 2005). As illustrated in this study, the sensitivity of value-added estimates to modeling choices can lead to uncertainty in the resulting teacher-effectiveness measures. In addition to modeling choices and strategies, sampling errors, missing data, and omitted student- and teacher-level variables are all potential sources of uncertainty. These uncertainties should be taken into account when value-added measures are interpreted and used.
Value-added measures convey little information about the areas on which a teacher should focus to improve instruction or the strategies and practices the teacher should employ to make that improvement. Therefore, other forms of teacher measures, such as those obtained through expert observation, portfolio reviews, student surveys, and conversations, should be used in combination with value-added measures to provide a more comprehensive picture of a teacher's impact on student learning and, more importantly, to help the teacher improve instruction.
This study has several limitations, some of which point to directions for future research.
First, teacher rankings based on the value-added measures, rather than the value-added measures themselves, were reported in the results for the models. Given the purpose of the study, a display of the rankings suffices. However, it would be necessary to report and examine the value-added measures and their standard errors if our goal were to make reasonable inferences about the effectiveness of the teachers. Second, the five types of models applied in this study represent five different classes of models, each of which has a number of variations. For example, the Average Score Change Model implemented in this study required that the two scores share the same scale metric; in cases where this requirement is not met, the scores could be transformed into percentiles at each point in time and the analysis conducted on mean growth in percentiles. Another limitation is that no attempt was made in this study to evaluate the reasonableness of the results of the five models being compared. It would be interesting to compare the rankings obtained from the models with a more qualitative evaluation of teacher effectiveness to determine which model best matches other measures of teacher effectiveness.
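The percentile-based variant of the Average Score Change Model can be sketched as follows (all scores invented): each year's scores are converted to within-year percentile ranks over the full tested group, so growth is measured on a common 0-100 metric even when the two tests are not on the same scale, and a classroom's mean percentile growth serves as the summary measure.

```python
from scipy.stats import rankdata

# Hypothetical data: scores for ten tested students on two tests that
# are NOT on the same scale; the first four students belong to the
# classroom of interest (all numbers invented for illustration).
year1 = [512, 540, 498, 571, 525, 503, 556, 488, 530, 545]
year2 = [61, 74, 55, 70, 68, 52, 66, 49, 73, 58]

def percentile_ranks(scores):
    """Within-year percentile rank over the full group (0-100 scale)."""
    n = len(scores)
    return [100.0 * (r - 0.5) / n for r in rankdata(scores)]

p1 = percentile_ranks(year1)
p2 = percentile_ranks(year2)

# Mean growth in percentile position for the classroom (students 0-3),
# used as the classroom-level effectiveness summary.
growth = [p2[i] - p1[i] for i in range(4)]
mean_growth = sum(growth) / len(growth)
print(round(mean_growth, 1))
```

Because percentile rank is computed against the full tested population each year, a classroom's mean growth can be positive or negative even though the population's mean percentile is fixed by construction.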
Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models. Princeton, NJ: Educational Testing Service. Retrieved September 23, 2011, from http://www.ets.org/Media/Research/pdf/PICVAM.pdf

McCaffrey, D. F., Lockwood, J. R., Koretz, D., & Hamilton, L. (2003). Evaluating value-added models for teacher accountability. Santa Monica, CA: The RAND Corporation. Retrieved September 23, 2011, from www.rand.org/pubs/monographs/2004/RAND_MG158.pdf

McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67–101.
Raudenbush, S., & Bryk, A. S. (1986). A hierarchical model for studying school effects. Sociology of Education, 59, 1–17.
Sanders, W. L., & Horn, S. P. (1994). The Tennessee Value-Added Assessment System (TVAAS): Mixed model methodology in educational assessment. Journal of Personnel Evaluation in Education, 8, 299–311.
Sanders, W. L., Saxton, A. M., & Horn, S. P. (1997). The Tennessee Value-Added Assessment System: A quantitative, outcomes-based approach to educational assessment. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluational measure? (pp. 137–162). Thousand Oaks, CA: Corwin Press, Inc.
Tekwe, C. D., Carter, R. L., Ma, C.-X., Algina, J., Lucas, M. E., Roth, J., Ariet, M., Fisher, T., & Resnick, M. B. (2004). An empirical comparison of statistical models for value-added assessment of school performance. Journal of Educational and Behavioral Statistics, 29(1), 11–35.
The Council of Chief State School Officers (CCSSO) Accountability Systems and Reporting Group (2008). Implementer's guide to growth models. Washington, DC: The Council of Chief State School Officers. Retrieved April 27, 2012, from http://www.ccsso.org/Documents/2008/Implementers_Guide_to_Growth_2008.pdf

The Working Group on Teacher Quality (2007, October). Roundtable discussion on value-added analysis of student achievement: A summary of findings. Washington, DC: The Working Group on Teacher Quality. Retrieved September 23, 2011, from http://www.tapsystem.org/pubs/value_added_roundtable_08.pdf

Webster, W. J., & Mendro, R. L. (1997). The Dallas value-added accountability system. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluational measure? Thousand Oaks, CA: Corwin Press, Inc.