Consortium for Educational Research and Evaluation–North Carolina
Comparing Value-Added Models for Estimating Individual Teacher Effects on a Statewide ...
August 2012
This study showed that the most common VAMs produce teacher rankings that are highly correlated with one another, particularly in models with no or only modest violations of SUTVA. With few exceptions, the Spearman rank correlation did not discriminate sharply between models. Moreover, many of the observed rank correlations that we deemed inadequate relative to those of the best-performing models would be considered high in most other research settings. Much the same could be said for the percentage agreement in bottom-5% categorization, which showed both very high levels of agreement and little discriminating power. On the other hand, when categorization into the bottom 5% was framed as a question of the rate and number of teachers falsely identified as ineffective (false positives), substantial differences between models emerged, particularly under the scenario of non-ignorable assignment. Further, the findings as presented actually understate the risk to education agencies of choosing the wrong model. Under the negative assignment scenario, the best model (the HLM3) would misclassify 221 5th grade teachers, while the worst model (the DOLS) would misclassify 436, a difference of 215 teachers.
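The comparison metrics discussed above can be illustrated with a small sketch. The snippet below is not the study's actual analysis; it uses simulated teacher effects and two hypothetical "model" estimates (one less noisy than the other, purely for illustration) to show how a Spearman rank correlation, a bottom-5% agreement rate, and a false-positive count would each be computed from two sets of teacher rankings.

```python
import random

def rank(values):
    # ordinal ranks 1..n (sufficient for continuous simulated data; no tie handling)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r + 1.0
    return ranks

def spearman(x, y):
    # Spearman rho = Pearson correlation computed on the ranks
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def bottom_pct_agreement(x, y, pct=0.05):
    # share of teachers flagged in the bottom pct by BOTH models,
    # out of the number flagged by each model
    k = max(1, int(len(x) * pct))
    bx = set(sorted(range(len(x)), key=lambda i: x[i])[:k])
    by = set(sorted(range(len(y)), key=lambda i: y[i])[:k])
    return len(bx & by) / k

random.seed(0)
n = 1000
true_effect = [random.gauss(0, 1) for _ in range(n)]
# two hypothetical VAM estimates: model A closer to the truth, model B noisier
model_a = [t + random.gauss(0, 0.3) for t in true_effect]
model_b = [t + random.gauss(0, 1.0) for t in true_effect]

print("rank correlation A vs B:", round(spearman(model_a, model_b), 3))
print("bottom-5% agreement:", round(bottom_pct_agreement(model_a, model_b), 3))

# false positives: teachers a model flags in its bottom 5% whose true
# effect is NOT in the bottom 5%
k = n // 20
truly_low = set(sorted(range(n), key=lambda i: true_effect[i])[:k])
for name, est in (("A", model_a), ("B", model_b)):
    flagged = set(sorted(range(n), key=lambda i: est[i])[:k])
    print("false positives, model", name, ":", len(flagged - truly_low))
```

Even the noisier model typically yields a rank correlation with the better one that looks "high" in isolation, which mirrors the finding above: rank correlations and agreement rates discriminate weakly, while false-positive counts expose the differences.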
This points to the second implication of this study: the risks to misclassified teachers and their students depend on the uses to which these evaluations are put, particularly the stakes attached. Based on our findings, we believe that four of the tested VAMs (the HLM3+, HLM3, URM, and SFE) can provide objective information about individual teachers' effectiveness in improving their students' test scores for low-stakes purposes. The evidence in this study suggests that using any VAM for high-stakes purposes is quite risky, even for the best-performing models. The evidence also suggests that several of the VAMs, including the TFE, TFEIV, and DOLS, are likely to be less accurate in estimating teachers' actual effects.
Until additional research shows otherwise, these models should be considered risky even for low-stakes purposes, although some of them performed well on some criteria in this study.
References

Amrein-Beardsley, A. (2008). Methodological concerns about the Education Value-Added Assessment System. Educational Researcher, 37(2), 65–75.
Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies, 58(2), 277–297.
Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics, 29(1), 37–65.
Browne, W. J., Goldstein, H., & Rasbash, J. (2001). Multiple membership multiple classification (MMMC) models. Statistical Modelling, 1(2), 103–124.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher-student matching and the assessment of teacher effectiveness. The Journal of Human Resources, 41(4), 778–820.
Goldhaber, D., & Hansen, M. (2008). Assessing the potential of using value-added estimates of teacher job performance for making tenure decisions. Washington, DC: National Center for Analysis of Longitudinal Data in Education Research.
Gordon, R., Kane, T. J., & Staiger, D. O. (2006). Identifying effective teachers using performance on the job (Hamilton Project Discussion Paper). Washington, DC: The Brookings Institution.
Guarino, C. M., Reckase, M. D., & Wooldridge, J. M. (2012). Can value-added measures of teacher performance be trusted? (Working Paper #18). East Lansing, MI: The Education Policy Center at Michigan State University.
Hanushek, E. A. (1986). The economics of schooling: Production and efficiency in public schools. Journal of Economic Literature, 24, 1141–1177.
Harris, D. N. (2009). Teacher value-added: Don’t end the search before it starts. Journal of Policy Analysis and Management, 28(4), 693–699.
Henry, G. T., Kershaw, D. C., Zulli, R. A., & Smith, A. A. (in press). Incorporating teacher effectiveness into teacher preparation program evaluation. Journal of Teacher Education.
Hill, H. C. (2009). Evaluating value-added models: A validity argument approach. Journal of Policy Analysis and Management, 28(4), 700–712.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.
Koedel, C., & Betts, J. R. (2011). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Education Finance and Policy, 6(1), 18–42.
McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T.A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67–101.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge, UK: Cambridge University Press.
Nye, B., Konstantopoulos, S., & Hedges, L. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Raudenbush, S. W., & Willms, J. D. (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20(4), 307–335.
Reardon, S. F., & Raudenbush, S. W. (2009). Assumptions of value-added models for estimating school effects. Education Finance and Policy, 4(4), 492–519.
Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. The Quarterly Journal of Economics, 125(1), 175–214.
Rowan, B., Correnti, R., & Miller, R. J. (2002). What large-scale, survey research tells us about teacher effects on student achievement: Insights from the Prospects study of elementary schools. Teachers College Record, 104(8), 1525–1567.
Rubin, D. B., Stuart, E. A., & Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103–116.
Sass, T. (2008). The stability of value-added measures of teacher quality and implications for teacher compensation policy. Washington, DC: National Center for Analysis of Longitudinal Data in Education Research.
Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington, DC: Institute for Education Sciences.
Tekwe, C. D., Carter, R. L., Ma, C. X., Algina, J., Lucas, M. E., et al. (2004). An empirical comparison of statistical models for value-added assessment of school performance. Journal of Educational and Behavioral Statistics, 29(1), 11–36.
Todd, P. E., & Wolpin, K. I. (2003). On the specification and estimation of the production function for cognitive achievement. The Economic Journal, 113, F3–F33.
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465–471.
Wright, S. P., White, J. T., Sanders, W. L., & Rivers, J. C. (2010). SAS EVAAS statistical models. Cary, NC: The SAS Institute.