Does Student Sorting Invalidate Value-Added Models of Teacher Effectiveness?

An Extended Analysis of the Rothstein Critique

Cory Koedel

University of Missouri

Julian R. Betts*

University of California, San Diego

National Bureau of Economic Research

April 2009

Value-added modeling continues to gain traction as a tool for measuring teacher performance. However, recent research (Rothstein, 2009, forthcoming) questions the validity of the value-added approach by showing that it does not mitigate student-teacher sorting bias (its presumed primary benefit). Our study explores this critique in more detail. Although we find that estimated teacher effects from some value-added models are severely biased, we also show that a sufficiently complex value-added model that evaluates teachers over multiple years reduces the sorting-bias problem to statistical insignificance. One implication of our findings is that data from the first year or two of classroom teaching for novice teachers may be insufficient to make reliable judgments about quality. Overall, our results suggest that in some cases value-added modeling will continue to provide useful information about the effectiveness of educational inputs.

* The authors thank Andrew Zau and many administrators at San Diego Unified School District (SDUSD), in particular Karen Bachofer and Peter Bell, for helpful conversations and assistance with data issues. We also thank Zack Miller, Shawn Ni and Mike Podgursky for useful comments and suggestions, and the National Center for Performance Incentives for research support. SDUSD does not have an achievement-based merit pay program, nor does it use value-added student achievement data to evaluate teacher effectiveness. The underlying project that provided the data for this study has been funded by a number of organizations including The William and Flora Hewlett Foundation, the Public Policy Institute of California, The Bill and Melinda Gates Foundation, the Atlantic Philanthropies and the Girard Foundation. None of these entities has funded the specific research described here, but we warmly acknowledge their contributions to the work needed to create the database underlying the research.

Economic theory states that in an efficient economy workers should be paid their value marginal product. Implementing this rule in the service sector is not simple, as it is often not obvious how to measure the output of a white collar worker. Teachers provide an example of this problem: public school teachers’ salaries are determined largely by academic degrees and credentials, and years of experience, none of which appears to be strongly related to teaching effectiveness.

Perhaps in recognition that teacher pay is not well aligned with teaching quality, President Obama has recently called for greater use of teacher merit pay as a tool to boost student achievement in America’s public schools. And yet, in the United States, teacher merit pay is hardly a new idea. It has been used for at least a century, but most programs are short-lived, or survive either by giving almost all teachers bonuses or by giving trivial bonuses to a small number of teachers. Teachers have traditionally complained that principals cannot explain why they gave a bonus to one teacher but not another (Murnane et al., 1991, pp. 117-119).

Opponents of teacher merit pay would raise the question of whether we can reliably measure teachers’ value marginal products such that informed merit-pay decisions can be made.

The advent of widescale student testing, partly in response to the requirements of the federal No Child Left Behind law, raises the possibility that it is now feasible to measure the effectiveness of individual teachers in the classroom. Indeed, recently developed panel datasets link students and teachers at the classroom level, allowing researchers to estimate measures of ‘outcome-based’ teacher effectiveness.1 Because test scores are generally available for each student in each year, test scores lend themselves comfortably to a “value-added” approach where the effectiveness of teacher inputs can be measured by student test-score growth. The conjuncture of President Obama’s recent calls for teacher merit pay and the development of panel datasets that provide information on student achievement growth raises the stakes considerably: can we use student testing to reliably infer teaching quality?

1 For recent examples see Aaronson, Barrow and Sander (2007), Hanushek, Kain, O’Brien and Rivkin (2005), Harris and Sass (2006), Koedel and Betts (2007), Nye, Konstantopoulos and Hedges (2004), and Rockoff (2004).

In most schools, students are not randomly assigned to teachers. This raises a major challenge to the idea of using value-added models to infer teacher effectiveness. If certain teachers perennially receive students with low test scores, they would lose out in the merit pay sweepstakes through no fault of their own. A presumption in value-added modeling is that by focusing on achievement growth rather than achievement levels, the problem of student-teacher sorting bias is resolved because each student’s initial test-score level is used as a control in the model. The value-added approach is intuitively appealing, and increasing demand for performance-based measures by which teachers can be held accountable – at the federal, state and district levels – has only fueled the value-added fire.2 However, despite the popularity of the value-added approach among both researchers and policymakers, not everyone agrees that it is reliable. Couldn’t it be the case that a given teacher either systematically or occasionally receives students whose gains in test scores are unusually low, for reasons outside the control of the teacher? Ability grouping would be one source of

–  –  –

accompanied by mean reversion, would be a source of fleeting differences that a value-added model might wrongly attribute to a given teacher.

2 No Child Left Behind legislation is one example of this demand at the federal level (e.g., adequate yearly progress), and states such as Florida, Minnesota and Texas have all introduced performance incentives for teachers that depend to some extent on value-added. For a further discussion of the performance-pay landscape, particularly as it relates to teachers, see Podgursky and Springer (2007).

Recent research by Rothstein (2009, forthcoming) shows that future teacher assignments have non-negligible predictive power over current student performance in value-added models, despite the fact that future teachers cannot possibly have causal effects on current student performance. This result suggests that student-teacher sorting bias is not mitigated by the value-added approach. Rothstein’s critique of the value-added methodology comes as numerous studies have used and continue to use the technique. It raises serious doubts about the value-added methodology just as other work, such as Kane and Staiger (2008), Jacob and Lefgren (2007) and Harris and Sass (2007), appears to confirm that value-added is a meaningful measure of teacher performance.
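The logic of this falsification test can be illustrated with a small simulation. The sketch below is not Rothstein's specification: it uses simulated gains, hypothetical sample sizes, and a deliberately simple sorting rule (students are grouped for next year by a noisy signal of this year's gain). It only shows why future-teacher indicators can "predict" current gains when assignment depends on current performance, and why they should not under random assignment.

```python
import numpy as np


def future_teacher_r2(gains, future_teacher):
    """R^2 from regressing current gains on future-teacher indicators.

    With teacher dummies as the only regressors, fitted values are the
    within-teacher means, so R^2 is the between-teacher variance share.
    """
    fitted = np.empty_like(gains)
    for j in np.unique(future_teacher):
        mask = future_teacher == j
        fitted[mask] = gains[mask].mean()
    ss_total = ((gains - gains.mean()) ** 2).sum()
    ss_between = ((fitted - gains.mean()) ** 2).sum()
    return ss_between / ss_total


rng = np.random.default_rng(0)
n_students, n_teachers = 2000, 20        # hypothetical sizes
gains = rng.normal(size=n_students)      # simulated current-year gains

# Case 1: next year's teachers are assigned at random.
random_assign = rng.integers(0, n_teachers, size=n_students)

# Case 2 (assumed sorting rule): students are ranked on a noisy signal
# of this year's gain and placed into next year's classrooms in blocks.
signal = gains + rng.normal(scale=1.0, size=n_students)
ranks = signal.argsort().argsort()       # rank 0 .. n_students - 1
sorted_assign = ranks * n_teachers // n_students

r2_random = future_teacher_r2(gains, random_assign)
r2_sorted = future_teacher_r2(gains, sorted_assign)
print(f"R^2, random assignment:  {r2_random:.3f}")
print(f"R^2, dynamic sorting:    {r2_sorted:.3f}")
```

Under random assignment the future-teacher indicators explain almost none of the variation in current gains, while under the assumed sorting rule they explain a sizeable share even though no future teacher has any causal effect in the simulation.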

We further explore the reliability of value-added modeling by extending Rothstein’s analysis in two important ways. First, Rothstein estimates teacher effects using only a single year of data for each teacher. We consider the importance of using multiple years of data to identify teacher effects. If the sorting bias uncovered by Rothstein is transitory to some extent, using multiple cohorts of students to evaluate teachers will help mitigate the bias.3 For example, a principal may alternate across years in assigning the most troublesome students to the teachers at her school, or teachers may connect with their classrooms more in some years than in others. These types of single-year idiosyncrasies will be captured by single-year teacher effects, but will be smoothed out if estimates are based on multiple years of data.4 Second, we evaluate the Rothstein critique using a different dataset. Given that the degree of student-teacher sorting may differ across different educational environments, his results may or may not be replicated in other settings.

3 Rothstein notes this in his appendix, although he does not explore the practical implications in any of his models.

4 Additionally, some of what we observe to be sorting bias may be attributable to the random assignment of students to teachers across small samples (classrooms). In an omitted analysis, we perform a Monte Carlo exercise to test for this possibility. Although any given teacher may benefit (be harmed) in any given year from a random draw of high-performing (low-performing) students, we find no evidence to suggest that this would influence estimates of the distribution of teacher effects.
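The smoothing argument can be made concrete with a toy simulation. The sketch below assumes, purely for illustration, that each teacher's yearly estimate equals a fixed true effect plus a transitory classroom-level component that is independent across years; the variances and sample sizes are hypothetical and are not taken from the San Diego data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers, n_years = 500, 4             # hypothetical sizes

# Persistent teacher quality (assumed standard deviation of 0.2).
true_effect = rng.normal(scale=0.2, size=n_teachers)

# Transitory classroom-level component, e.g. a one-off draw of difficult
# students. Assumed independent across years (standard deviation 0.3).
transitory = rng.normal(scale=0.3, size=(n_teachers, n_years))

yearly_estimate = true_effect[:, None] + transitory

single_year = yearly_estimate[:, 0]          # one classroom per teacher
multi_year = yearly_estimate.mean(axis=1)    # average over all years

corr_single = np.corrcoef(true_effect, single_year)[0, 1]
corr_multi = np.corrcoef(true_effect, multi_year)[0, 1]
print(f"correlation with true effect, single year: {corr_single:.2f}")
print(f"correlation with true effect, {n_years}-year mean: {corr_multi:.2f}")
```

Averaging helps here only because the transitory component is assumed to be uncorrelated over time. If a teacher is sorted the same way every year, that component is persistent and pooling years will not remove it.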

Our extension of Rothstein’s analysis corroborates his primary finding – value-added models of student achievement that focus on single-year teacher effects will generally produce biased estimates of value-added. However, in our case, when we estimate a detailed value-added model and restrict our analysis to teachers who teach multiple classrooms of students, we find no evidence of sorting bias in the estimated teacher effects. Although this result depends on the degree of student-teacher sorting in our data, it suggests that at least in our setting, sorting bias can be almost completely mitigated using the value-added approach and looking across multiple years of classrooms for teachers.

Our results in this regard are encouraging, but less detailed value-added models that include teacher-effect estimates based on single classroom observations fare poorly in our analysis. That some value-added models will be reliable but not others, and that value-added modeling may only be reliable in some settings, are important limitations. They suggest that in contexts such as statewide teacher-accountability systems, large-scale value-added modeling may not be a viable solution. Because the success of the value-added approach will depend largely on data availability and the underlying degree of student-teacher sorting in the data (much of which may be unobserved), post-estimation falsification tests along the lines of those proposed by Rothstein will be useful in evaluating the reliability of value-added modeling in different contexts.

Although our analysis does not uncover a well-defined set of conditions under which value-added modeling will universally return causal teacher effects across different schooling environments (outside of random student-teacher assignments such conditions are unlikely to exist), we do identify conditions under which value-added estimation will perform better. The most important insight is that teacher evaluations that span multiple years will produce more reliable measures of teacher effectiveness than those based on single-year classroom observations. Often implicitly, the value-added discussion in research and policy revolves around single-year estimates of teacher effects. Our analysis strongly discourages such an approach.

The remainder of the paper is organized as follows. Section I briefly describes the Rothstein critique. Section II details our dataset from the San Diego Unified School District (SDUSD). Section III replicates a portion of Rothstein’s analysis using the San Diego data. Section IV details our extended analysis of value-added modeling and presents our results. Section V uses these results to estimate the variance of teacher effectiveness in San Diego. Section VI concludes.

–  –  –

Rothstein raises concerns about assigning a causal interpretation to value-added estimates of teacher effects. His primary argument is that teacher effects estimated from value-added models are biased by non-random student-teacher assignments, and that this bias is not removed by the general value-added approach, nor by standard panel-data techniques. Consider a simple value-added model of the general form:

$$Y_{it} = \gamma Y_{i,t-1} + X_{it}\beta + T_{it}\theta + (\alpha_i + \varepsilon_{it}) \tag{1}$$

In equation (1), $Y_{it}$ is a test score for student $i$ in year $t$, $X_{it}$ is a vector of time-varying student and school characteristics (for the school attended by student $i$ in year $t$), and $T_{it}$ is a vector of indicator variables indicating which teacher(s) taught student $i$ in year $t$. This model could be re-formulated as a “gainscore” model by forcing the coefficient on the lagged test score to unity and moving it to the left-hand side of the equation. The error term is written as the sum of two components, one that is time-invariant ($\alpha_i$) and another that varies over time ($\varepsilon_{it}$).
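As a worked illustration of the gainscore reformulation, suppose the general model is written with a coefficient $\gamma$ on the lagged test score and coefficient vectors $\beta$ and $\theta$ on the student/school characteristics and teacher indicators. These coefficient symbols are assumed notation for this sketch and may differ from the authors' own:

```latex
% General form (assumed notation for the coefficients):
Y_{it} = \gamma\, Y_{i,t-1} + X_{it}\beta + T_{it}\theta + (\alpha_i + \varepsilon_{it})

% Gainscore form: impose \gamma = 1 and move the lagged score to the left:
Y_{it} - Y_{i,t-1} = X_{it}\beta + T_{it}\theta + (\alpha_i + \varepsilon_{it})
```

In the gainscore form the dependent variable is the year-over-year test-score gain, and the restriction $\gamma = 1$ is imposed rather than estimated.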

Rothstein discusses sorting bias as coming from two different sources in this basic model. First, students could be assigned to teachers based on “static” student characteristics. This type of sorting corresponds to the typical tracking story – some students are of higher ability than others, and these students are systematically assigned to the best teachers. Static tracking may arise through a variety of channels, including administrator preferences, parental preferences, or teacher preferences (assuming that primary-school aged children, upon whom we focus here, are not yet able to form their own preferences). Given panel data, the typical solution to the static-tracking problem is the inclusion of some form of a student fixed effect whereby the time-invariant component of the error term in equation (1) is controlled for (e.g., first-differencing or demeaning). If student-teacher sorting is only based on static student characteristics, this approach will be sufficient.

However, the student-fixed-effects solution to the static tracking problem necessarily imposes a strict exogeneity assumption. That is, to uncover causal teacher effects from a model that controls for time-invariant student characteristics, it must be the case that teacher assignments in all periods are uncorrelated with the time-varying error components in all periods.

To see this, note that we could estimate equation (1) by first differencing to remove the time-invariant component to the error term:5

$$Y_{it} - Y_{i,t-1} = \gamma\,(Y_{i,t-1} - Y_{i,t-2}) + (X_{it} - X_{i,t-1})\beta + (T_{it} - T_{i,t-1})\theta + (\varepsilon_{it} - \varepsilon_{i,t-1}) \tag{2}$$

First, note that the first-differencing induces a mechanical correlation between the lagged test-score gain and the first-differenced error term in equation (2). This correlation can be resolved by instrumenting for the lagged test-score gain with the second-lagged gain, or second-lagged level (following Anderson and Hsiao, 1981 – for examples see Harris and Sass, 2006; Koedel, forthcoming; and Koedel and Betts, 2007). In addition, year-t teacher assignments may also be correlated with the first-differenced error term. Specifically, if students are sorted dynamically based on time-varying deviations (or shocks) to their test-score-growth trajectories, then lagged shocks to test-score growth, captured by $\varepsilon_{i,t-1}$, will be correlated with year-t teacher assignments, and the teacher effects from equation (2) cannot be given a causal interpretation.6

Rothstein’s critique can be summarized as follows: If students are assigned to teachers based entirely on time-invariant factors, unbiased teacher effects can in principle be obtained from a well-constructed value-added model. However, if sorting is based on dynamic factors that are unobserved by the econometrician, value-added estimates of teacher effects cannot be given a causal interpretation.

5 In the case of first differencing, it is more accurate to describe the assumption as “local” strict exogeneity in the sense that the error terms across time must be uncorrelated with teacher assignments only in contiguous years.
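The instrumenting step can be sketched on simulated data. The example below is a stripped-down illustration, not the authors' estimation: it omits teacher indicators and covariates, uses hypothetical parameter values, and uses a single instrument (the second-lagged test-score level) in a just-identified IV estimator. It shows that OLS on the first-differenced model is biased by the mechanical correlation between the lagged gain and the differenced error, while the instrumented estimate recovers the persistence parameter.

```python
import numpy as np

rng = np.random.default_rng(2)
n_students, gamma = 50_000, 0.5          # hypothetical values
alpha = rng.normal(size=n_students)      # time-invariant student effect

# Simulate Y_t = gamma * Y_{t-1} + alpha + eps_t and keep the last 3 years.
y = alpha / (1 - gamma) + rng.normal(size=n_students)
history = []
for _ in range(20):
    y = gamma * y + alpha + rng.normal(size=n_students)
    history.append(y)
y_t2, y_t1, y_t = history[-3], history[-2], history[-1]


def demean(v):
    """Subtract the sample mean (equivalent to including a constant)."""
    return v - v.mean()


d_y = demean(y_t - y_t1)        # current gain; differencing removes alpha
d_y_lag = demean(y_t1 - y_t2)   # lagged gain; correlated with eps_{t-1}
z = demean(y_t2)                # instrument: second-lagged level

# OLS on the differenced equation is inconsistent here.
gamma_ols = (d_y_lag @ d_y) / (d_y_lag @ d_y_lag)

# Just-identified IV: the instrument is uncorrelated with eps_t - eps_{t-1}.
gamma_iv = (z @ d_y) / (z @ d_y_lag)

print(f"true gamma: {gamma:.2f}")
print(f"first-difference OLS: {gamma_ols:.2f}")
print(f"first-difference IV:  {gamma_iv:.2f}")
```

The instrument handles only the mechanical correlation created by differencing. It does nothing about the second problem described above, in which year-t teacher assignments themselves respond to lagged shocks.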
