
«Consortium for Educational Research and Evaluation– North Carolina Comparing Value-Added Models for Estimating Individual Teacher Effects on a Statewide ...»


For the consistency analysis, which required two sequential within-year estimates for each grade level, there were limitations to the amount of information available for the models that required multi-year panels to estimate (the SFE, SFEIV, TFEIV, and DOLS). For the SFE, SFEIV, and TFEIV, two sequences of three years’ data were required for each model. However, among all time-varying covariates, only test score data were available prior to 2007–08 (giving us only three years of complete data), and therefore no time-varying covariates could be included in these models (differencing eliminates the time-invariant covariates). For the DOLS, the panel was estimated with two years only, allowing for the inclusion of time-varying covariates. There were 503,370 student records in 5th grade math (8,826 teachers) and 728,008 student records in 5th grade reading (9,402 teachers).

Comparison Criteria

Three criteria were used to compare the absolute performance of each VAM on estimating the true teacher effects in the simulated data. Two of these and a different third criterion were used to assess relative performance of the models using actual NC data. First, Spearman rank order correlation coefficients, a non-parametric measure capturing the association between the rankings of two variables, were estimated for each pairing of a VAM with the true effect (simulation only) and with each other VAM (simulation and actual). For the simulated data, the estimates from the individual simulations needed to be combined into a single point estimate, which required a Fisher z transformation: each correlation was z-transformed, the mean of the z-transformed correlations was calculated, and this mean was then back-transformed using the hyperbolic tangent function. High-performing VAMs have relatively higher Spearman coefficients.
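The pooling step described above can be sketched in a few lines of Python. This is only an illustration of the Fisher z averaging, not the report's own code; the function name `pool_correlations` and the three example coefficients are hypothetical.

```python
import math

def pool_correlations(rhos):
    """Combine correlations via Fisher z: atanh each value, average, tanh back."""
    z_values = [math.atanh(r) for r in rhos]   # Fisher z transformation
    mean_z = sum(z_values) / len(z_values)     # mean on the z scale
    return math.tanh(mean_z)                   # back-transform to a correlation

# Hypothetical Spearman coefficients from three simulation replications.
example_rhos = [0.90, 0.95, 0.85]
pooled = pool_correlations(example_rhos)
print(round(pooled, 3))
```

Averaging on the z scale rather than averaging the raw coefficients reduces the distortion that arises because correlations near 1 are compressed.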

Consortium for Educational Research and Evaluation–North Carolina, Comparing Value-Added Models, August 2012

Second, we calculated the percent agreement on the lowest 5% of the teacher quality distribution. The teachers in the bottom 5% of the distribution under each version of the teacher effect (the “true” effect in the simulation or from each VAM in both the simulated and actual) were identified. In the simulated data analysis, teachers' true and estimated scores agreed if they both ranked the teacher above the fifth percentile or they both ranked the teacher below the fifth percentile. The statistic was the proportion of all teachers with agreement. In the actual data, teachers’ scores on any two methods agreed if the scores were both observed and were both above the fifth percentile or below. High-performing VAMs have relatively higher levels of agreement. Due to the normal distributions used in the data generation processes for the simulations, the findings for teachers in the 95th percentile were nearly identical. We chose this approach to correspond with a likely policy use of VAMs: to identify the lowest performing teachers.
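As a rough illustration of this agreement statistic, the sketch below flags the lowest 5% of teachers under two sets of scores and reports the share of teachers classified the same way by both. The helper names (`bottom_flags`, `percent_agreement`) and the tie handling are assumptions made for the example, not details taken from the report.

```python
def bottom_flags(scores, share=0.05):
    """Flag the lowest `share` of teachers by score (ties broken by position)."""
    n_flagged = int(len(scores) * share)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flagged = set(order[:n_flagged])
    return [i in flagged for i in range(len(scores))]

def percent_agreement(scores_a, scores_b, share=0.05):
    """Percent of teachers placed on the same side of the cutoff by both scores."""
    flags_a = bottom_flags(scores_a, share)
    flags_b = bottom_flags(scores_b, share)
    matches = sum(a == b for a, b in zip(flags_a, flags_b))
    return 100.0 * matches / len(flags_a)

# Hypothetical scores: 100 teachers, with two teachers trading places.
true_scores = list(range(100))
vam_scores = list(range(100))
vam_scores[0], vam_scores[50] = vam_scores[50], vam_scores[0]
print(percent_agreement(true_scores, vam_scores))
```

Because 95% of teachers sit above the cutoff under either score, agreement is high by construction, which helps explain why the reported rates cluster in the 90s.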

Third, we examined the false identification of ineffective teachers in the simulated data only. For this analysis, special focus was placed on identifying a teacher who is actually relatively effective as ineffective based on their VAM score, due to the significant consequences that teachers and states may face under high stakes evaluation systems. We assumed a cutoff of -1.64 standard deviations from the mean teacher score, which is consistent with a finding of 5% of teachers being ineffective. First, we identified those teachers above the cutoff for ineffectiveness on the “true” measure. Then we identified those teachers who were below the cutoff on the estimated teacher effect. The teachers who satisfied both conditions were considered false positives or falsely identified as ineffective. This approach is a combination of the false positive/false negative methods used by Schochet and Chiang (2010). High-performing VAMs have relatively low proportions of false positives. Due to the normal distributions of the simulated data, we can assume that findings about falsely identifying a teacher as highly effective when he or she is not would be very similar. We also calculated the mean true score for teachers falsely identified as ineffective, and the number of teachers in North Carolina who would be affected by these findings. Actual data estimates for this comparison were not possible.
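The false positive calculation just described can be sketched as follows. The sketch assumes that both the true and the estimated effects are standardized before the -1.64 cutoff is applied and that the rate is taken over all teachers; the report does not spell out either detail, and the function names are hypothetical.

```python
import statistics

CUTOFF = -1.64  # cutoff in standard deviation units, as stated in the text

def standardize(scores):
    """Convert scores to z-scores (assumption: population standard deviation)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

def false_positive_summary(true_effects, estimated_effects, cutoff=CUTOFF):
    """Return (false positive rate, mean true z-score of the false positives)."""
    true_z = standardize(true_effects)
    est_z = standardize(estimated_effects)
    # False positive: above the cutoff on the true effect, below it on the estimate.
    flagged = [t for t, e in zip(true_z, est_z) if t > cutoff and e < cutoff]
    rate = len(flagged) / len(true_z)
    mean_true_z = statistics.fmean(flagged) if flagged else None
    return rate, mean_true_z

# Hypothetical data: 101 teachers; an average teacher receives the lowest estimate.
true_effects = list(range(-50, 51))
estimated = list(true_effects)
estimated[0], estimated[50] = estimated[50], estimated[0]
rate, mean_true_z = false_positive_summary(true_effects, estimated)
```

The second return value corresponds to the mean true score for teachers falsely identified as ineffective that the text mentions.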

Fourth, we examined the year-to-year reliability of the VAMs in the actual NC data. For this criterion, the teacher estimates were obtained for each of two years individually. For the SFE, SFEIV, and TFEIV models, this required a substantial simplification of the models due to limitations in the actual NC data; further, for the DOLS no reliability analysis was possible, given these same limitations. Each teacher effect distribution on the eight remaining VAMs was divided into quintiles in each of the two years, and then each of these quintile classifications was cross-tabulated. If reliability were high, and allowing for some year-to-year variability including improvement, the teachers would have tended to fall along the diagonal where the quintiles were equal or roughly equal, with some off-diagonals suggesting an allowable amount of error and with the above-diagonal proportions slightly greater, allowing for improvement. If teachers did not fall along the diagonal, we could not tell which part would be due to estimate reliability and which part would be due to actual teacher improvement or change. We focused on three characteristics of the cross-tabulations: the proportion of teachers on the diagonal (that is, those teachers who were in the same quintile in each year) and the proportions of teachers in the two most extreme “switchers” groups (those who were in the lowest quintile one year and the highest the next, or in the highest one year and the lowest the next). This method or one similar to it has been used by Sass (2008) and Goldhaber and Hansen (2008).
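A minimal sketch of the quintile cross-tabulation and the three summary proportions follows. The rank-based quintile rule and the function names are assumptions chosen for the illustration; the report does not describe how quintile boundaries or ties were handled.

```python
def quintiles(scores):
    """Assign each teacher a quintile from 1 (lowest) to 5 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 5 // n + 1
    return result

def reliability_summary(year1, year2):
    """Cross-tabulate quintiles across two years and summarize stability."""
    q1, q2 = quintiles(year1), quintiles(year2)
    table = [[0] * 5 for _ in range(5)]  # rows: year 1 quintile, columns: year 2
    for a, b in zip(q1, q2):
        table[a - 1][b - 1] += 1
    n = len(q1)
    same = sum(table[k][k] for k in range(5)) / n  # share on the diagonal
    low_to_high = table[0][4] / n                  # lowest quintile, then highest
    high_to_low = table[4][0] / n                  # highest quintile, then lowest
    return table, same, low_to_high, high_to_low

# Hypothetical worst case: year 2 estimates exactly reverse the year 1 ordering.
year1 = list(range(10))
year2 = list(reversed(year1))
table, same, low_to_high, high_to_low = reliability_summary(year1, year2)
```

In this reversed example only the middle quintile stays on the diagonal, and every teacher in an extreme quintile is a "switcher".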





Results

We compared nine models’ performance on a set of criteria that together were used to answer the six questions regarding rank ordering and identification of ineffective teachers with and without violations of potential outcomes assumptions, consistency across VAMs, and year-to-year reliability of VAMs. In reporting the results, we focus on the criteria and summarize the results into answers to the questions in the discussion section that follows.

Spearman Rank Order Correlations

Assessing performance by rank order correlations with the “true” effect assuming no classroom level variance (0% classroom variance), the best-performing VAM was the HLM3+ with three VAMs closely following in order: URM, SFE, and HLM3 (Table 2). The increase in the classroom proportion of variance for testing the influence of SUTVA and confoundedness reduced the Spearman rank order correlations of all models with the true effect (Table 2). The violation of SUTVA implied by 4% of variance at the classroom level did not affect the relative ranking of the VAMs on this criterion. The HLM3+ was highest at .955 at 0% classroom variance and remained highest at 4% classroom variance (.864). The HLM2, TFE, and DOLS were nearly equal (.909 and .822, respectively), as were the SFEIV and TFEIV (.893 and .808, respectively). The classroom variance simulated in this analysis at 4% should be considered reasonable, given the analysis of Schochet and Chiang (2010).

Table 2. Spearman Rank Order with True Effect, Simulated Data


When the strong ignorability of assignment was violated (confounded assignment), there was substantial variation (Table 2) in the Spearman rank order correlations under both moderate positive and moderate negative correlation between the student covariate and the classroom, teacher, and school covariates. Two random effects models, the HLM3 (.796 and .746, respectively) and HLM3+ (.771 and .755, respectively), were the top performers, followed by the HLM2 (.716 and .662, respectively), URM (.660 and .670, respectively), SFE (.648 and .628, respectively), and SFEIV (.562 and .526, respectively); the TFE, TFEIV, and DOLS were very low. Across the rank order correlations under optimal conditions, SUTVA violations, and confounded assignment, the HLM3 and HLM3+ were consistently the highest performing VAMs, and several models, including the four fixed effects and DOLS VAMs, performed much worse than the others.

Table 3. Spearman Rank Order Matrices of Value-Added Models, Actual Data


On the actual NC data (see Table 3, which contains two correlation matrices), the rank order correlations between the VAM estimates varied considerably in both math and reading, from .970 to .642 for mathematics and .948 to .488 for reading. In both subjects, the URM was most highly correlated with the other models, averaging .850 and .774, respectively. The TFEIV was the least highly correlated with the other models, averaging .793 and .594, respectively. The two most highly correlated VAMs were HLM3 and HLM3+, with .970 for mathematics and .948 for reading. There was a tendency for the random effects models to be highly correlated with each other and with the URM and TFE models. The TFE model was highly correlated with the HLM2 (.944 for 5th grade math and .813 for 5th grade reading) and DOLS (.904 for 5th grade math and .861 for 5th grade reading) but not with the other fixed effects models. The fixed effects models did not exhibit an overall tendency to be highly correlated with each other or to be more highly correlated with each other than with the random effects models. Overall, it appears that the choice of one VAM over another can yield quite different rank orderings of the teacher effect estimates. It is important to note that higher correlations between the VAM estimates from the actual data do not imply that they recover the “true” teacher effects more consistently, because the models may be reproducing a similar bias.

Agreement on Classification in Fifth Percentiles

The agreement on classification provides an indication of the extent to which the VAMs agree with the true effect or each other in terms of identifying the lowest performing 5% of teachers in the state. This criterion is quite important when the teacher effect estimates are to be used for teacher evaluations with consequences, since there are significant costs associated with falsely identifying an average teacher in the lowest performing group or falsely identifying a low-performing teacher in the “acceptable” range of performance. Nearly all of the VAMs performed very well in the absence of assumption violations, with between 96.3% and 97.7% agreement on the bottom 5% and top 95%, a difference of less than 1.5 percentage points (Table 4). In the test of the SUTVA violation with 4% of the variance at the classroom level, the VAMs exhibited lower agreement rates, about 95%–96%, with a much smaller difference between the models: a range of only 0.82 percentage points. The HLM3+ was the highest, with 97.71% agreement with zero classroom variance, and it remained the highest with 4% classroom variance (96.01%). Nevertheless, all of the agreement rates were very similar.


In the test of the confounded assignment, the level of agreement was reasonably high with all models at or above 90% agreement in the positive assignment and negative (compensatory) assignment scenarios. The HLM3 and HLM3+ were the highest agreement models (for the positive assignment, 95.04 and 94.78, respectively), followed by the HLM2, URM, SFE, and SFEIV (for the positive assignment, 94.25, 93.70, 93.56, and 92.98, respectively). Three consistently lower performers were the TFE, TFEIV, and DOLS in the positive assignment (90.93, 90.74, and 90.48, respectively), with the negative assignment following the same pattern.

There was a more sizeable gap between the higher and lower ranking models than for the variance decomposition findings, and the direction of the correlation did not alter the pattern.

With the actual NC data, the agreement between the VAMs was quite high for all models, averaging 94%–95% agreement with each other for mathematics and reading (Table 5).

There was a tendency for the random effects VAMs to be in greater agreement with each other, and for the fixed effects VAMs (including the DOLS) to be in greater agreement with each other, with lower agreement across type. This tendency was not as great in math—the percentage of agreement in each partition of the matrix was very similar—but was obvious in reading.

Table 5. Percent Agreement Across Models, Actual Data


False Positives: Average Teacher Identified as Ineffective

The third type of analysis assessed the extent of false positives; that is, how many teachers in the top 95% of the distribution would be falsely identified as bottom 5% performers. This criterion is relevant because several have proposed to use VAM estimates of teacher effectiveness to identify “ineffective” teachers as a step toward dismissal. False positives were examined on the simulated data only (Table 6, following page). In the variance decomposition simulation, at low levels of classroom variance (an absence of assumption violations), the HLM3+ (1.2% false positives), URM (1.3%), HLM3 (1.4%), and SFE (1.4%) performed the best; the other models were 1.7% or higher. To get a more concrete estimate of the breadth of the differences in model performance, assuming 9,000 5th grade teachers (the approximate number statewide in North Carolina), between 108 and 170 would be falsely identified as ineffective by the best and worst performing VAMs. In other words, the worst performing VAM would falsely identify 62 more 5th grade teachers as ineffective. The mean of the true z-score for these teachers was -1.43 for the HLM3+, the best performing VAM, and -1.30 for the worst performing VAMs, the SFEIV and TFEIV, which indicates that the teachers being falsely identified as ineffective by the worst performing VAMs were on average better performers; false identification of ineffectiveness casts a wider net in the worst performing models.
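The teacher counts quoted above follow from simple arithmetic on the reported rates, as the short sketch below shows. The 9,000-teacher base, the 1.2% rate, and the 170-teacher figure come from the text; the function name and the implied rate for the worst performing model are derived here for illustration only.

```python
N_TEACHERS = 9000  # approximate statewide count of 5th grade teachers (from the text)

def teachers_affected(rate, n_teachers=N_TEACHERS):
    """Convert a false positive rate into a count of teachers."""
    return round(rate * n_teachers)

best = teachers_affected(0.012)        # HLM3+ rate reported in the text
worst = 170                            # count reported for the worst performing VAM
implied_worst_rate = worst / N_TEACHERS  # roughly 1.9%, derived, not reported
extra = worst - best                   # additional teachers falsely identified
```

The same conversion applies to the other scenarios, where small differences in rates translate into dozens or hundreds of teachers.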

When the level of classroom variance was set at 4%, however, the performance of all models declined somewhat, with all of the models demonstrating higher proportions of false positives (2.0%–2.4% at 4% classroom variance). While these rates were seemingly modest, the number of teachers affected in each grade level and subject can be large, with up to 210 teachers misclassified under a scenario with 4% classroom variance. The differences among the models, however, were modest, with a difference of 28 teachers at most.

With a heterogeneous fixed effect simulation, there was substantial variation between the models in the proportion of teachers misidentified as ineffective in the positive assignment scenario (Table 6, following page), with the HLM2, HLM3, and HLM3+ being the best performers (less than 3% misidentified), followed by the URM, SFE, and SFEIV misidentifying 3.1%, 3.2%, and 3.5%, respectively, and the TFE, TFEIV, and DOLS misidentifying more than 4%. The direction of correlation altered this pattern only slightly, with only the HLM3 and HLM3+ misidentifying less than 3% of teachers as ineffective, followed closely by the HLM2, URM, and SFE. The number of teachers affected nearly doubled from the best to the worst performing models on this criterion, ranging from 221 (HLM3) to 436 (DOLS). Finally, for the worst performing VAMs, the TFE, TFEIV, and DOLS, the point estimates for the mean true effect of the misidentified teachers were actually above zero, meaning that the misclassified teachers included above-average teachers.



