Harvard professor Heather Hill and several research colleagues have published one of the more thoughtful pieces of research on value-added models and their validity for evaluating teachers on the basis of student test score gains.
The article, “A Validity Argument Approach to Evaluating Teacher Value-Added Scores,” appears in the American Educational Research Journal.
Using sophisticated quantitative measures, the researchers tested how well the value-added ratings of middle school math teachers correlated with their mathematical knowledge and quality of instruction. The results are compelling, revealing the utility of VAM in some circumstances but also pointing to a serious flaw in the models: the misidentification of teachers.
The researchers found high correlations of VAM estimates among three different types of statistical models used to rate teachers. And, to quote from the abstract, they “found teachers’ value-added scores correlated not only with their mathematical knowledge and quality of instruction but also with the population of students they teach.”
Perhaps most importantly, the researchers found that a large percentage of the math teachers studied had very low “quality of teaching” ratings but very high VAM estimates. In this case, low quality meant, operationally, that examination of the teachers’ instruction revealed “very high rates of mathematical errors and/or disorganized presentations of mathematical content.”
Case studies of these teachers explained a lot of these “false positive” VAM results — results that could make such teachers eligible for significant performance bonuses in most merit pay plans (and, not insignificantly, send the message that their teaching practice was exemplary).
Case 1: This teacher was trained as an elementary generalist and has 8 years of teaching experience. The case study recounts that she reads a problem out of the text as “3/8 + 2/7” but then writes it on the board and solves it as “3.8 + 2.7.” She calls the commutative property the community property. She says proportion when she means ratio. She talks about denominators being equivalent when she means the fractions are equivalent. In many instances, the teacher’s instruction “clouded rather than clarified the mathematics of the lesson.”
But she does exhibit the “tough-love style of behavior management” popular today in many high-need schools. She knows how to organize discussions of math among her students. An evaluator entering her classroom would likely rate her as a teacher with high expectations — another reform mantra. The researchers were puzzled as to how she earned such a high VAM rating, suggesting that it may have been driven by the consistent computational exercises she assigned (in other words, teaching to the test).
Case 2: This teacher has been teaching for four years, having entered the profession as a mid-career switcher. While he does not hold a degree in mathematics, he completed many math courses in his academic training, and his previous work outside of education gave him practical experience in using math. Knowledge of math was not his problem. He understands math conceptually, but he has little skill in detecting problems in his students’ mathematical reasoning. And while he made frequent mathematical errors (some serious), often “his students quickly correct(ed) them.”
Overall, the researchers concluded that there was “very little mathematics occurring” in his classroom, as the teacher offered only the “briefest of mathematical presentations, typically referring students to the text and assigning a series of problems.” He provided only routine supervision of, and feedback on, student work. The researchers concluded that his high VAM ratings were best explained by the high ability of the students he taught, who had many other in- and out-of-school opportunities to learn math.
Conclusion: Based on the data analysis and case studies, the researchers offered a clear warning to policymakers about the proper use of value-added methods: “Although we do recommend the use of value-added scores in combination with discriminating observation systems, evidence presented here suggests that value-added scores alone are not sufficient to identify teachers for reward, remediation, or removal.”
Our own work with teacher leaders in a variety of CTQ programs suggests that many classroom practitioners are ready to accept and use VAM as part of a comprehensive teacher development and evaluation system — but not in any automated way. In a soon-to-be-released policy paper, which includes expert insights from teachers Renee Moore, Marsha Ratzel, and David Orphal, we make the case that if teachers are able to unpack and use VAM data as part of a comprehensive evaluation process, then more practitioners will embrace the tool.
Under our proposal, instead of a far-off statistician making a summative judgment of an individual teacher on the basis of a VAM rating, trained onsite evaluators (including expert teachers) will use VAM data in a non-automated way to make sense of who is effective or not — and why that’s the case — in the context of their teaching and working conditions.
Now is the time to cultivate the many accomplished teachers in our schools who can lead the way in ensuring that VAM’s potential as a useful evaluation tool is not ultimately lost because policymakers ignored the warnings of Heather Hill and other VAM researchers to do it right.
Reference: Hill, H. C., Kapitula, L., & Umland, K. (2010). A Validity Argument Approach to Evaluating Teacher Value-Added Scores. American Educational Research Journal.