If you wanted to be more objective about student and professor evaluation, you would have standardized measures of student performance across professors. In the rare case in which this is done, we learn all sorts of fascinating things, including things which raise questions about the unintended consequences of our evaluation systems.
Tyler Cowen points me to a paper in the Journal of Political Economy, by Scott E. Carrell and James E. West [ungated version].
In the U.S. Air Force Academy students are randomly assigned to professors but all take the same final exam. What makes the data really interesting is that there are mandatory follow-up courses, so you can see the relationship between which Calculus I professor you had and your performance in Calculus II! Here’s the summary sentence that Tyler quotes:
The overall pattern of the results shows that students of less experienced and less qualified professors perform significantly better in the contemporaneous course being taught. In contrast, the students of more experienced and more highly qualified introductory professors perform significantly better in the follow-on courses.
Here’s a nice graph from the paper:
Student evaluations, unsurprisingly, laud the professors who raise performance in the initial course. The surprising thing is that this is negatively correlated with later performance. In my post on Babcock and Marks’s research, I touched on the possible unintended consequences of student evaluations of professors. This paper gives new reasons for concern (not to mention much additional evidence, e.g. that physical attractiveness strongly boosts student evaluations).
That said, the scary thing is that even with random assignment, rich data, and careful analysis there are multiple, quite different, explanations.
The obvious first possibility is that inexperienced professors (perhaps under pressure to get good teaching evaluations) focus strictly on teaching students what they need to know for good grades. More experienced professors teach a broader curriculum, the benefits of which you might take on faith but needn’t, because their students do better in the follow-up course!
But the authors mention a couple other possibilities:
For example, introductory professors who “teach to the test” may induce students to exert less study effort in follow-on related courses. This may occur due to a false signal of one’s own ability or from an erroneous expectation of how follow-on courses will be taught by other professors. A final, more cynical, explanation could also relate to student effort. Students of low value added professors in the introductory course may increase effort in follow-on courses to help “erase” their lower than expected grade in the introductory course.
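The first mechanism is easy to sketch in a toy simulation. None of this is from the paper: the two professor types, the effect sizes, and the scoring model below are all invented for illustration. A “teach to the test” professor boosts the Course I exam score directly, while a “mastery” professor boosts the underlying knowledge that Course II draws on:

```python
import random

random.seed(0)

def simulate(deep_teacher, n=10_000):
    """Toy model: each student has a latent ability. A 'mastery' professor
    raises the underlying knowledge that the Course II exam measures; a
    'teach to the test' professor raises only the Course I exam score."""
    course1, course2 = [], []
    for _ in range(n):
        ability = random.gauss(0, 1)
        mastery = ability + (0.5 if deep_teacher else 0.0)
        exam1 = ability + (0.0 if deep_teacher else 0.5) + random.gauss(0, 1)
        exam2 = mastery + random.gauss(0, 1)
        course1.append(exam1)
        course2.append(exam2)
    mean = lambda xs: sum(xs) / len(xs)
    return mean(course1), mean(course2)

tt1, tt2 = simulate(deep_teacher=False)  # teaches to the test
dp1, dp2 = simulate(deep_teacher=True)   # teaches for mastery
print(f"teach-to-test: Course I {tt1:+.2f}, Course II {tt2:+.2f}")
print(f"mastery:       Course I {dp1:+.2f}, Course II {dp2:+.2f}")
```

Under these made-up assumptions, the teach-to-test section looks about half a standard deviation better on the Course I exam and correspondingly worse in Course II, which is the qualitative pattern the paper reports; the simulation doesn’t, of course, distinguish this story from the effort-based explanations above.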
Indeed, I think there is a broader phenomenon. Professors who are “good” by almost any objective measure will have induced their students to put more time and effort into their course. How much this takes away from students’ efforts in other courses is an essential question I have never seen addressed. Perhaps additional analysis of the data could shed some light on this.
Carrell, S., & West, J. (2010). Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors. Journal of Political Economy, 118(3), 409–432. DOI: 10.1086/653808
Added: Jeff Ely has an interesting take: In Defense of Teacher Evaluations.
Added 6/17: Another interesting take from Forest Hinton.
You make a good point about student evaluations being influenced by factors other than course quality, like the physical appearance of the instructor. Some instructors go so far as to have pizza parties or distribute candy near evaluation time.
However, I still believe that consumer demand can be a powerful force of accountability in higher education — if it is informed. How can college students become shrewd consumers of education? What would increase their level of critical analysis and their desire for information that ensures them that they are receiving quality instruction?
Thanks for the comment, Forest. I hope my readers will check out your post, which makes some good points about the policy and politics associated with this research.
I certainly don’t want to scrap course evaluations.
What questions should we be asking in the evaluations? For example, they really should have asked how much students studied for the class. We should also be looking for more objective data about, for example, how difficult a class is.
Another explanation may be that more experienced faculty draw more connections across content and/or domains because they have more insight into expectations and normal/average student development over time (especially at the Academy).
They would also, assuming they have been at the Academy for some time, have a big-picture understanding of how an introductory course relates to future studies. Thus, they could present more challenging content or methods in the intro course, whereas a novice faculty member may stick primarily to the prescribed curriculum.
We must remember, too, that this was an intro course; cadets are dealing with a major life change outside of the classroom, increasing their need and desire for emotional support. They may therefore develop more affinity toward younger, novice faculty (who may seem more supportive than challenging), leading to higher evaluations for the novice faculty.
Measuring student satisfaction, i.e., whether their own needs (cognitive and affective) are met, is quite different from measuring student learning over the short and long term.
The quandary imposed upon those who work within higher education, and who must depend upon faculty evaluations for tenure, is essentially the same as the one present in K-12 surrounding grades/testing and merit pay/tenure/job loss, etc.: how do you measure learning “objectively”; if in fact you can, how do you measure teaching “objectively”; and if in fact you can, how do you build a causal relationship between the two that works for every cadet/student and every faculty member/teacher?
I ask myself every year I teach, how am I an accomplice to this practice? How important is it to me (tenure, grading, learning, evaluations, etc.)? And, what am I going to do about it? GG
GNA, though I am sympathetic to alternative explanations, I think you flesh out a couple plausible factors. There may be no perfect way to measure teaching and learning, but certainly there are better and worse ways. Let’s keep looking for better ones. Thanks for your comment.
It’s great to see some data arising from a research design that is able to control some of the complexity of the way the system often works. Mindy Marx, an economist (in a paper that I can’t find at the moment, but will try to link), shows rather strong connections between grades received and evaluations of the professor (a positive association, no surprise), net of the amount learned. Her data are based on students taking the same before-and-after tests, but with self-selection of instructors (and she was able to link grades and student course evaluations).
Getting a reputation for easy grading probably exerts selection effects on which students end up in the class, when assignment isn’t random. Faculty often teach to what they perceive their students to want. So one might expect the results to look like they do without random assignment. What’s really interesting here is that they hold up under random assignment. It is amazing how many observational results can’t be replicated under controls.
Here, it looks like there are multiple, mutually reinforcing processes leading to the same outcomes.