Friday, November 17, 2006

Student Evaluations

Dave Munger of Cognitive Daily is discussing student evaluations in an article titled "'Blink' methods now being applied in the classroom". The word "Blink" refers to the best-seller by Toronto author Malcolm Gladwell (an excellent book, BTW). Gladwell mentions a study by Nalini Ambady and Robert Rosenthal (1993) in which they showed students short video clips of a lecturer and asked for evaluations. The evaluations weren't much different from those done at the end of the term.

Unfortunately, Dave Munger seems to draw the wrong conclusions from this study. In an earlier posting from last May, "The six-second teacher evaluation", he says ...
So we do appear to be quite effective at making judgements about teaching ability even after viewing only a total of 6 seconds of actual teaching, and without even hearing the teacher's voice.
This is dead wrong. Students are good at evaluating something after six seconds but it sure as heck ain't teaching ability. It's probably whether the students like the teacher or not. We can make snap judgements about personality but not about ability. The correlation with end-of-term evaluations suggests that even after several months, students are still only evaluating the personality of the teacher and not teaching ability.

It makes no sense whatsoever to assume that students can judge how good a teacher you are from a six second video clip. How can they tell whether the lecturer is well prepared, knows the subject, writes fair exams, chooses the appropriate level of difficulty, and communicates important concepts?

The Canadian Association of University Teachers (CAUT) has developed a policy regarding student evaluations. The CAUT report discusses the pros and cons of student evaluations, including the Ambady and Rosenthal (1993) study. Here's what it says in footnote #10 ...
More recently, Ambady and Rosenthal (1993) report findings which point to the conclusion that student ratings of instructors can be strongly influenced by factors that probably bear only a slight relationship to critical dimensions of teaching effectiveness (though one must hasten to add that this is not the conclusion that Ambady and Rosenthal argue for in their study). They report that trained observers' evaluations of very brief segments (30 seconds or less) of silent videotape of college teachers yielded ratings of specific behaviors that correlated positively with students' ratings of the instructors. The experimenters found that appearing to be more active, confident, dominant, enthusiastic, likable, optimistic, supportive and warm, etc., in these "thin slices" of observation correlated positively with students' ratings of the instructors. In one of the experiments, student ratings of the instructors also were found to be "somewhat" influenced by the physical attractiveness of the teachers (p. 435). Whatever aspects of the teaching act have been accessed in this study, and no matter their positive relationship with student ratings, it must be obvious that there is more to effective teaching than demonstrating behaviors that can be documented in 30 seconds or less of silent videotape.
In his discussion of instructor personality and the politics of the classroom, Damron (1994) reviews the extensive literature that suggests that student ratings may be especially sensitive to students' perceptions of instructor personality or aspects of instructors' demeanour that bear little relationship to student learning or achievement.

9 comments:

  1. Yeah, and a hundred million people voted for Bush -- twice -- because they'd rather have a beer with him than the other candidate. The question is, how to teach people to be better evaluators? Is it possible?

  2. This is dead wrong. Students are good at evaluating something after six seconds but it sure as heck ain't teaching ability. It's probably whether the students like the teacher or not.

    Actually, the study comprised three separate experiments, with three different measures of teaching ability. In addition to end-of-semester student evaluations, the 30- and 6-second ratings were also correlated with principals' ratings of teacher performance.

    Now obviously what happens in 30 seconds isn't necessarily indicative of good or bad teaching.

    But consider this: I can watch a man run for 30 seconds and have a pretty good idea whether he can run a marathon.

    Does that mean we should train marathoners by having them run 30-second sprints? Of course not; it simply means that the hallmarks of good running can be spotted quickly.

    The point of the quick reviews is to conduct teacher evaluations efficiently, not to trivialize the job of a teacher.

    In my brief career as a teacher, the principal visited my classroom exactly zero times. Having him pop in for a few minutes every couple of days and offer constructive suggestions could very well have made me a better teacher.

  3. Next question, though: is a principal's evaluation, based on a few minutes observation, any more accurate as a measure of teaching ability? Correlating two very superficial methods for evaluating teaching doesn't tell me much, other than that maybe trivial components of the teaching experience have the potential to dominate such measures.

    Another question: can you observe a field of reasonably fit individuals who have chosen to run a marathon, and tell me how they'll rank at the finish line?

  4. Good points, PZ. They all might be superficial means of evaluating teachers. My take-home message in the initial post was simply that poring over those handbooks compiling student ratings (or for that matter, ratemyprofessors.com) isn't going to do you any more good than just popping in on a class on the first day.

    Unless the principals involved were also rated, I think you're right--we don't know what went into their evaluations either. Though we might guess that the principals had access to test scores, parental complaints, etc.

    Predicting a marathon winner just based on the start would be harder, but you can generally very easily tell the difference between the numbnut who just wants to be on TV and the serious runner.

    Here's the point, though: are we looking for the #1 teacher, or just trying to distinguish between good and bad teachers? Surely 30 seconds isn't enough to tell in the first case; it might be in the second case.

    My point in the "Blink methods" post was to suggest that a lot of short evaluations might be better than a few longer ones. The way a lot of schools work these days, the principal (or in my case, just your department head) would sit in on one or two classes.

    Arguably, visiting 20 or 30 classes at random for 3 minutes each would give a better sense of teaching ability, even though you're spending no more time evaluating any one teacher.

  5. Dave Munger asks,
    Here's the point, though: are we looking for the #1 teacher, or just trying to distinguish between good and bad teachers? Surely 30 seconds isn't enough to tell in the first case; it might be in the second case.

    Are you serious? Do you think you can distinguish between good and bad university lecturers by just popping in at the back of their class for 30 seconds?

    You must be amazing. I would have to attend several lectures to discover whether the lecturer was teaching concepts correctly and focusing on the right material. I'd also have to see their tests.

    The same thing applies to high school teachers.

  6. Do you think you can distinguish between good and bad university lecturers by just popping in at the back of their class for 30 seconds?

    No, but according to the study I could do a reasonable job of it if I watched three 10-second snippets taken from different segments of their class. That's actually something different. Note, too, that I'm not saying I'd base promotion, or really anything at all, on such a judgment, just that I'd get it right more often than I got it wrong. I think if you tried it, you'd find it to be true as well.

    Your method of attending several lectures plus looking at tests would probably be more effective, but is impractical for a high school principal supervising 100+ teachers. In practice, it's never done. The choice isn't between your method and my method, it's between the method described in my second ("Blink") post and nothing at all.

    Note that that method, too, is more than just a 30-second evaluation. What the Ambady and Rosenthal experiments showed is that we can learn a great deal about a teacher in a short amount of time. We're now seeing principals beginning to apply it, but they're giving more than just 30 seconds, and they're repeating it a dozen or more times across the school year.

    Even in a college setting, most teachers are rarely evaluated in the manner you suggest, and when they are, typically right before the tenure decision, they are usually given advance warning. You think lectures given in such a high-stakes setting are actually representative of their teaching as a whole? I'd put my money on a much larger number of shorter evaluations.

    As I say in my post, evaluating performance is the easy part; getting teachers to change is more difficult.

    Also, it's important to distinguish between teaching ability and course content. They're two different things, and no one is suggesting that thin slicing is appropriate for assessing content. You could be the best teacher in the world, but if you're teaching "creation science", then you should be booted out of the classroom.

  7. Or you could walk by the door of the classroom & sniff the air. Lots of information there.

    College undergraduates are masters of the superficial. The correlation is there between thin slice & end of semester because they are no better at the end of the term at evaluating the prof than they were at the beginning.

  8. Let me take an evolutionary psychology approach--it looks to me like human beings are well-adapted to make rapid and accurate judgements about other human beings and their characters, a skill we might expect them to have evolved. The question is how this relates to teaching. Have we evolved a similar capacity to eyeball teaching effectiveness? This seems highly unlikely from an evolutionary psych perspective.

    What seems much more likely is that people who are confident, assertive, competent, enthusiastic, warm, etc. make good teachers--and not necessarily for their big brains. They may not have as rich a command of the depth and breadth of their fields as other professors who get lower evaluations, but their personal qualities make students come to class, pay attention and work for approval. This would also account for the higher evaluation scores that go with physical attractiveness. The terrible truth is that attractive instructors may actually be more effective teachers, precisely because they make students want to come to class, pay attention, and work for instructor approval. I have long been aware that my high teaching evaluations come almost entirely from my enthusiastic (near manic) energy levels in the classroom, rather than from any particular command of the subject or great pedagogical technique. Entertainment may seem like the most superficial and shallow of techniques, but it may perversely generate the right results.

    Finally, on a couple of occasions in my career I've had to begin teaching in a new field with little preparation (such as moving from literature to film studies). Perversely, my evaluations were higher in those courses. My guess is that, precisely because I didn't know the material very well, I was very interested in it, and my fascination was contagious.

  9. It is true that students may end up working harder for a warm, energetic teacher. This teacher, because of his or her warmth and energy, will get higher teacher evaluations.
    Students also end up working harder for demanding but fair teachers. When students are expected to achieve at a higher level, they often do. The irony is that this type of teacher will often teach the most and have the most lasting effect, but get the worst teaching evaluations. Looking back on my own education, I realize I often gave the worst teaching evaluations to the teachers who taught me the most.
