I’m currently working on a book about the past, present and future of assessment. For the “future” bit I get to talk to researchers like Ryan Baker at Columbia. He’s spent the last ten years working on systems that gather evidence about crucial parts of the learning process that would seem to be beyond the ken of a non-human teacher.
The observations are based on what are called “semantic logs” within a computer learning platform, such as Khan Academy’s: Was it a hard or easy question? Did the student enter a right or wrong answer? How quickly did they answer it? How did it compare with their previous patterns of answers? Baker’s detectors use this evidence to flag students who are gaming the system, drifting off-task, or making careless errors. They can also infer a range of emotional states, like confusion, flow, frustration, resistance (which Baker memorably calls “WTF” behavior), engagement, motivation, excitement, delight, and yes, boredom.
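To make the idea of a log-based detector concrete, here is a minimal hand-written sketch in Python. It is not Baker’s actual detector; it uses fixed, made-up thresholds purely to show how the signals listed above (answer correctness, response time, hint use, item difficulty, and a student’s past accuracy) can feed a rule. The `LogEvent` fields, the function names, and every cutoff value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    """One hypothetical semantic-log entry for a single student action."""
    action: str        # "hint" or "attempt"
    correct: bool      # whether an attempt was right (ignored for hints)
    seconds: float     # how long the student spent on this action
    difficulty: float  # hypothetical item difficulty: 0.0 easy, 1.0 hard


def looks_like_hint_abuse(events, min_hints=3, max_hint_seconds=2.0):
    """Flag the "hint, hint, hint, answer" pattern: several hints clicked
    through too quickly to be read, followed by a correct answer."""
    fast_hints = 0
    for event in events:
        if event.action == "hint":
            # A hint the student lingered on resets the streak.
            fast_hints = fast_hints + 1 if event.seconds <= max_hint_seconds else 0
        elif event.action == "attempt":
            # Judge only the first attempt that follows the hints.
            return event.correct and fast_hints >= min_hints
    return False


def looks_like_careless_error(event, past_accuracy,
                              easy_cutoff=0.3, skilled_cutoff=0.85):
    """Flag a wrong answer on an easy item from a student whose earlier
    answers were mostly right."""
    return (event.action == "attempt"
            and not event.correct
            and event.difficulty <= easy_cutoff
            and past_accuracy >= skilled_cutoff)


session = [
    LogEvent("hint", False, 0.8, 0.6),
    LogEvent("hint", False, 1.1, 0.6),
    LogEvent("hint", False, 0.9, 0.6),
    LogEvent("attempt", True, 3.0, 0.6),
]
print(looks_like_hint_abuse(session))  # True
print(looks_like_careless_error(LogEvent("attempt", False, 4.0, 0.1), 0.92))  # True
```

Real detectors are validated against trained human observers rather than tuned by hand, so fixed thresholds like these would only be a starting point.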
Baker’s engagement detectors are embedded within systems currently being used by tens of thousands of students in classrooms from K-12 up to medical school. (Medical residents, he says, show the highest rates of “gaming the system,” that is, trying to trick the software into letting them move on without learning anything: up to 38% in a program that was supposed to teach them how to detect cancer.) His research, located at the forefront of the rapidly expanding field known as “educational data mining,” has a wide range of fascinating applications for anyone interested in blended learning.
Understanding how good these detectors currently are requires a bit of probability theory. To describe the accuracy of a diagnostic test, you need to compare its rate of true positives to its rate of false positives. A common summary is the area under the curve that plots one against the other, where 0.5 is no better than a coin flip and 1.0 is perfect. The results for the “behavior detectors,” Baker says proudly, are about as good as first-line medical diagnostics. That is, if the question is whether someone is acting carelessly, off task, or gaming the system, his program scores about as well as an HIV test did in the early ’80s: 0.7 to 0.8, which counts as “fair” on a commonly used grading scale. For emotional states, which require a more sophisticated analysis, the results are closer to chance but still somewhat useful. These accuracy scores are derived from systematic comparison with trained human observers in a classroom.
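A minimal sketch of that true-positive versus false-positive comparison, assuming it is summarized as the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one, with ties counted as half. The toy data are invented: imagine a detector’s confidence that a student was gaming, paired with whether a trained human observer recorded gaming (1) or not (0). The thresholds in `grade` follow a widely used informal scale and are an assumption, not Baker’s own rubric.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison: the share of
    (positive, negative) pairs where the positive case scores higher.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def grade(auc_value):
    """Informal rubric for AUC values (assumed thresholds)."""
    if auc_value >= 0.9:
        return "excellent"
    if auc_value >= 0.8:
        return "good"
    if auc_value >= 0.7:
        return "fair"
    if auc_value >= 0.6:
        return "poor"
    return "fail"


# Invented example: detector confidence vs. human observer's judgment.
scores = [0.9, 0.65, 0.25, 0.6, 0.3, 0.2, 0.7, 0.1]
labels = [1,   1,    1,    0,   0,   0,   0,   0]
value = auc(scores, labels)
print(round(value, 3), grade(value))  # 0.733 fair
```

A detector that assigned scores at random would land near 0.5, which is what “closer to chance” means for the emotional-state detectors.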
So why would someone want to build a computer program that can tell if you are bored?
- To improve computer tutoring programs. Let’s say a learning program provides several levels of hints before the right answer. You want to build in something that stops students from using simple gaming techniques, such as pressing “hint, hint, hint, hint,” and then just entering the answer.
- To give students real-time feedback and personalization. “I would like to see every kid get an educational experience tailored to their needs on multiple levels: cognitive, emotional, social,” says Baker. Let’s say the program knows you are easily frustrated, and gives you a few more “warmup” questions before moving on to a new task. Your friend is easily bored. She gets “challenge” questions at the start of every session to keep her on her toes.
- To improve classroom practice. Eventually, as these systems become more common, “I would envision teachers having much more useful information about their kids,” says Baker. “Technology doesn’t get rid of the teacher, it allows them to focus on what people are best at: Dealing with students’ engagement, helping to support them, working one on one with kids who really need help.” In other words, though technology can diagnose the affective states that influence learning, it is often teachers who provide the best remedies.
- To reinvent educational research. This is a fascinating one to me.
“I’d like to see educational research have the same methodological scope and rigor that have transformed biology and physics,” Baker says. “Hopefully I would like to see research with, say, 75% of the richness of qualitative methods with ten times the scale of five years ago.”
Modeling qualitative factors related to learning opens up new possibilities for getting really rich answers to really interesting questions. “Educational data mining often has some really nice subtle analyses. You can start to ask questions like: What’s the difference in impact between brief confusion and extended confusion?”
In case you’re wondering, I will clear up the confusion. Brief confusion is extremely helpful, even necessary, for optimal learning, but extended confusion is frustrating and kills motivation.
The very phrase “data mining” as applied to education ruffles feathers. On this topic, it’s helpful to hear from an unabashedly enthusiastic research scientist rather than an educational entrepreneur with a product to sell. Privacy, he says, should be given due consideration. “The question is what the data is being used for,” he says. “We have a certain level of comfort with Amazon or Google knowing all this about us, so why not curriculum designers and developers? If we don’t allow education to benefit from the same technology as e-commerce, all we are saying is we don’t want our kids to have the best of what 21st-century technology has to offer.”
If you’re interested in learning more, Baker has a free online Coursera course on “Big Data in Education” starting this Thursday. Over 30,000 people have signed up.