An automatic boredom detector? Inside “educational data mining” research


I’m currently working on a book about the past, present and future of assessment. For the “future” bit I get to talk to researchers like Ryan Baker at Columbia. He’s spent the last ten years working on systems that gather evidence about crucial parts of the learning process that would seem to be beyond the ken of a non-human teacher.

The observations are based on what are called “semantic logs” within a computer learning platform, such as Khan Academy’s: Was it a hard or easy question? Did the student enter a right or wrong answer? How quickly did they answer it? How did it compare with their previous patterns of answers? The detectors gather evidence that students are gaming the system, drifting off-task, or making careless errors. From those signals they can infer a range of emotional states: confusion, flow, frustration, resistance (which Baker memorably calls “WTF” behavior), engagement, motivation, excitement, delight and, yes, boredom.
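To make the raw material concrete, here is a minimal sketch of what one of these log records and a derived detector feature might look like. The field names and the “fast wrong answers on easy items” heuristic are illustrative assumptions, not the schema or features of any actual platform.

```python
from dataclasses import dataclass

@dataclass
class LogEvent:
    # One hypothetical "semantic log" entry.
    difficulty: float  # estimated item difficulty, 0..1
    correct: bool      # did the student enter the right answer?
    seconds: float     # response time

def features(events: list[LogEvent]) -> dict:
    """Summarize a student's recent events into detector features."""
    n = len(events)
    mean_time = sum(e.seconds for e in events) / n
    accuracy = sum(e.correct for e in events) / n
    # Fast wrong answers on easy items are one classic signal of
    # carelessness or gaming (an illustrative heuristic, not Baker's).
    fast_wrong = sum(
        1 for e in events
        if not e.correct and e.seconds < 0.5 * mean_time and e.difficulty < 0.4
    ) / n
    return {"accuracy": accuracy, "mean_time": mean_time,
            "fast_wrong": fast_wrong}
```

A real detector would feed dozens of such features, windowed over time, into a trained classifier rather than a hand-written rule.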

Baker’s engagement detectors are embedded within systems currently being used by tens of thousands of students in classrooms from K-12 up to medical school. (Medical residents, he says, show the highest rate of “gaming the system,” aka trying to trick the software into letting them move on without learning anything, at rates up to 38% for a program that was supposed to teach them how to detect cancer.) His research, located at the forefront of the rapidly expanding field known as “educational data mining,” has a wide range of fascinating applications for anyone interested in blended learning.

Understanding how good these detectors currently are requires a bit of probability theory. To describe the accuracy of a diagnostic test, you need to compare the rate of true positives to the rate of false positives. The results for the “behavior detectors,” Baker says proudly, are about as good as first-line medical diagnostics. That is, if the question is whether someone is acting carelessly, off task, or gaming the system, his program will be right about as often as an HIV test was in the early ’80s: a score of 0.7 or 0.8 (“fair” according to this rubric). For emotional states, which require a more sophisticated analysis, the results are closer to chance, but still have some usefulness. These accuracy scores are derived from systematic comparison with trained human observers in a classroom.
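Scores like 0.7 or 0.8 in this literature are commonly reported as A', which is equivalent to the area under the ROC curve: the probability that the detector rates a randomly chosen positive case (as judged by the human observer) higher than a randomly chosen negative one. A minimal pure-Python sketch of that comparison — an illustration of the metric, not Baker’s actual evaluation code:

```python
def detector_auc(scores, labels):
    """A' / ROC AUC: the probability that a randomly chosen positive
    (per the human observer's label) gets a higher detector score than
    a randomly chosen negative. Ties count as half. O(pos * neg)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 is chance, 1.0 is a perfect detector, and 0.7-0.8 lands in the “fair” band the article describes.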

So why would someone want to build a computer program that can tell if you are bored?

To improve computer tutoring programs. Let’s say a learning program provides several levels of hints before revealing the right answer. You want to build in something that prevents a student from using simple gaming techniques, such as pressing “hint, hint, hint, hint” and then just entering the answer.
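A crude version of such a guard might look like the following; the function name, signature, and thresholds are invented for illustration and are not taken from any published detector:

```python
def is_hint_abuse(hint_requests: int, seconds_between: list[float],
                  max_hints: int = 3, min_read_seconds: float = 2.0) -> bool:
    """Flag a likely 'hint, hint, hint, answer' pattern: the student
    exhausted the hint ladder faster than the hints could be read.
    Thresholds here are illustrative guesses."""
    return (hint_requests >= max_hints
            and all(t < min_read_seconds for t in seconds_between))
```

Once flagged, a tutoring system might withhold the bottom-out hint, impose a short delay, or serve a fresh problem of the same type instead of accepting the answer.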

To give students real-time feedback and personalization. “I would like to see every kid get an educational experience tailored to their needs on multiple levels: cognitive, emotional, social,” says Baker. Let’s say the program knows you are easily frustrated, and gives you a few more “warmup” questions before moving on to a new task. Your friend is easily bored. She gets “challenge” questions at the start of every session to keep her on her toes.

To improve classroom practice. Eventually, as these systems become more common, “I would envision teachers having much more useful information about their kids,” says Baker. “Technology doesn’t get rid of the teacher, it allows them to focus on what people are best at: Dealing with students’ engagement, helping to support them, working one on one with kids who really need help.” In other words, though technology can provide the diagnostics for affective states that affect learning, it is often teachers who provide the best remedies.

To reinvent educational research. This is a fascinating one to me.

“I’d like to see educational research have the same methodological scope and rigor that have transformed biology and physics,” Baker says. “Hopefully I would like to see research with, say, 75% of the richness of qualitative methods with ten times the scale of five years ago.”

Modeling qualitative factors related to learning opens up new possibilities for getting really rich answers to really interesting questions. “Educational data mining often has some really nice subtle analyses. You can start to ask questions like: What’s the difference in impact between brief confusion and extended confusion?”

In case you’re wondering, I will clear up the confusion. Brief confusion is extremely helpful, even necessary, for optimal learning, but extended confusion is frustrating and kills motivation.

The very phrase “data mining” as applied to education ruffles feathers. On this topic, it’s helpful to hear from an unabashedly enthusiastic research scientist rather than an educational entrepreneur with a product to sell. Privacy, he says, should be given due consideration. “The question is what the data is being used for,” he says. “We have a certain level of comfort with Amazon or Google knowing all this about us, so why not curriculum designers and developers? If we don’t allow education to benefit from the same technology as e-commerce, all we are saying is we don’t want our kids to have the best of what 21st c technology has to offer.”

If you’re interested in learning more, Baker has a free online Coursera course on “Big Data in Education” starting this Thursday. Over 30,000 people have signed up.

McGraw-Hill executive on Big Data: “Don’t look at us, look at Joe Camel”


Jeff Livingston is a senior VP at McGraw-Hill Education, one of the “Big Three” education companies along with Pearson and Houghton Mifflin Harcourt. The privately held company has offerings that span textbooks, Common Core-aligned assessments and adaptive learning.

In the course of a free-wheeling conversation last week for my upcoming book, Livingston and I got onto the topic of student data and security, which has been much in the news lately. InBloom, the beleaguered cloud-based infrastructure service for making pooled student data available to third-party vendors, has been subject to claims that it has “stepped in it on the issue of student privacy,” according to one report. And last month a privacy group called EPIC presented arguments against the Department of Education in federal district court, challenging a broadening of exemptions to FERPA, the law that governs student privacy, made on behalf of new technology organizations including InBloom.

McGraw-Hill’s offerings along these lines include Acuity, which is billed as a “comprehensive K-12 assessment solution that measures the deepest levels of student learning aligned to Common Core state standards” promising “data-driven instruction” by integrating student records with assessment results.

The technology’s potential, Livingston says, is enormous.

“Think about what’s happening in data right now–Target knows you’re pregnant before you do, because of subtle changes in how often you buy and where you are in the store. When we finally get around to applying these statistical tools in education, it will be possible to track my understanding of algebra to the professor who taught my algebra teachers.”

That’s a bold claim, certainly. But what about privacy? Livingston’s number one defense: education companies are far from the worst offenders on student privacy. In most cases, parents voluntarily share far more information on their children than companies like his have access to. 

“The people worried about student privacy are fighting the wrong war,” he told me. “They’re asking who knows that Anya made a B in algebra. They should be asking, why is there a cartoon camel on the billboard on the street across from the school?

“If you’re a data scientist who wants information on students and you have to come to McGraw-Hill, or Pearson, or even the school for information, you are a bad data scientist indeed. EA Sports, Nike and Verizon know a whole lot more about your kid than I ever will. Think about the cell phone with your name in the backpack of your elementary school kid. Someone knows everywhere that kid has been–to which stores, and which games it downloaded, and which numbers it called. It’s not an education company who knows those things.”

Livingston has a point. Seventy-eight percent of teens, and 20 percent of 6-11 year olds in a separate survey, have a cell phone. According to the Campaign for a Commercial-Free Childhood (which, incidentally, is part of the campaign against InBloom), private companies spend $17 billion annually marketing to children. And the FTC reported last year on data collection by mobile phone apps marketed to children, finding that 60 percent of the apps surveyed were transmitting information to a third party like an advertiser network, while only 20 percent disclosed anything about such transmissions to users.

But Livingston’s attempt to paint student data collection as the lesser of two evils was undermined by what he said next.

“I think people who are worried about [education companies] are looking in the wrong direction because of the elaborate privacy laws that have been in place for a whole generation. School is the safest part of the data day of any student. You can come to me and ask what my privacy policies are. Ask the manager of your local D’Agostino [grocery store] what his privacy policy is and I think you’ll get some looks.”

It’s my guess that it’s exactly the idea of school as a child’s last remaining privacy refuge that has activists so worried. EPIC’s complaint about proposed changes to FERPA–those “elaborate privacy laws”–stems from suspicion about services like InBloom (although InBloom maintains that it’s fully compliant with existing regulation, with no changes in the law needed or sought). And, particularly in a high-stakes testing environment, data about my child’s school performance is just inherently more sensitive than data about which Pokémon she likes best.

InBloom is wilting thanks to privacy concerns–but they don’t stop with InBloom


In my first post for this blog I covered the splashy debut of InBloom at the SXSWEdu conference in Texas in March. I noted that it’s tough to explain exactly what the company does (essentially, they provide the infrastructure for a variety of smaller applications to harness the data generated by students to make their offerings more efficient and personalized). I also highlighted privacy concerns that are starting to surface about the collecting, repackaging and re-selling of student data for the benefit of for-profit companies.

Several months later it seems that both the inability to explain for the layperson what the company does, and the panic over privacy and security (underlined by the recent upheaval over NSA data mining), are dogging InBloom and may doom it. The number of partner and pilot states for the organization, initially listed at seven, is now down to five according to the website. And in at least two of those states, New York and Colorado, the idea faces vociferous local opposition. The American Federation of Teachers has stepped in, issuing a statement citing a “growing lack of public trust” in the company.

This debate is important, and as the AFT notes, it doesn’t stop with InBloom. The promise of big data for schools is not going away and neither are the perils, so perhaps it’s time to have a more grounded conversation about both the issues and the remedies at hand.

Recently I spoke with entrepreneur Jose Ferreira of the adaptive learning platform Knewton, another kind of big data company in education. During our conversation he said that from the ed-tech point of view, there are several types of student data. Each has different values and different dangers. (Ferreira separated out five kinds of data in his typology, but to simplify I’ll designate just three).

1) The first type is known as personally identifiable information: names, addresses, Social Security numbers. Exposure of this data generally is a security breach of the first order. It may be valuable for spammers, but it’s not all that useful to analyze for educational outcomes. Say you find out that girls named Alana from Phoenix do better in reading–that’s not generalizable. For this reason PII should always be well hidden and inaccessible.
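One common way to keep PII “well hidden” while still letting records be linked across datasets is to replace direct identifiers with keyed hashes. A sketch of that approach — HMAC-SHA256 is one standard choice for the job, not something prescribed by anyone quoted here:

```python
import hashlib
import hmac

def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).
    Records pseudonymized with the same key can still be joined on
    the pseudonym, but the original ID can't be recovered without
    the key. A sketch, not a complete de-identification scheme."""
    return hmac.new(secret_key, student_id.encode(), hashlib.sha256).hexdigest()
```

Because the mapping is deterministic for a given key, analysts can correlate a student’s records across systems without ever seeing the name or ID; the key itself must be stored separately and never shipped with the data.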

2) The second type of data is the kind collected and tabulated by school, state and federal student information systems–let’s call it SIS. There is academic and behavioral information, like attendance, standardized test scores, suspension rates and class sizes. And there’s demographic data, like ethnicity, learning disability or IEP classification, and the percentage receiving free/reduced lunch. It’s very useful to correlate this kind of data with educational outcomes and interventions. It’s necessary for resource allocation. Because it pertains to groups, not individuals, it’s less sensitive than PII. But there’s still a chance for schools or groups to be stigmatized or stereotyped with the sharing of such information, so it needs to be released judiciously. No one is arguing that a particular student’s test score, for example, should be a state secret, but as with anything that appears on a transcript, its release should be controlled and limited to those who need to know. 

3) The third, and newest, type of data is the user interaction information collected by learning software systems like Dreambox, Khan Academy or Knewton. These systems record time on page and keystrokes and combine them with student responses to assessment questions to construct a picture of the engagement and proficiency of individual students and the efficacy of particular pieces of content. This is where you truly get into “big data.” Some of these systems claim to generate millions of data points per hour.

Let’s leave PII alone for a minute. The power of both SIS and “big data” to improve the practice of teaching and learning depends on aggregating and analyzing as much of it as possible, and making the relevant results available as quickly as possible to students, educators, parents, and the people who build these systems. The system we have today doesn’t do a very good job of this. Adequate Yearly Progress test results, for example, typically become available several months after a student takes the test. If big data is going to be useful at all, the privacy considerations attached to it have to be different because of the sheer volume and velocity at which it is generated. “Opting-in” often becomes impractical.

I would suggest separate tests be applied to determine responsible privacy and security considerations for student data. PII should always be separated out and kept hidden except when explicitly shared or agreed to by informed individuals. SIS and “big data” should be protected and their use disclosed, especially when they’re being made available for the enrichment of private businesses. (For example, I’m not a huge fan of the startup Junyo, founded by a cofounder of the online game company Zynga, which has introduced a product that scrapes publicly available SIS data and sells the information to textbook and ed-tech companies for marketing purposes.)

In all cases, we have to balance the potential harm to vulnerable young people with the potential gains to learning and teaching.



The five most important ed-tech trends at SXSWedu


I’ve been on the ground in Austin for the South By Southwest Education Conference & Festival for 22 hours. In that time, I’ve interviewed six people, chatted with many more, and hit the Java Jive in the Hilton four times. Here’s what I see as the biggest trends coming out of the conference.

  1. Data and analytics. There seems to be a consensus, which Bill Gates will no doubt highlight in his keynote tomorrow, that the most important potential—as yet unrealized—contribution of technology to teaching and learning is the ability to extract meaningful insights from the mass of information that students generate as they travel through life on their learning journeys: diagnostics, individualized goals and plans, demographic information, performance evaluations, and on and on from cradle to mortarboard. Companies like InBloom and Engrade envision a teacher working like a doctor, synthesizing reams of test results and other information with the help of tech tools to arrive at the proper intervention for the proper moment.
  2. Games and adaptive learning. What makes video games fun is that they get harder as you get better at them, keeping you in the right “proximal zone” between bored and frustrated. “In the gaming world, when you don’t get the right outcome, you don’t feel like a failure, you say how do I adjust,” says Dreambox CEO Jessie Woolley-Wilson. This is what is meant, at its simplest, by adaptive learning. Game-like learning platforms range from Dreambox, a math program that “puts the learning in front,” in the words of Woolley-Wilson, to Kuato Studios, which later this month is debuting a fighting-robot coding game made by designers who worked on Call of Duty. Games and adaptive learning are intimately related to #1, data and analytics. In some sense, what defines a game is simply that the players are keeping score, so a key feature of online learning games is the constant generation of data that can, in theory, be used by teachers and parents in coaching mode to help direct students. Taken together, #1 and #2 form the megatrend/buzzword of “personalization”—the “mass customization” of learning.
  3. MOOCs. While many in the education space might be sick of hearing about Massive Open Online Courses, Coursera, edX, et al., they are still adding users and shaping the public imagination about what’s possible when classrooms open a window on the world.
  4. Makers and creativity. I was pleasantly surprised to see a Makerspace onsite at the convention center, where you could drop in and play with Legos, circuits and homemade play-doh. This hands-on, amateur, DIY stuff taps into a deep need for learners to accent what is most fully human, even as we are increasingly overwhelmed by virtual worlds. In addition, John Maeda, president of the Rhode Island School of Design, hosted an influential panel on STEM to STEAM—putting the arts into STEM education. He’s argued that the forward march of technology will lead to a higher premium being placed on the personal, well-designed and handmade.
  5. Going back to the classroom. “Where are the districts?” “Where are the teachers?” Aside from a few leaders of charter schools I’ve run into, most of whom were presenting, my impression is that there are few full-time educators here, let alone people who make IT purchasing decisions for school districts. Many sense a fundamental disconnect on both sides between the innovation conversation going on here and the real needs of teachers in classrooms. Hopefully that will change soon.

Big data and schools: Education nirvana or privacy nightmare?

InBloom, a nonprofit start-up founded with funding from the Bill & Melinda Gates Foundation and Carnegie Corporation, is taking center stage and spreading around some significant funds as an official sponsor of the South by Southwest Education conference in Austin, Texas this week. It hosted the official opening night party on Tuesday, is sponsoring a “networking lounge” with free coffee and snacks at the Hilton next to the convention center, and is debuting the first live demonstrations of its technology with representatives from pilot districts and states.


It’s quite a splash for what is basically a highly technical, behind-the-scenes infrastructure company. InBloom promises to bring all the potential of “big data” to classrooms in a big way for the first time. Its stated mission: to “inform and involve each student and teacher with data and tools designed to personalize learning.”

“We want to make personalized learning available to every single kid in the U.S.,” says CEO Iwan Streichenberger. “The way you do this is by breaking the barriers—making data much more accessible.”

But to some educational activists, InBloom represents a danger, not an opportunity.

InBloom began as the Shared Learning Collaborative in 2011. It gets a bit technical, but basically, 10 districts in nine states agreed to build a shared technology infrastructure. Currently, student data—from attendance to standardized test scores—are locked in dozens of different “student information systems” that don’t talk to each other. “In one district we work with in Massachusetts, teachers had to use 20 different assessment storage places with different log-ins,” says Streichenberger.

InBloom offers a single middleware layer that hosts student data using Amazon Web Services, with some centralized dashboard-style functions and an API (application programming interface) that would allow start-ups to build education apps, aligned with Common Core standards, that anyone could use. It’s a similar strategy to how Facebook and Apple allow outside developers to build apps that pull your profile information from the cloud. Instead of designing for thousands of school districts across the country, each with its own idiosyncratic data storage system, the InBloom platform will eventually allow developers to build one application—like DreamBox, a differentiated math game, or Kickboard, a dashboard program that allows teachers to track students’ performance and behavior—and have it work automatically in several states. This coordination, in turn, is likely to attract even more technology entrepreneurs to a market for educational IT spending estimated to be worth $20 billion in 2013. And just as electronic health records promise to reduce costs and increase efficiency and effectiveness in medicine, says Streichenberger, centrally hosted data offers similar cost savings and improvements in education.

But the very moves that make this idea a huge opportunity from the point of view of edtech entrepreneurs—the ability to find a large market for learning games and systems all in one place, to pull student data automatically, and to coordinate effortlessly with other apps—make parents “horrified,” in the words of school activist Leonie Haimson of Class Size Matters.

“There are no limitations on the time-frame, or the kind of data. There’s no provision for parental consent or opt-out. The point is to give our kids’ data away for free, and share it as widely as possible with for-profit ventures to help them market and develop their learning products,” she says. “For-profit vendors are slavering right now at the prospect of being able to get their hands on this info and market billions of dollars’ worth of so-called solutions to our schools.”

Class Size Matters has been working with a lawyer to get public access to the agreements between InBloom and the nine states that are members of the collaborative (New York, Massachusetts, Louisiana, Colorado, Illinois, North Carolina, Georgia, Delaware and Kentucky), to learn under what circumstances student data will be released, and whether there are potential violations of FERPA, the Family Educational Rights and Privacy Act, which generally requires written consent from parents to release the records of students under 18. They are also trying to get states to agree to opt-out policies so parents can withhold children’s information from InBloom, especially sensitive information like disciplinary records, health records, and personally identifiable details like addresses.

Streichenberger says that InBloom’s terms of service are fully compliant with FERPA, but privacy policies—including parental notification and opt-out—will be in the hands of individual districts, which will hold and control all access to the data that InBloom hosts. “Privacy is a very emotional issue,” he says. “I have two children, four and six. I would never join InBloom if I thought it would compromise my kids.” At the same time, he says, “The privacy discussion is an important one, but one of my concerns is it’s preventing the discussion of what’s going on in the classroom. Are we preparing the children for the future? Do we have the tools to prepare them for the jobs of tomorrow?”

So far, Haimson says, her group has generated thousands of letters from parents concerned about their students’ privacy to the Gates Foundation and to individual states. She says that her biggest problem in spreading the word is that many parents don’t believe this is really happening. “Parents are outraged and can’t believe it’s legal,” she says. “The tech companies and foundations shrug their shoulders. People are living on two separate planets.”

Note: The Bill & Melinda Gates Foundation and Carnegie Corporation are among the various funders of The Hechinger Report.