Data and Privacy
Part 7 in my Top 10 Trends of 2014 series.
I’m not sure if I expected things to change substantially after last year’s revelations of the massive government surveillance by the NSA. But I guess I’d hoped that folks might be a little more cautious, a little more thoughtful, a little more skeptical about data and technology adoption. Americans are aware of and increasingly concerned about the amount of data collection – by the government and by businesses. Some 39% of Internet users globally say they’ve taken steps to protect their privacy and security online.
But what effect has this had on education and the current policy efforts that demand data collection? What effect has this had on ed-tech?
Perhaps this sentence from Politico’s Stephanie Simon in an article on data-mining in educational products gives us a hint at what the answers to those questions might be: “The NSA has nothing on the ed tech startup known as Knewton.”
The Ideology of Data
We’ve been told – by politicians, by the Department of Education, by tech entrepreneurs, by investors, by industry analysts, by researchers, by journalists, by pundits – that “more data,” more data analysis, and more surveillance of students and teachers will “fix education.” (Whatever “fix” means.) “More data” – and by that, we often mean “more standardized testing” – has been a core part of US education policy at the K–12 level for over a decade now, and the demand for “more data” is seeping into higher education as well.
The adoption of more and more technologies in schools has some arguing that we now have an opportunity to collect more data than we could glean from all those standardized tests. As testing giant Pearson wrote in its report on the “Impacts of the Digital Ocean on Education,”
“The devices and digital environments with which we interact are designed to record and store experiences, thereby creating a slowly rising ocean of digital data. We can imagine schools and individual learners using this ‘digital ocean’ to inform decisions about learning. As learners learn, they are able to collect information about their activities and get feedback about what they know and can do. Learning can occur in formal and informal contexts, and data can be drawn from both. In the digital ocean, we would expect to see data from all types of activities and contexts used to create persistent learner profiles, which could then be used to recommend future activity.”
A persistent learner profile. As in, this really will go down on your permanent record.
Although many companies are scrambling to cash in on the data-mining boom (see below), one of the most vocal about the amazing and incredible and pretty much totally unbelievable potential for data and analytics has to be the aforementioned Knewton. Once a test-prep company, it now offers an “adaptive learning” engine that many textbook partners, including Pearson (which is an investor), are incorporating into their existing products. This year, Knewton announced partnerships with Cengage, with the Turkish educational publisher Sebit, with Microsoft, with Scandinavian publisher Sanoma, with the Norwegian publisher Gyldendal, with publisher and former arms dealer Elsevier, with the Sesame Workshop, and with Latin American textbook publisher Santillana.
As Stephanie Simon writes, “The data analytics firm has peered into the brains of more than 4 million students across the country. By monitoring every mouse click, every keystroke, every split-second hesitation as children work through digital textbooks, Knewton is able to find out not just what individual kids know, but how they think. It can tell who has trouble focusing on science before lunch — and who will struggle with fractions next Thursday.”
“We literally know everything about what you know and how you learn best, everything,” CEO Jose Ferreira says in a video posted on the Department of Education website. “We have five orders of magnitude more data about you than Google has. …We literally have more data about our students than any company has about anybody else about anything, and it’s not even close.”
I call “bullshit,” but hey, what do I know. (Not everything. Literally.) Yet ed-tech startups insist that, thanks to them, we’re cracking the code of how people learn, something into which educators never had any insight until this very moment in history. As the co-founder of survey startup Panorama told The New York Times this fall, “Education is just starting to figure out what measurement actually means.”
The Business of Education Data
What these companies certainly are figuring out is how to make a lot of money – by rolling out a lot of new data-related products (and/or by raising a lot of venture capital) on education data. One approach to leveraging student data for profit was neatly summarized by Chegg CEO Dan Rosensweig who told his investors, “We’ll get the [student] data, get the credit card, and market our other products.”
(For a more complete look at “The Business of Ed-Tech,” see the second post in this series.)
I’ll explore “Who Owns Student Data?” in more detail below, but that question was particularly pertinent this year when ConnectEDU, a career planning company that was founded in 2002, filed for bankruptcy. The FTC expressed concern that the data of some 20 million students might not be protected as the company sold off its assets. Among those who acquired the pieces of ConnectEDU’s business: Symplicity, whose CEO pleaded guilty this year to conspiring to hack into his competitors’ computer systems. So yeah… more on the security of student data below too.
Data and Surveillance at School
Despite all the exuberance about the business of education data and the science of education data, it may well be that education technology is, in the words of edu-infosec advocate Jessy Irwin, “grooming students for a lifetime of surveillance”:
“When we develop and use educational technologies that monitor a student’s every moment in school and online, we groom that student for a lifetime of surveillance from the NSA, from data brokers, from advertisers, marketers, and even CCTV cameras. By watching every move that students make while learning, we model to students that we do not trust them – that ultimately, their every move will be under scrutiny from others. When students recognize that they are being watched, they begin to act differently – and from that very moment they begin to cede one small bit of freedom at a time.”
A few of the year’s school surveillance highlights: From The New York Times this fall, “At a New York state elementary school, teachers can use a behavior-monitoring app to compile information on which children have positive attitudes and which act out. In Georgia, some high school cafeterias are using a biometric identification system to let students pay for lunch by scanning the palms of their hands at the checkout line. And across the country, school sports teams are using social media sites for athletes to exchange contact information and game locations.” The ACLU and EFF accused a Tennessee school district of violating students’ rights with its new policy that “allows school officials to search any electronic devices students bring to campus and to monitor and control what students post on social media sites.” A student in Minnesota settled a lawsuit with her school over claims that officials violated her constitutional rights by viewing her Facebook and email without her permission. Proof of concept for facial recognition technology in Moodle. Hidden cameras in Harvard lecture halls. Students on meal plans at George Mason University must be registered for the iris scanner in order to eat. Counterterrorism software marketed to campuses to help identify students who might drop out. Web filtering companies started to offer additional analytics “to give school leadership critical visibility into how students and teachers in their district are really using technology.”
The surveillance of students, and the surveillance of school employees as well. It’s often framed as safety and “protection,” but it’s not. Not really. It’s part of a new regime of “performance metrics” (and performance anxiety) and surveillance solutionism. And it’s part of an old regime that is deeply intertwined with school as a disciplinary institution – the school-to-prison pipeline. Take the Huntsville City Schools in Alabama, for example, which “expelled 14 students last year based on the findings of a private contractor who monitored students’ social-media activity as part of greater school security efforts.” Twelve of them were Black. (Coming soon: Part 9 in this series which will address ed-tech and social justice.)
Data and Surveillance at Home
Surveillance at school. Surveillance at home. Surveillance at work. Surveillance at play. Oh, and start early, with the “smart nursery” perhaps. Use an app to become a “data-driven parent.” Coerce your child with the “Elf on the Shelf.” Let the cops install “Internet safety software” on your home computers so as to protect your kids online. What could possibly go wrong?
Anonymous Apps on Campus
One of the “hot new” trends in technology this year is the “anonymous” app, which ostensibly lets you post messages without revealing your identity. In light of all this surveillance at home and at school, it seems quite noteworthy to see the adoption of these apps by students. There are dozens of these, and the companies behind them have raised tens of millions of dollars. Secret: $35 million. Whisper: $60 million. Yik Yak: $73.5 million.
Although it describes itself generically as an “anonymous local bulletin board,” Yik Yak has been marketing itself directly to students and to college campuses. After a number of reports that it was being used for cyberbullying, bomb threats, and shooting threats against schools, in March Yik Yak blocked access to its app for high schools and middle schools, using geofencing around schools’ GPS coordinates. Many of those students have apparently moved on to another app, After School, which has already been yanked from the App Store a couple of times. These apps do insist that they have anti-bullying procedures in place, but these measures don’t really seem to work.
Despite its efforts to prevent the K–12 set from using its app while at school, Yik Yak seems to have no issue with how it’s being used on college campuses. (In response to some of the ugliness on the app, professors at Colgate University staged an intervention of sorts where they flooded the app with positive messages.)
The potential for these apps to be detrimental to a campus community or to an individual student doesn’t stop Silicon Valley investors and entrepreneurs though. Weaponized “anonymity” could be the next Facebook, after all.
I have “anonymity” in quotes because, guess what. Your identity isn’t really a secret.
Data Insecurity
2014: the year of the hack. Among some of the high-profile data breaches: Sony. JP Morgan Chase. Celebrities’ iCloud accounts. The USPS. Target (OK, this was actually in December 2013). The European Central Bank. NOAA’s weather satellites. eBay. Home Depot.
But don’t worry. Education data is totally safe. Unless your data was here: the University of Maryland (hacked twice). Indiana University. Maricopa County Community College. The American Institutes for Research. Butler University. The Lewisburg Area School District. Some of these breaches were a result of “cyber-intrusions.” Many were a result of human error, like the admin at the University of Virginia Law School who accidentally emailed to a student listserv a spreadsheet containing “each student’s grade-point average, class rank, political affiliation, and much more.” Or a result of human negligence, like the officials at a charter school group in New Orleans who auctioned off school laptops without wiping them of students’ names, birthdates, and Social Security numbers.
There were some very significant technical reasons why our data was vulnerable this year: Shellshock, a bug in the Unix Bash shell; the “Heartbleed bug,” which compromised the security of websites using OpenSSL; and “POODLE,” which affected SSL 3.0. (Change your passwords. Do not use the same password for every application. Teach students good infosec practices. Practice them yourself.)
These last two – OpenSSL, a software library implementing the SSL/TLS protocols, and SSL itself – were designed to keep our data safe through encryption. Unfortunately, many applications, including education ones, send information unencrypted.
In October, for example, The Digital Reader’s Nate Hoffelder broke the story that Adobe Digital Editions had been collecting users’ reading data and sending it back to company servers in unencrypted text, with privacy implications for readers, for schools, and for libraries.
I wrote about a potential security vulnerability with Coursera in more detail in my post in this series on MOOCs, but I want to touch on the company’s response here, because it’s pretty par for the course: “Given our partnership philosophy, we have focused less effort on deflecting malicious attacks that might be made by one of our trusted partners. This has left open some gaps.” “It’s really not our fault,” was also the response from Dropbox and from Snapchat when users found their data had been leaked.
Privacy, Data, and the Law
Whose responsibility is it to protect student data? Schools? Technology companies? Parents? Students? Sadly, it’s not really clear.
The US Department of Education did release some new guidelines this year on protecting student privacy and on communicating with parents about data collection. The guidelines include helpful answers to questions like “Is student information used in online educational services protected by FERPA?” (Answer: “It depends.”) “What does FERPA require if PII from students’ education records is disclosed to a provider?” (Answer: “It depends.”) And this: “metadata that have been stripped of all direct and indirect identifiers are not considered protected information under FERPA because they are not PII” – even though it’s been demonstrated that it’s almost impossible to strip out these identifiers and metadata is incredibly revealing.
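That re-identification problem is simple to demonstrate. A minimal sketch with entirely made-up, hypothetical data: strip the names from student records, keep the metadata, and a handful of “quasi-identifiers” (ZIP code, birthdate, sex) is often enough to link a “de-identified” record back to a named one in some other dataset – a yearbook, an enrollment roster, a voter file:

```python
# Entirely hypothetical data, for illustration only.
# "De-identified" student records: names removed, metadata kept.
deidentified = [
    {"zip": "02138", "dob": "1990-07-31", "sex": "F", "grade": "C-"},
    {"zip": "60640", "dob": "1992-01-15", "sex": "M", "grade": "A"},
]

# A separate, publicly available roster with names attached.
roster = [
    {"name": "Alice", "zip": "02138", "dob": "1990-07-31", "sex": "F"},
    {"name": "Bob",   "zip": "60640", "dob": "1992-01-15", "sex": "M"},
]

def quasi_id(record):
    # The quasi-identifiers that survive "anonymization."
    return (record["zip"], record["dob"], record["sex"])

names_by_key = {quasi_id(r): r["name"] for r in roster}

# Join the two datasets on the quasi-identifiers: re-identification.
reidentified = {
    names_by_key[quasi_id(r)]: r["grade"]
    for r in deidentified
    if quasi_id(r) in names_by_key
}
print(reidentified)  # {'Alice': 'C-', 'Bob': 'A'}
```

A two-line dictionary join is all it takes; with real datasets the linkage attack is noisier but the principle is the same, which is why “stripped of all direct and indirect identifiers” is such a slippery standard.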
With FERPA protections being so weak and so confusing, it’s not surprising that several states made efforts to strengthen student privacy. In California, Governor Jerry Brown signed Senate Bill 1177, which would “prohibit education-related websites, online services and mobile apps for kindergartners through 12th graders from compiling, using or sharing the personal information of those students in California for any reason other than what the school intended or for product maintenance.” In Florida, Governor Rick Scott signed CS/CS/SB 188, which would phase out the usage of Social Security numbers as student IDs and ban biometric data collection. At the federal level, US Senators Edward Markey and Orrin Hatch did introduce the Protecting Student Privacy Act, which would have offered an update to FERPA, but LOL, it never left committee. Because you know what’s broken besides education? Congress.
Elsewhere in the federal government: The US Supreme Court ruled that police may not search cellphones without a warrant – will this affect schools that say they’re free to search students’ cellphones? The FTC fined TinyCo $300,000 and Yelp $450,000 for violating COPPA for improperly collecting kids’ data. The FTC also reached a $200,000 settlement with TRUSTe over deceptive labels and promises. The company purports to certify that companies handle data safely. Remember: just because that ed-tech company proudly displays a TRUSTe seal doesn’t mean your data is safe; it means that they paid for the seal.
Also in violation of COPPA apparently: Sasha Obama. That’s what we learned in June when Tumblr’s founder David Karp visited the White House. (She'd created an account before she turned 13.) No surprise, several companies including Apple, Facebook, and Google announced this year they were looking for “COPPA-compliant” ways to get those under age 13 to sign up for their services.
So what protections do laws like COPPA and FERPA really provide? When these laws are updated, who benefits from the revisions? And what happens as more and more learning occurs outside formal institutions? Is student data protected then? Or are the existing privacy loopholes going to be increasingly exploited by tech companies?
Who Owns Your Education Data? And Who Mines It?
Part of the problem with FERPA in particular is that it frames student data in terms of the “official educational record”: name, home address, grade level, dates of attendance, final grade – the sort of stuff that appears on a report card, for example. Not protected: metadata. And as I noted above, what is protected: well, “it depends.”
A group of tech and textbook companies did pinky-swear this year they’re not going to do bad things with students’ data. Amplify is on the list (incidentally, the trial involving its parent company News Corp wrapped up this year with editor Andy Coulson pleading guilty to phone hacking). Neither Apple nor Google nor Pearson nor Khan Academy signed the pledge. (Pearson did issue a press release “applauding” the effort.)
After a complaint was filed in the US District Court for the Northern District of California, charging that Google Apps for Education violated state and federal wiretap and privacy laws because it was scanning emails of students under age 18, the company announced that it would stop data-mining the emails in Google Apps for Education. “We’ve permanently removed all ads scanning in Gmail for Apps for Education, which means Google cannot collect or use student data in Apps for Education services for advertising purposes.” The company updated its Terms of Service in April to reflect the fact that, yes, “automated systems analyze your content” for the sake of ads, customization and security. These revelations have done little to dampen educators’ enthusiasm for Google Classroom and Google Chromebooks.
Why do I feel like every sentence in this post should end with ¯\_(ツ)_/¯
I mean, what’s a little data-mining between friends, right?
Unfortunately, some of the most popular ed-tech apps have pretty atrocious Terms of Service and privacy policies. But folks simply don’t read them, even when they’re compelling their students to use a particular product or service. Thankfully, FunnyMonkey’s Bill Fitzgerald does, and he wrote a number of blog posts this year highlighting what you might be signing away – or asking your students to sign away – when you sign up: things like your location, your content, your contacts, and, in the case of a sale, all your data. BetterLesson. ShareMyLesson. GoNoodle. ClassDojo. Remind. Securly. Mevoked. Digedu, which Fitzgerald described as “about the worst I have ever seen.”
In response, Digedu said it had updated its terms.
It’s a pity – and pretty indicative of how lucrative the tech industry feels that data will be – that these privacy concerns aren’t considered from the outset. But hey, you could end up like iParadigms, maker of the anti-plagiarism software TurnItIn, and be acquired for hundreds of millions of dollars, all on the backs of the students whose content and data you’ve extracted. ¯\_(ツ)_/¯
Public Data / Private Data
One of the challenges of education data is that it lives in this weird and fragile intersection between public data and private data. As such, efforts to “open” education data can be pretty fraught. Which data gets opened? Whose data gets opened? And opened to whom?
I know those of you who pay close attention to ed-tech and data and privacy are scanning this article and asking “When are you going to write about inBloom, Audrey?” “5000 goddamn words so far and you haven’t mentioned the Gates Foundation or inBloom!” You are right. I have not. I actually did hammer out 600 or so words in last year’s privacy round-up on the data infrastructure initiative, which even then seemed doomed to fail. inBloom did finally officially close its doors in April of this year, due in part to parent outcry about privacy – education data is private, they contended, and should not be shared publicly, should not be shared with third-party providers.
But should all education data be private? What data, if any, should be available – to researchers, to journalists, to technology providers, to parents, to the government?
What about teachers’ data? (A three-judge state appellate court panel ruled this year that The LA Times could not have access to teachers’ job performance data.)
What about administrative data? (The LAUSD School Board passed a measure this year to delete all its emails after one year – a conveniently timed move considering that open records requests into emails had found possible improprieties with how the district handled its billion-dollar iPad program.)
And what counts as public education data? (The Virginia Supreme Court ruled that a University of Virginia climate scientist’s emails were protected and exempt from public records requests. An anti-climate change organization had sought the researcher’s emails, “hoping that exchanges among climate scientists might cast doubt on views of the environment backed by most scholars.”)
From Data to Information Justice
In November, I delivered a keynote at OpenCon where I argued that we need to shift the conversation from “open data” to “information justice.” It’s an argument that draws heavily on the work of Jeffrey Alan Johnson. I think we also need to shift from “privacy” to “information justice” as well. I plan to write more about ed-tech, ethics, and social justice in Part 9 of this series. But I want to stress this here too. So I’ll quote me at length, because I can:
…I want to raise more questions about the data itself. Data is not neutral. Data — its collection, storage, retrieval, usage — is not neutral. There can be, as Jeffrey Alan Johnson argues, “injustices embedded in the data itself,” and when we “open data,” it does not necessarily ameliorate these. In fact, open data might serve to reinscribe these, to reinforce privilege in no small part because data, open or not, is often designed around meeting the needs of businesses and institutions and not of citizens, or in this case students.
What “counts” as education data? Let’s start there. What do schools collect?
As I said earlier, the inBloom data spec included hundreds of data points. A small sampling: Academic Honors, Attendance Type, Behavior Incident Type, Career Pathway, Disability Type, Disciplinary Action, Grade Point Average, Incident Location, Personal Information Verification Type, Reason for Restraint, Eligibility for Free or Reduced School Lunch, Special Accommodation, Student Characteristic, Weapon Type.
I think it’s clear, as I list these, that the moments when students generate “education data” are, historically, moments when they come into contact with the school – and more broadly the state – as a disciplinary system. We need to think more critically, more carefully about what it means to open up this data — data that is often mandated by the state to be collected — to others, to businesses. Again, is “open data” about liberating data, as the Department of Education suggests, “to spur entrepreneurship, create value, and create jobs while improving educational outcomes for students”?
As Johnson argues, “the opening of data can function as a tool of disciplinary power. Open data enhances the capacity of disciplinary systems” — and school certainly functions as one of those — “to observe and evaluate institutions’ and individuals’ conformity to norms that become the core values and assumptions of the institutional system whether or not they reflect the circumstances of those institutions and individuals."
Did you speak out of turn in class? Are you a child of an illegal immigrant? Did you get written up for wearing a halter top? Are you pregnant? Did you miss school? Why? Why? Why?
What classes did you take? What grades did you make? Why? Why? Why?
(Is the answer to “why” a data point? And — here’s the rub — is that “data point” ever connected to an ethics of care or a sense of social justice?)
Education data often highlights the ways in which we view students as objects, not as subjects of their own learning. I’ll repeat my refrain: education data is not neutral. Opening education data does not necessarily benefit students or schools or communities; it does not benefit all students, all schools, all communities equally. Open source education data warehouses are not neutral. And similarly, opening the source code does not benefit students equally.
I want us to consider how we will collect and analyze data ethically. I want us to consider the implications for civil rights and social justice, as powerful institutions turn to algorithms and "data-driven decision-making." What if the data is bad? Will more data fix that? What are the politics of more data, “better data,” about sexual assault on campus, for example? What are the politics of more data, “better data” about discipline and the school-to-prison pipeline? What are the politics of more data, “better data” about the cost of college? What can data really “do”? What can it not do?
These questions hint at why I think a blanket demand for “more privacy” is over-simplistic and even troubling. Context matters. In July, for example, the Parent Coalition for Student Privacy announced its formation. Among those in the coalition, which said its mission was to “involve parents in the decision-making process to ensure that their children’s privacy is protected”: Diane Ravitch, (small class size advocate) Leonie Haimson, the (anti-vaccine) Autism Action Network, the (anti-LGBTQ, anti-immigration, anti-public pension, anti-choice) American Principles Project, and Joy Pullmann, the editor of School Reform News (who’s tweeted gems like this).
Should parents be able to opt-out of sharing their children’s immunization records with schools, for example? What happens if a queer student turns to a teacher for support about coming out? Should a school be obligated to share that info request with a parent under the auspices of “parental consent”? What sorts of student privacy advocacy might one expect from an organization that is anti-immigration and anti-choice?
Data collection isn’t politically neutral. Neither is privacy. If we don’t pay attention to equity, then privacy will persist… but likely as a premium feature.
Data, Privacy, and the Future of Ed-Tech
Last year, I rounded out my “Data vs. Privacy” round-up with a series of questions I felt as though education technology must confront. They’re all still relevant:
What role will predictive modeling and predictive policing have in education? Who will be marked as “deviant”? Why? Against whom will data discriminate? What role does privacy play – or phrased differently: what role does a respite from surveillance play – in a child’s development? How can we foster agency and experimentation in a world of algorithms? What assumptions go into our algorithms and models? Who builds them? Are they transparent? What can we really learn from big data in education? Bill Gates says big data will “save American schools.” Really? Save from what? For whom? Is all this data hype just bullshit? Who owns education data? How well do schools protect student data, particularly as they adopt more and more cloud-based tools? What happens to our democracy if we give up our privacy and surrender our data to tech companies and to the federal government? What role will education play in resisting or acquiescing to these institutions’ demands?
I don’t think we've answered or resolved any of these in the past 12 months. Except for this one thing – thanks to data wiz Nate Silver who finally launched his new FiveThirtyEight site in March: it’s “data is” not “data are.” So that's progress.
Special thanks to Bill Fitzgerald for reading the Terms of Service. This post was first published on December 15, 2014.