Assessing The Impact Of the AOL Data Spill

It’s been just over a week since we learned that AOL inadvertently released three months’ worth of search history for more than 650,000 users. While the furor has died down slightly, it seems likely that we’ll be hearing about this issue for quite some time. So much data was revealed that it could take a while before we fully understand the implications, and just how much damage may be inflicted on individual users. As scandals go, this one shows no sign of going away any time soon.

Early reaction to the news has been surprisingly varied. Data researchers were initially gleeful at the opportunity to work with extremely large data sets generated in a real-world environment. Privacy advocates were outraged and pointed out that while the data was anonymized, the search queries themselves can be used to identify users. Meanwhile, a large portion of the blogosphere has become fascinated with user 17556639 and his apparent plans to kill his wife. It’s entirely possible that law enforcement is already searching the logs for signs of criminal activity.

Here’s a brief summary of some of what’s happened since the news broke last week:

  • AOL has issued an apology with a promise that it will never happen again. AOL’s Jason Calacanis reacted swiftly with an honest appraisal of the situation and a suggestion that AOL stop logging search data. He followed up with a frustrated account of the hard times at AOL. He has since sworn off blogging on the topic until he’s had time to cool off. Right about now, he might be wondering what he got himself into by joining AOL.
  • The New York Times demonstrated just how easy it is to identify individuals from the anonymized search histories. The paper published an interview with searcher 4417749. She’s a 62-year-old Georgia widow with plenty of quality search time on her hands. She was stunned that AOL released the data and plans to cancel her account.
  • With the search queries now out in the wild, a variety of tools have sprung up to help users mine the data for interesting insights. AOL Search Logs provides users with a rudimentary search by keyword and user ID, while SEO Sleuth helps search engine marketers perform keyword and domain analysis. By far the most interesting of these tools is AOL Psycho, a collaborative tagging project that invites users to write personality profiles based on individual users’ search histories. The site features a ‘random user’ link that allows you to quickly jump to a user who hasn’t yet been profiled. The site is scary, sad, and sometimes very funny.
  • The mainstream media has seized the opportunity to raise questions about individual users and their intentions. However, as the NYT profile on user 4417749 demonstrated, a user’s intentions can’t always be determined simply by what he or she types into a search box. Regardless, there seems to be plenty of room for sensationalism, and it’s only a matter of time before someone at Fox News jumps on this bandwagon.

Perhaps the most surprising thing about this past week is that so many people seem surprised that something like this could happen. Apparently most people aren’t aware of the digital footprints they leave behind as they move around the Internet.

If anything, this past week has left us with more questions than answers:

How could something like this happen? Are we really to believe that an AOL engineer with only the best intentions acted unilaterally to release this data? If that’s the case, what’s to keep another employee with incredibly bad judgement from doing this again? Or worse, an employee with a grudge against the company? And it’s not just AOL we have to worry about. This could happen at any other search company or website that holds a substantial amount of user data. The availability of this sort of user information is fertile ground for blackmail and extortion. It’s becoming clear that companies holding this sort of data need to maintain tighter control over internal access.

Is it wrong to look? Reality TV has turned us into a nation of voyeurs. While I hate to admit it, I can’t help being fascinated by some of the search profiles that I uncovered after just a few minutes of poking around AOL Psycho. Take user 5055627, for example. She’s a future bride from Tyler, Texas, planning a September wedding. She’s searching for a polka dot wedding cake with a motorcycle topper. Along the way she reveals that she REALLY hates her mother-in-law and isn’t quite sure how to dress for the fox races (fox races??). Meanwhile, user 317966’s mother-in-law might want to keep her distance. That guy needs help.

What should be done with the data? Now that the data has escaped, there’s no way for AOL to take it back. While the data was originally intended for search engine researchers, it’s now being used by anyone who cares to access it. AOL could try to track down all of the rogue sources, but that would almost certainly turn into a never-ending battle.

Ultimately, I think the best we can all hope for is that consumers will become more aware of the data that corporations collect, and will take a more active interest in how that data is maintained.

While AOL may be the first to spill this much user data onto the Internet, something tells me they won’t be the last. As long as this type of data exists, there is always a chance that a leak like this will occur – either overtly or covertly. As we’ve seen, it only takes one employee to make a boneheaded mistake.

The potential for disaster is there with almost any service you use, online or otherwise:

  • Users of social networking applications routinely give up a huge amount of personal information.
  • TiVo users have long been concerned about the amount and type of data the company maintains on its users.
  • The very data that allows Amazon to serve up personalized product suggestions could also be abused in any number of ways if it were to be made public.
  • iTunes is now reportedly exploring the possibility of allowing users to share playlists and purchase histories. Seems innocent enough. Until your wife finds out you’ve been listening to Soccer Girl or watching French Maid TV (in which case you’re supposed to tell her you were learning CPR).

I could go on, but you get the idea. Everything about you is being logged and stored in databases around the Internet. You may not think you have anything to hide, but you might change your mind when your data is leaked.

Tomorrow I’ll tell you about some ways you can protect your privacy online.
