For years I’ve been using the same username for many websites but with different passwords. I did it for convenience but I also had this vague idea that I was crafting some kind of an overall online identity which would be uniquely identifiable as me, would be consistent over time and would serve as an informal history to build my technical reputation and credibility. But now that I see the results I don’t like it even though there are not any individual postings or fragments of data that I’m ashamed of or embarrassed about. It’s just that when I see them all together the effect is unsettling and feels like I’ve been under surveillance all these years.
In some cases I made either bad choices or misinformed decisions. For example, by way of Googling my name recently, I found my work phone number in the web archives of a members-only listserv for people in my industry. I recall making the decision to put my phone number in my email signature because I was posting specific information that I thought would help guys doing my same job in other organizations. There are few enough of us that I figured I’d be happy to help if one of them were to call me to ask for more details or advice. The problem is that, while I knew that registered members (i.e., people in my industry) would be able to search the archives, I had no idea that the thread was going to end up on Google. That was just a simple misinformed decision. But my initial settings on my Twitter.com account turned out to be a case of making a genuinely bad decision, then forgetting all about it.
Working in IT has made me pretty cynical about the usefulness and security of new technologies that take the world by storm. I avoided bothering to sign up for Twitter for longer than most people; Larry King was probably Twittering before I was. But suddenly, in mid-June of this year, I heard a news report about Iranian election protesters using Twitter to communicate the locations of antigovernment rallies and the IP addresses of sympathetic proxy servers for bypassing Iran’s web filters. I created a Twitter account in a hurry and began following the action as described by thousands of Iranian Twitterers in real time.
But I made my mistake when I first created my Twitter account: I did not check the “Protect my tweets” box in the Account tab of the Settings page. I figured that if I was going to cheer for the good guys from the sidelines of this new use of Twitter then I should let everyone see anything I posted. Specifically, I was dissuaded from checking the box by the warning that, “…only those you approve will receive your tweets. You will not appear on the public timeline.” So I left it unchecked. I posted a few words of encouragement and advice over the next few days until I, and the rest of the planet, decided that the Iranian government must surely, by then, have started to use or at least surveil Twitter to their own advantage.
That was the end of my Twitter usage as far as I was concerned. But a few months later some friends and I went to a multi-track IT security convention and agreed to communicate our locations and statuses to each other by way of Twitter. This is one of the best uses I’ve seen for Twitter. It worked out great for us. We were able to advise each other to avoid lame speakers, hurry to a certain room that was filling up fast before a popular speaker arrived, etc. Plus we could coordinate our after-hours rendezvous each evening. Fortunately I did not tweet anything like “hey dudes I’m still @ the motel. I accidentally killed another hooker”, because, as I later found out, everything I posted to Twitter ended up on Google.
Back when I made the initial decision to not check the “Protect my tweets” box, I vaguely assumed that my tweets would be searchable by anonymous users on Twitter.com only. I did not expect that making my tweets searchable by other users also meant making them searchable on Google. Worse, or rather, weirder, I’ve since discovered that my small collection of tweets has been indexed and linked to by several other sites, including some site that lists tweets associated with particular retailers. Apparently I am a good person to follow if you want to find deals at Neiman Marcus! WTF?
That last oddity is an example of a recent phenomenon of crawling and indexing the so-called “deep web”. Basically, specialty websites post the results of hypothetical searches of other sites which have their own specific data. This is a big subject. It has other names too, and aspects other than what I describe here. I’m just now learning about how it works on the back end. But so far I’m seeing that these deep web links add a menacing and creepy feeling to the previously benign and narcissistic pastime of self-googling. One site listed my birth month and year, a few of my tweets, a research paper I co-authored, and my boss’ Facebook page. Yes, this collective and unintended dossier on me is invasive and aggravating. But if I were your 11-year-old kid it would be time to panic.
But there is opportunity here too, for somebody. At least one site, Zabasearch.com, offers to remove a single public record from the web for $20. In my case it was the street address of an apartment I lived in during college. Will we see more of this in the future? Will we all be extorted into paying to have this kind of “deep web” data collection and correlation taken down from websites we’ve never even visited? Might it morph into a new spin on the old “protection” racket? I am not sure.
But one thing I am sure of is that, increasingly, the online identities and reputations we’re all cultivating, the deliberately crafted kind and the kind that accumulate haphazardly, will be yet another thing that somebody else owns the rights to.
I have many thoughts on this topic. I tend to think there’s no avoiding the aggregation of this sort of data, so I usually advise the following:
1. Take an active role shaping your Google results. If you work at it, you can control the first couple pages of results.
2. Alternatively, since data removal is an iffy proposition at best, you can always flood the net with intentionally bad data.
I briefly considered creating a service called de-Google that would remove undesirable Google search results. I quickly realized that wouldn’t be possible in most cases. In the end, the best you can hope for is to outrank the results you don’t like.
Phew. I was reading this post and had assumed, incorrectly, that Kirk had written it. Halfway through I was thinking, “I never knew that about Kirk” and “I thought he was MUCH more active on Twitter”. Valuable lessons, and thanks for being someone other than Kirk. I was on the verge of rewriting my whole understanding.
I like Kirk’s idea of using brute force on Google. If you can get any sensitive or undesirable information off the first 5 pages of Google, I would think you would be OK. When was the last time you even hit the third page on a Google search? I see most people give up if the information they want isn’t the top search result.