You want a hot discussion? Put librarians, Microsoft, Google, and Bob Stein from the Institute for the Future of the Book on a SXSW panel to talk about issues surrounding book digitization (and call the panel “Revenge of the Librarians”). An hour wasn’t nearly long enough for the conversation – and the diverse audience proved that the issues surrounding digitization aren’t limited to a small segment of the population.
Opening with questions about what happens after books are digitized and what the impact of a shrinking pool of knowledge might be, the panel tackled the elephant in the room (let me say that it was refreshing to see open back-and-forth dialogue between the panelists, unlike the normal nicey-nice stuff you see): Google’s book-related programs. Microsoft’s project isn’t online yet, so it escaped detailed scrutiny. Dan Clancy, of Google, explained the various components of the initiative.
The goal for Google and Microsoft (other than making money, which is what corporations do) is to build indexes of authoritative works whose content can be surfaced in search results. To do this effectively, they need a lot of books digitized, and that is an expensive and time-consuming process.
While Microsoft is working largely through the Open Content Alliance initiative, Google is digitizing books in the public domain, in association with publishers, and under non-exclusive deals with various major academic libraries. Public domain works are the most accessible, while works that remain under copyright are shown only in limited views. A major challenge under the library agreements is finding copyright holders. As with the motion picture industry, there isn’t always a clear trail, and until the copyright owner can be found, much of what Google digitizes remains unseen. This, I think, is a big discussion that needs to happen sooner rather than later.
Members of the audience raised the issue of the lack of remixability for the public domain works in Google’s library. Clancy noted that while the books can be printed, Google hasn’t fully determined its policies on re-use of the books by other sources; it continues to seek a balance between return on investment and community needs. Microsoft takes a different view of that balance: since the books will be available in multiple places, it expects its advantage to come from a better user experience.
Danielle Tiedt of Microsoft noted that approximately 5% of the world’s information is online. That’s not a lot, and there’s a serious challenge to digitize information before it’s lost. She also noted that, given the cost and long time frame of book digitization, from a corporate perspective it would be preferable to simply point their search engines at another source. There is no other source. Hence the Google and Microsoft initiatives.
Bob Stein expressed discomfort with having control of the world’s knowledge rest in corporate hands. From my paranoid perspective, this is a fair concern, and several audience members seemed to embrace the idea of letting the public digitize books. At first glance, this seems like an ideal solution: everyone can digitize one book.
But as I thought about it, I realized this is the least workable of all possible solutions. Corporate or non-profit interests can create and enforce guidelines that respect copyright, require appropriate metadata, and ensure a systematic approach. Asking members of the general public to digitize books will lead, however unwittingly, to many of the problems commonly associated with file sharing.
The grassroots-oriented audience members more or less conceded that, other than Project Gutenberg, there are very few organized digitization efforts for books. Even if more projects are initiated, public money remains limited. To date, our culture hasn’t exhibited a commitment to protecting its artistic history. Whether we like it or not, the fact that Google, Microsoft, and their successors need this information to feed hungry search engines means that money and resources are being allotted to book digitization. Even publishers haven’t stepped up with a plan for protecting their own catalogs, and they surely have just as much interest in maximizing their investment as the search companies.
Tiedt noted that some European countries are taking a more proactive approach, with governments providing funding. The drawback of public financing is the question of appropriate government controls and of who decides what should or shouldn’t be digitized. On the other hand, public ownership of digitized public domain works would ensure that everyone has equal access to them.
Some audience members noted that there isn’t a clamor for certain types of resources; but who makes that determination? Microsoft and Google, turning to authoritative sources in books to create better search results, will surely find that they’ve opened a Pandora’s box: once digitization begins, even more of it will be needed to answer search queries.
One point not in doubt during the conversation: the world will continue to need librarians. Indexes of knowledge (and Google, MSN, Yahoo!, etc. are indexes, not libraries) will still require someone to make sense of all the information. Liz Lawley, the panel’s moderator, observed that librarians have a natural affinity for searching and organizing data. Clancy added that search is not the same thing as finding.
This is a discussion that should have started loud and public ten years ago. I cannot imagine that it wasn’t being talked about. Of course, when it comes to issues like this, maybe ten years wouldn’t be enough time to find answers, either.