Thursday, June 7, 2007

Google Digitization and the CIC

Well, I'm glad to hear that Google is moving quickly with the big consortium of university libraries in the Midwest to do more digitization. The ivory towers on the coasts can't have all the fun, right? My academic home is, of course, the University of Wisconsin-Madison, and I grew up in Minnesota, so I have a soft spot for this group of libraries.

This time around they are doing selective digitization, based on collection strengths. On their press release page, they offer a description of collection strengths, which I found interesting. Northwestern has a big collection of Africana, for example. I was a little nostalgic noting that one of the University of Wisconsin's great strengths is European History and Social Sciences. The beauty of studying modern German history there was that if you were looking for any book on Germany published in the US or Europe within a certain timeframe (the 1950s-1970s, I think), you could practically count on it being there. The comprehensiveness of the collection seemed to diminish as you hit the eighties and tighter university budgets took effect.

One thing not to overlook here is that this goes far beyond English-language content; these books will be useful well beyond the English-speaking world. There are going to be tons and tons of non-English material in this collection. I can recall wandering the Memorial Library stacks in Madison past shelves and shelves of books in Polish, Chinese, Russian, German, French, etc. I imagine that American university libraries are the most effective place to start for collections that span the world's corpus of written works.

Lorcan Dempsey sees this as a big step. He rightly points out that with this comprehensive data, Google is going to be able to build services that no one else can:

"However, as we are beginning to see on Google Book Search, we are really going beyond 'retrieval as we have known it' in significant ways. Google is mining its assembled resources - in Scholar, in web pages, in books - to create relationships between items and to identify people and places. So we are seeing related editions pulled together, items associated with reviews, items associated with items to which they refer, and so on. As the mass of material grows and as approaches are refined this service will get better. And it will get better in ways that are very difficult for other parties to emulate."
By "other parties" I think we can read OCLC, which is doing its best to leverage all of the data in WorldCat to develop structured relationships between intellectual works, their authors, and subjects. Will Google learn to do FRBR before OCLC does?

The libraries that are party to this deal get to keep the digitized texts and do their own thing with them. Will this give these big universities a "strategic advantage" over some of their competitors? Does this mean that size can still matter in the networked environment? This reminds me a little of the NITLE initiative, the original intent of which was to overcome the disadvantage of small liberal arts colleges in the information technology arena. Here's an example of an area in which we small folks can't compete--we just don't have much unique material in our libraries. But I guess the point is that everyone can access this stuff to some extent through Google.


