Wednesday, September 23, 2009

Summon 'web scale'? I don't think so.

I think it's strange that Serials Solutions is attempting to apply the "web-scale" adjective to their Summon Service.

As far as I can tell, the library community has co-opted this term from its original use, which described computing infrastructure that could support web sites handling huge amounts of traffic. Lorcan Dempsey may have been the one to widen the term's meaning, in January 2007:
'Web-scale' refers to how major web presences architect systems and services to scale as use grows. But it also seems evocative in a broader way of the general attributes of the large gravitational hubs which are such a feature of the current web (eBay, Amazon, Google, WikiPedia, ...).
Dempsey's post is now at the top of Google's results for "web scale," which makes me think the library community has just about taken the term over.

I attended a webinar on Summon yesterday and learned that Serials Solutions uses Summon to build a broad index of the content available to your library: books, journals, digital collections, etc. The data comes from two sources: records your library uploads, and the e-content vendors your library has relationships with. It all goes into a SOLR index, which can then serve as a comprehensive discovery tool for your library's content. Because it is built on local data and tailored to a particular user community, it sounds much more like an intranet-style search than anything "web scale."

WorldCat Local, with its upcoming metasearch features, does something similar. I think, though, that it can make a more legitimate claim to the "web scale" label because it is attached to the WorldCat.org database. In my opinion, WorldCat.org is web scale in the sense that a global community uses and improves it.

Summon and WorldCat Local are competing in the same discovery interface space. At first glance, Serials Solutions appears to be ahead of OCLC in incorporating article content, perhaps because of its close relationships with content vendors. OCLC seems to have the edge with books. It can leverage holdings data in its relevance rankings, and it treats different editions of the same work in a more sophisticated way (FRBR). OCLC is also trying to provide delivery services in addition to discovery.

It will be interesting to see if OCLC can use its global database and the Web 2.0 principle "it gets better the more people use it" to differentiate its product from competitors like Summon.

It may not be obvious, but what OCLC is trying to do with WorldCat is much bolder than what Serials Solutions is doing with Summon. With Summon, libraries basically throw all of their content into one index to break down the data silos within an institution. But what you end up with is one big search silo for that institution.

With WorldCat, the vision is to break down not only the silos within institutions but also the silos between them. And the goal goes beyond harvest-and-search. The idea is that libraries and their patrons will work together to improve a shared database through deliberate, professionally created metadata. This shared database will be big enough to have a real impact on the web. Its records will surface in search engine results. Its interface will be familiar to many people, and libraries will be able to customize it for their own audiences through WorldCat Local.

We'll see if this grand vision takes hold.

2 comments:

Paul said...

Hi Mark, all your points are valid. As a Summon beta partner I haven't been paying as much attention to what OCLC is building, but I can tell you a little more about Summon's model.

Summon offers a base package of content, which is all the metadata Serials Solutions has acquired through its relationships with publishers and aggregators. In *addition* to that, it adds each library's local data. By default, a library's search covers its own local metadata plus the metadata in Summon's base package. An option to "Add results beyond your library's collection" also pulls in content from all the other libraries that have loaded metadata. That reaches toward "web-scale."

For example, if I search Dartmouth's instance of Summon (http://dartmouth.summon.serialssolutions.com/) for "Calgary" and exclude newspaper articles, I get just under 110,000 results. When I expand the results beyond the base package and Dartmouth's holdings, the count more than doubles to nearly 280,000. A large share of those new results come from MY library at the University of Calgary. The more libraries participate, the broader that second search becomes, and the closer it gets to true "web-scale."

Hope that helps clarify.

Mark Dahl said...

Thanks for your comments, Paul.

It sounds like Serials Solutions is building a big database using metadata from all Summon participants, thus there would be a kind of breaking down of data silos between institutions.

-Mark