Thursday, March 22, 2007

Economist on the future of books

This article from the Economist contains a fairly nuanced set of predictions on the future of the book.

Interesting point that people often package a 50-page idea into a 300-page book because those are the economics of the medium.

The observation that the electronic medium offers lots of benefits in the scholarly environment, but not so many in the recreational reading one, would seem to bode well for physical books in book stores and public libraries but not so much for those in academic libraries.

Monday, March 19, 2007

discovery services at the network level

Peter Brantley mentions that someone who went to RLG's D2D conference made this comment:

The conference covered a wide range of issues, and was very interesting. I think the consensus opinion at the close was that discovery has moved to the network layer and libraries should stop allocating their time and money trying to build better end-user UI, and concentrate instead on delivery, and their niche or customized services such as digitizing special collections, providing innovative end-user tools for managing information, and so forth.

Interesting observation. I'm thinking about how this comment relates to lots of dispersed efforts such as Scriblio to remake the library OPAC using technologies like Solr. These are important pioneering efforts but will they stick around long?

More specifically, I'm thinking about it in terms of the options available to replace the interface to our OPAC at L&C, and ultimately the Summit Union catalog that we share with other Orbis Cascade Alliance members. The Alliance, through the Summit Catalog Committee, is considering lots of different options, including III's Encore, Endeca, OCLC's forthcoming WorldCat Local, as well as local development options. The statement above, in my mind, would be an argument for WorldCat Local.

WorldCat Local (not sure if this is the correct product name), from what I've heard, is a version of the catalog that is scoped down to your own holdings and, optionally, those of your union catalog. It allows you to use your local system for delivery. By using a discovery service like WorldCat Local, we would be tapped into OCLC's web-scale bibliographic database and could, theoretically, benefit from "network effects" only available on that platform. One network effect readily available would be the most up-to-date version of any bibliographic record. But there could also be other network effects, for example, Web 2.0-ish stuff like tagging and comments that really only become useful on a wide scale. And maybe they will offer stuff like FRBRization and relevance ranking that takes advantage of the intelligence available in a database of that size.

OCLC, in my mind, has always been sort of a slow moving behemoth. Using many of their services, such as LDR updating, is often painfully slow and cumbersome. But they appear to have turned a corner on their OpenWorldCat program, especially some of the APIs that they have released recently. I'm listening to them with an open mind.

Thursday, March 15, 2007

Library as Platform

Another phrase that I caught at code4lib was "the library as platform." It came from Terry Reese via Jeremy Frumkin, I think.

This got me thinking about all the web applications that we run here at Watzek and how they hang together. There are a couple of reasons to be thinking about this now, especially. One is that there are now two (three if you count the Law Library) programmers working on the coding and database creation. And two, we're thinking of redesigning our website soon (even though we were just named the "College Website of the Month" by ACRL's college library section).

We're pretty small scale here, but I still think that it is productive to think of our digital environment here in terms of a platform. My idea is that we should identify the large building blocks of our environment so that we're not reinventing things every time we create a new application.

Some components of the platform:
  • LAMP (throw postgres in there)
  • An up-to-date XHTML based web design and corresponding set of CSS classes that can be applied to a wide range of situations. This will let us spend our time building applications rather than tweaking design details.
  • Some form of template system (currently dreamweaver + PHP includes) for applying the design uniformly
  • databases:
    • the ILS database
    • serials knowledgebase
    • databases which are subsets of the ILS like our newbooks database and our A/V database
    • database of electronic resources that drives our subject web pages
  • Campus LDAP authentication applied via PHP and Apache mod_auth_ldap
  • PHP classes and conventions for common tasks (like authentication via LDAP, passing data about a citation from one application to the next)
  • common Expect scripts (mainly for data extraction from the ILS)

Many applications (ReservesDirect, our homegrown CMS for e-resources, new books database, etc.) are running on the platform and leverage its resources.
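
To make one of those conventions concrete: passing data about a citation from one application to the next could be as simple as agreeing on a fixed set of field names and shipping them in a query string. The sketch below is only an illustration of that idea, not our actual code. Our platform is PHP, so this is written in Python purely for brevity, and the field names (loosely modeled on OpenURL-style keys) and function names are assumptions made up for the example.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical shared convention: the only citation fields that
# applications on the platform agree to pass to one another.
CITATION_FIELDS = ("title", "author", "issn", "volume", "year")


def citation_to_query(citation):
    """Serialize the agreed-upon, non-empty fields of a citation dict
    into a URL query string, in a fixed field order."""
    pairs = [(key, citation[key]) for key in CITATION_FIELDS if citation.get(key)]
    return urlencode(pairs)


def query_to_citation(query):
    """Rebuild a citation dict from a query string, keeping only the
    agreed-upon fields and the first value given for each."""
    parsed = parse_qs(query)
    return {key: values[0] for key, values in parsed.items() if key in CITATION_FIELDS}
```

The sending application would append the result of citation_to_query to a link, and the receiving one would call query_to_citation on what it gets. Anything outside the agreed field list is silently dropped on both ends, which is the whole point of having a convention.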

the library workshop

Watzek Library's ILL department just had a visit from some folks at the Multnomah County Library, the big public library system serving most of the Portland Metro area. They were interested in a little hack to our Clio interlibrary loan system that allows us to check out ILL books on our III integrated library system without any rekeying.

As Jeremy and Jenny showed the application in action, I was really admiring the resourcefulness of it. Nothing fancy--just the DIY powertools of the digital workshop: Expect (to talk to a legacy black box III system), MySQL, PHP, and especially a PHP function to create barcode images. Normal ILL workflow for an incoming book is followed by a couple simple steps, and pretty stickers with barcodes that correspond to records in our ILS emerge.

We're thinking about the software platform (basically LAMP) that powers our website. Should we go to something like Rails? Or should we stick with the simple, blunt LAMP instruments that we're used to?

Friday, March 9, 2007


Gotta love the name Freebase.

Along the same lines as the Talis Platform (at least in my mind), some guys are starting up a company that will develop a global database that allows for complex relationships to be established between the data within it. The company is called Metaweb, but their site isn't too revealing.

Here's what Tim O'Reilly has to say about it:

“It’s like a system for building the synapses for the global brain,” said Tim O’Reilly, chief executive of O’Reilly Media, a technology publishing firm based in Sebastopol, Calif.

Google Book Search and rank

Some interesting notes from a talk by Google at the Future of Bibliographic Control Committee (the name just oozes dullness). Dan Clancy from Google said that Google is having trouble relevance ranking books in its book search because it can't rely on the link structure of the web to support relevance ranking.

Thursday, March 8, 2007

data stores

At code4lib, Talis was promoting their platform. It's based on the concept of "stores", which are basically large bodies of data stored on Talis' computers. The advantage of putting your data in these stores is that it can be queried, searched, and related to other data in numerous ways.

Some of what they say about their platform:
Large-Scale Content Stores

The Talis Platform is designed to smoothly handle enormous datasets, with its multiple content stores providing a zero-setup, multi-tenant content and metadata storage facility, capable of storing and querying across numerous large datasets. Internally, the technology behind these content stores is referred to as Bigfoot, and there is an early white paper on this technology here.

Content Orchestration

The Talis Platform also comprises a centrally provided orchestration service which enables serendipitous discovery and presentation of content related according to arbitrary metadata. This service makes it easy to combine data from across different Content Stores flexibly, quickly and efficiently.

This all seems rather nebulous when you first think about it, but slowly, the usefulness of the concept begins to reveal itself. They discussed a little bit about how this platform is supporting Interlibrary Loan at UK libraries because it provides a way to query across different libraries.

My question is, do libraries really have enough of their own content to leverage a platform like this? All we really have is generic data about books and journals and specific data about which libraries hold them.

I wonder whether this kind of service would be most useful if a player like Google offered it. Why Google and not Talis? Because they have a huge amount of data already amassed from web crawling and publisher relationships, not to mention scanning books. Think about the opportunities that would present themselves if you could query specific slices of Google's content alongside your organization's own data. What if Google hosted research databases as stores and you could slice them up, query them, and relate them à la the Talis platform?

Essentially, a library could create its own, highly tailored searching/browsing/D2D systems.

Maybe I'm asking for too much.

Friday, March 2, 2007

standing on the shoulders of giants

Casey Durfee's presentation on "Open Source Endeca in 250 lines or less" was pretty cool. How could he create a "next-gen" faceted catalog with so little code? By relying on Solr and Django to do the heavy lifting. Because Solr indexes XML natively, no relational database is even necessary. One of the things, generally speaking, I'm looking for at this conference is ways that we can leave the complexity to other applications.
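
To give a feel for how much of a faceted catalog Solr handles on its own, here is a rough sketch of building a faceted search request. This is not Casey's code. The base URL, the core name "catalog", and the field names "subject" and "format" are all assumptions for illustration; the query parameters themselves (q, rows, wt, facet, facet.field, facet.mincount) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, user_query, facet_fields, rows=10):
    """Return a Solr /select URL that runs user_query and asks Solr to
    count matching documents for each value of the given facet fields."""
    params = [
        ("q", user_query),
        ("rows", rows),
        ("wt", "json"),           # ask for a JSON response
        ("facet", "true"),        # turn faceting on
        ("facet.mincount", 1),    # skip facet values with zero matches
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical local Solr instance and field names.
url = build_facet_query(
    "http://localhost:8983/solr/catalog",
    "polish intellectuals",
    ["subject", "format"],
)
```

Everything a catalog front end needs for the sidebar of clickable subjects and formats comes back in the response to that one request, so the application code is left with little more than rendering.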

Thursday, March 1, 2007

proximity and the network

Dan Chudnov gave a talk on making library resources available for sharing like iTunes does on a LAN. It was hard to immediately sense the value in this. He spoke of walking into a library and having access to the whole of the library. Isn't that what we get through our digital presence on the web?

But thinking about it more, I like the idea of our computers being able to sense services and resources based on proximity. What if you met a group to study and, once on the same wireless network, had immediate access to the others' personal digital libraries in an application like Zotero or the like? What if, when you walked through a physical library, the web presence of the library changed based on the section of the building you're in? Suppose you're studying in the East European Language Reading room late at night and you notice that somebody else has a similarly esoteric set of references on Polish intellectuals in their shared digital library...and perhaps that's her across the room. Could be a good way to get dates.

why code4lib?

Despite the fact I'm kind of burnt out on writing code, I find code4lib to be one of the most invigorating conferences I've attended in the last few years. Why? I think it's because it's where the new opportunities in the broader web world meet the digital library world.

Some interesting ideas that have come up this year:
  • the Solr platform for indexing and faceting a library catalog, or a digital library of anything really; XML-based
  • the Talis platform's concept of data "stores": large bodies of XML data that can be queried and related to data in other stores in an unlimited number of ways using "web scale" infrastructure
  • the idea of hooking up openurl resolver type services as a microformat
  • using as a content management system for library subject guides
  • a subject recommendation engine based on crawling intellectual data associated with university departments
  • using a protocol like zeroconf so that library patrons can auto-discover library services upon entering the physical library space

It seems like most of the big players here work in larger universities or organizations that have large local data sets to work with in the form of institutional repositories or digital collections. There's a lot of concern about building large, searchable digital libraries. This is fine if you have control over a large body of data. I can tell you that in the small college library environment, most of the data we work with is generic data about books and journal articles that is living in some database that is out of our control. We're often only able to add value to that data once it's arrived in a user's search results, through an OpenURL resolver or perhaps a tweak to our catalog.

This is not to say that what the big players are doing isn't useful or interesting to us. It's just different and makes me wish we had more opportunities to creatively manipulate the digital content to which we provide our patrons access.


A team at my old place of employment, Wendt Library at UW-Madison, showed off a pretty cool application, bibapp, that gathers data about what faculty on their campus have published. Among other things, the data is used to find articles that are legally storable in the IR; almost cooler than that are the connections that they demonstrate between the publications. They can visualize who's publishing with whom, analyze popular research subjects across disciplines, etc.
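
I don't know how bibapp computes its connections, but the "who's publishing with whom" part is easy to picture. A minimal sketch, assuming each publication is just a list of author names (the function name and data shape are made up for illustration): count how often each pair of authors appears on the same publication.

```python
from collections import Counter
from itertools import combinations


def coauthor_counts(publications):
    """Count how many publications each pair of authors shares.

    publications is a list of author-name lists. Each pair is stored
    as an alphabetically sorted tuple so (A, B) and (B, A) are the same key.
    """
    counts = Counter()
    for authors in publications:
        # set() guards against a name listed twice on one publication.
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return counts
```

The resulting pair counts are essentially the weighted edges of a co-authorship graph, which is the kind of thing a visualization could then be drawn from.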