Friday, March 9, 2007

Freebase

Gotta love the name Freebase.

Along the same lines as the Talis Platform (at least in my mind), some guys are starting up a company that will develop a global database that allows complex relationships to be established between the data within it. The company is called Metaweb, but their site isn't too revealing.

Here's what Tim O'Reilly has to say about it:

“It’s like a system for building the synapses for the global brain,” said Tim O’Reilly, chief executive of O’Reilly Media, a technology publishing firm based in Sebastopol, Calif.

Google Book Search and rank

Some interesting notes from a talk Google gave to the Future of Bibliographic Control Committee (the name just oozes dullness). Dan Clancy from Google said they're having trouble relevance ranking books in Google Book Search because they can't rely on the kind of web link structure that supports relevance ranking in ordinary web search.

Thursday, March 8, 2007

data stores

At code4lib, Talis was promoting their platform. It's based on the concept of "stores", which are basically large bodies of data stored on Talis' computers. The advantage of putting your data in these stores is that it can be queried, searched, and related to other data in numerous ways.

Some of what they say about their platform:
Large-Scale Content Stores

The Talis Platform is designed to smoothly handle enormous datasets, with its multiple content stores providing a zero-setup, multi-tenant content and metadata storage facility, capable of storing and querying across numerous large datasets. Internally, the technology behind these content stores is referred to as Bigfoot, and there is an early white paper on this technology here.

Content Orchestration

The Talis Platform also comprises a centrally provided orchestration service which enables serendipitous discovery and presentation of content related according to arbitrary metadata. This service makes it easy to combine data from across different Content Stores flexibly, quickly and efficiently.


This all seems rather nebulous when you first think about it, but slowly the usefulness of the concept begins to reveal itself. They talked a bit about how the platform is supporting Interlibrary Loan at UK libraries because it provides a way to query holdings across different libraries.
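To make the "store" idea concrete, here's a rough sketch of what querying a hosted store over HTTP might look like. The host, store name, parameters, and fields are all assumptions for illustration, not Talis' documented API.

    # A rough sketch of querying a hosted data "store" over HTTP.
    # The URL, parameter names, and result fields are assumptions
    # for illustration, not the documented Talis Platform API.
    import json
    import urllib.parse
    import urllib.request

    STORE_URL = "http://api.example.com/stores/union-catalogue/items"  # hypothetical

    def search_store(query, max_results=10):
        """Run a full-text search against a hosted store and return parsed results."""
        params = urllib.parse.urlencode({"query": query, "max": max_results, "output": "json"})
        with urllib.request.urlopen(f"{STORE_URL}?{params}") as resp:
            return json.load(resp)

    # e.g. find which libraries hold a title, for an ILL-style lookup
    results = search_store('title:"Pan Tadeusz"')
    for item in results.get("items", []):
        print(item.get("title"), "-", item.get("holdingLibrary"))

The appeal is that the library writes only this kind of thin client code; the indexing, storage, and cross-store querying all live on the platform side.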

My question is, do libraries really have enough of their own content to leverage a platform like this? All we really have is generic data about books and journals and specific data about which libraries hold them.

I wonder whether this kind of service would be most useful if a player like Google offered it. Why Google and not Talis? Because Google already has a huge amount of data amassed from web crawling and publisher relationships, not to mention book scanning. Think about the opportunities that would present themselves if you could query specific slices of Google's content alongside your organization's own data. What if Google hosted research databases as stores and you could slice them up, query them, and relate them à la the Talis platform?

Essentially, a library could create its own, highly tailored searching/browsing/D2D systems.

Maybe I'm asking for too much.

Friday, March 2, 2007

standing on the shoulders of giants

Casey Durfee's presentation on "Open Source Endeca in 250 lines or less" was pretty cool. How could he create a "next-gen" faceted catalog with so little code? By relying on Solr and Django to do the heavy lifting. Because Solr ingests and indexes XML directly, no relational database is even necessary. One of the things I'm looking for at this conference, generally speaking, is ways that we can leave the complexity to other applications.
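To give a sense of how little application code is needed, here's a minimal sketch of the kind of request Solr answers out of the box: a keyword search plus facet counts in a single round trip. The Solr URL, core name, and field names (subject, format) are my assumptions, not Casey's actual schema.

    # Minimal sketch: one HTTP request to Solr returns both search hits
    # and facet counts, so the application layer stays thin.
    import json
    import urllib.parse
    import urllib.request

    SOLR_SELECT = "http://localhost:8983/solr/catalog/select"  # hypothetical core

    params = urllib.parse.urlencode({
        "q": "django",                        # keyword query
        "rows": 10,
        "facet": "true",
        "facet.field": ["subject", "format"], # fields to count facets on
        "wt": "json",
    }, doseq=True)

    with urllib.request.urlopen(f"{SOLR_SELECT}?{params}") as resp:
        data = json.load(resp)

    print("hits:", data["response"]["numFound"])
    print("subject facets:", data["facet_counts"]["facet_fields"]["subject"])

Everything hard (tokenizing, ranking, counting facets) happens inside Solr; the web app just renders the JSON.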

Thursday, March 1, 2007

proximity and the network

Dan Chudnov gave a talk on making library resources available for sharing on a LAN, the way iTunes does. It was hard to immediately sense the value in this. He spoke of walking into a library and having access to the whole of the library. Isn't that what we get through our digital presence on the web?

But thinking about it more, I like the idea of our computers being able to sense services and resources based on proximity. What if you met a group to study and, once on the same wireless network, had immediate access to the others' personal digital libraries in an application like Zotero? What if, when you walked through a physical library, the library's web presence changed based on the section of the building you're in? Suppose you're studying in the East European Language Reading Room late at night and you notice that somebody else has a similarly esoteric set of references on Polish intellectuals in their shared digital library...and perhaps that's her across the room. Could be a good way to get dates.
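For the curious, here's a small sketch of what proximity-based discovery could look like using multicast DNS (the Bonjour/zeroconf mechanism behind iTunes LAN sharing). It uses the third-party Python zeroconf package, and the service type _libshare._tcp.local. is a made-up name for a hypothetical shared-library service.

    # Sketch: browse the local network for a hypothetical shared-library service
    # advertised over multicast DNS. Requires the third-party "zeroconf" package.
    import time
    from zeroconf import ServiceBrowser, Zeroconf

    class LibraryListener:
        def add_service(self, zc, type_, name):
            info = zc.get_service_info(type_, name)
            if info:
                print(f"found shared library: {name} on port {info.port}")

        def remove_service(self, zc, type_, name):
            print(f"shared library went away: {name}")

        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    browser = ServiceBrowser(zc, "_libshare._tcp.local.", LibraryListener())
    try:
        time.sleep(30)  # watch the local network for half a minute
    finally:
        zc.close()

Your Zotero-ish client would advertise itself the same way, and anyone on the wireless network could see it appear and disappear as people come and go.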

why code4lib?

Despite the fact I'm kind of burnt out on writing code, I find code4lib to be one of the most invigorating conferences I've attended in the last few years. Why? I think it's because it's where the new opportunities in the broader web world meet the digital library world.

Some interesting ideas that have come up this year:
  • the Solr platform, with its XML-based interface, for indexing and faceting a library catalog, or really a digital library of anything
  • The Talis platform's concept of data "stores": large bodies of XML data that can be queried and related to data in other stores in an unlimited number of ways using "web scale" infrastructure
  • the idea of hooking up OpenURL-resolver-type services via a microformat (a sketch follows this list)
  • using del.icio.us as a content management system for library subject guides
  • a subject recommendation engine based on crawling intellectual data associated with university departments
  • using a protocol like zeroconf so that library patrons can auto-discover library services upon entering the physical library space
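On the microformat idea above, here's a small sketch of embedding an OpenURL ContextObject in a web page as a COinS span, which a resolver or browser extension could discover and turn into full-text or ILL links. The citation values are invented.

    # Sketch: build a COinS span (an empty <span> with class "Z3988" whose
    # title attribute carries a URL-encoded OpenURL ContextObject).
    # The citation below is made up for illustration.
    from html import escape
    from urllib.parse import urlencode

    citation = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": "Faceted browsing for library catalogs",
        "rft.jtitle": "Journal of Library Hacking",
        "rft.date": "2007",
        "rft.issn": "1234-5678",
    }

    coins = f'<span class="Z3988" title="{escape(urlencode(citation))}"></span>'
    print(coins)

Drop that span into any page (a citation list, a blog post, a subject guide) and client-side tools that understand COinS can offer resolver services for it.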
It seems like most of the big players here work in larger universities or organizations that have large local data sets to work with in the form of institutional repositories or digital collections. There's a lot of concern about building large, searchable digital libraries. This is fine if you have control over a large body of data. I can tell you that in the small college library environment, most of the data we work with is generic data about books and journal articles that is living in some database that is out of our control. We're often only able to add value to that data once it's arrived in a user's search results, through an OpenURL resolver or perhaps a tweak to our catalog.

This is not to say that what the big players are doing isn't useful or interesting to us. It's just different, and it makes me wish we had more opportunities to creatively manipulate the digital content to which we provide our patrons access.

bib-app

A team at my old place of employment, Wendt Library at UW-Madison, showed off a pretty cool application, bibapp, that gathers data about what faculty on their campus have published. Among other things, the data is used to find articles that are legally storable in the IR; almost cooler than that are the connections they demonstrate between the publications. They can visualize who's publishing with whom, analyze popular research subjects across disciplines, etc.
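As a toy illustration of the coauthorship angle (not bibapp's actual code), here's a sketch that builds a weighted "who publishes with whom" graph from publication records using the third-party networkx package. The records are invented.

    # Sketch: turn publication records into a weighted coauthorship graph,
    # where an edge means two people published together. Data is invented.
    from itertools import combinations
    import networkx as nx

    pubs = [
        {"title": "Sensor networks in the field", "authors": ["Olson", "Patel", "Kim"]},
        {"title": "Open access and engineering faculty", "authors": ["Olson", "Kim"]},
        {"title": "Metadata for datasets", "authors": ["Patel", "Nguyen"]},
    ]

    G = nx.Graph()
    for pub in pubs:
        for a, b in combinations(sorted(pub["authors"]), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)

    # who publishes with whom, and how often
    for a, b, data in G.edges(data=True):
        print(f"{a} - {b}: {data['weight']} joint paper(s)")

From a graph like this you can start asking the questions they demoed: who the hubs are, which departments cluster together, and where the cross-disciplinary links sit.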