Tuesday, February 26, 2008

Brewster Kahle - code4lib 2008

Just took in the opening keynote at code4lib '08, Portland.

The Internet Archive/OpenLibrary project takes the most open approach to digital collections and digitization projects. Also the most centralized.

They've got big aspirations: archive of all web pages, a catalog with every book, a major digitization initiative.

The overlap between their goals and those of Google Books, WorldCat, and other more localized digitization projects makes you wonder which projects will last. Or can they all coexist?

Friday, February 8, 2008

professional vs. consumer media

This post by Tim O'Reilly, regarding a talk by the Reuters CEO at MoneyTech, makes some interesting references to professional vs. consumer media, semantic markup and metadata, and the concept of "curation", all of which have parallels in the academic information arena.

Basically, Reuters is making the case that their professional-level information products for the financial industry will deliver value above and beyond consumer financial information products, through semantic metadata. His points, as summed up by Tim:
  1. The impact of consumer media on professional media. As young people who grew up on the web hit the trading floor, they aren't going to be satisfied with text. Reuters needs to combine text, video, photos, internet and mobile, into a rich, interactive information flow. However, he doesn't see direct competition from consumer media (including Google), arguing that professionals need richer, more curated information sources.

  2. The end of benefits from decreasing the time it takes for news to hit the market. He describes the quest for zero latency in news, from the telegraph and early stock tickers and the news business that Reuters pioneered through today's electronic trading systems. (Dale Dougherty wrote about this yesterday, in a story about the history of the Associated Press.) As we reach the end of that trend, with information disseminated instantly to the market via the internet, he increasingly sees Reuters' job to be making connections, going from news to insight. He sees semantic markup to make it easier to follow paths of meaning through the data as an important part of Reuters' future.
I think you could make the argument that these two points apply to the types of "professional" information resources that academic libraries provide to their patrons.
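The "paths of meaning" idea lends itself to a quick sketch. Here's a toy example (the news items, tags, and function are all invented for illustration, not anything from Reuters): stories carry semantic tags, and you move from one story to related stories through the entities they share.

```python
# Toy illustration of semantic metadata: news items tagged with entities,
# and a traversal from one story to related stories via shared tags.
# All data here is made up for illustration.

news = [
    {"id": 1, "headline": "Acme posts record Q4 profit", "tags": {"ACME", "earnings"}},
    {"id": 2, "headline": "Acme CEO steps down", "tags": {"ACME", "management"}},
    {"id": 3, "headline": "Widgets Inc. misses estimates", "tags": {"WIDG", "earnings"}},
]

def related(item_id, news):
    """Return headlines of other items sharing at least one tag."""
    tags = next(n["tags"] for n in news if n["id"] == item_id)
    return [n["headline"] for n in news if n["id"] != item_id and n["tags"] & tags]

print(related(1, news))
```

The value is in the tagging, not the traversal: once every item carries machine-readable entities, "going from news to insight" becomes a query. The same applies to article databases with controlled subject headings.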

Wednesday, January 30, 2008

Open Source ILS for Academic Libraries

This got forwarded to my email, from the CNI listserv:
The Duke University Libraries are preparing a proposal for the Mellon Foundation to convene the academic library community to design an open source Integrated Library System (ILS). We are not focused on developing an actual system at this stage, but rather blue-skying on the elements that academic libraries need in such a system and creating a blueprint. Right now, we are trying to spread the word about this project and find out if others are interested in the idea.

We feel that software companies have not designed Integrated Library Systems that meet the needs of academic libraries, and we don’t think those companies are likely to meet libraries’ needs in the future by making incremental changes to their products. Consequently, academic libraries are devoting significant time and resources to try to overcome the inadequacies of the expensive ILS products they have purchased. Frustrated with current systems, library users are abandoning the ILS and thereby giving up access to the high quality scholarly resources libraries make available.

Our project would define an ILS centered on meeting the needs of modern academic libraries and their users in a way that is open, flexible, and modifiable as needs change. The design document would provide a template to inform open source ILS development efforts, to guide future ILS implementations, and to influence current ILS vendor products. Our goal is not to create an open-source replica of current systems, but to rethink library workflows and the way we make library resources available to our constituencies. We will build on the good work and lessons learned in other open source ILS projects. This grant would fund a series of planning meetings, with broad participation in some of those meetings and a smaller, core group of schools developing the actual design requirements document.
I agree that the current ILS marketplace doesn't deliver for academic libraries.

I'm not sure a traditional open source project is the best solution, either. It seems to me that the next generation ILS should follow more of a cloud computing model than many disparate systems sharing a single code base.

In my opinion, the question of a next generation ILS should be approached first from the data side, and then the software application side. As Tim O'Reilly puts it, Web 2.0 means "data is the next Intel Inside." The next generation ILS should be all about large pools of programmable, shared data. Organizations like OCLC and Serials Solutions have some of this data, but lots of other data is dispersed across the net in various silos.

If libraries want to deliver a user experience at anywhere near the level of Google, we need to be using the same techniques that they are. And their most important technique is aggregating large amounts of data.

What am I talking about, more concretely? Our digital collections of unique materials should be managed in a centralized system that can leverage network effects in search, folksonomies, and more. Our library catalogs should simply be a subset of a larger shared catalog. Organization of licensed content should be facilitated by sharing metadata about that content. Even user/patron data should be managed in a network fashion using systems like OpenID.
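To make the "subset of a larger shared catalog" point concrete, here's a minimal sketch (the records, identifiers, and library codes are hypothetical, not OCLC's actual data model): one shared pool of bibliographic records, with each library's catalog just a holdings-filtered view of it.

```python
# Sketch: a local catalog as a filtered view of one shared record pool.
# Records and library codes are invented for illustration.

shared_catalog = {
    "oclc:100": {"title": "Moby-Dick", "holdings": {"watzek", "boley"}},
    "oclc:200": {"title": "Walden", "holdings": {"watzek"}},
    "oclc:300": {"title": "Ulysses", "holdings": {"boley"}},
}

def local_view(library, shared):
    """A library's catalog is just the shared records it holds."""
    return {k: v["title"] for k, v in shared.items() if library in v["holdings"]}

print(local_view("watzek", shared_catalog))
```

In a model like this, cataloging a record once benefits every holding library, and enrichments like tags or reviews accumulate in one place instead of being trapped in hundreds of local installations.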

In some ways, the next generation ILS is already emerging in the form of data-driven products like Serials Solutions 360 and WorldCat Local. Of course, there is more work to be done.

If a consortium of universities creates their own ILS, I'm afraid that it'll be a glacially moving monstrosity of a project like Fedora or DSpace. A theoretically wonderful piece of code that doesn't amount to much when it's installed in many isolated instances.

Thursday, January 24, 2008

Watzek Rocks '08


Watzek Rocks is happening here in about a half hour.

Monday, January 21, 2008

New Watzek Library web site

Over winter break, I worked a fair amount on coding a revised version of Watzek Library's website. Much of the credit goes to Jeff Allman over at Boley Law Library and Jeremy McWilliams, as well as our web site team, who provided helpful criticism.

http://library.lclark.edu

Basically, I was living in CSS hell for a while, but it was sort of enjoyable to immerse myself in a technology project and leave some of my other responsibilities on hold. The new version uses more modern, semantically correct code and has some new features, which are pretty typical on many library web sites now:
  • a search widget (home and interior pages)
  • a promotional news space (home page)
  • an area to highlight new acquisitions (home page)
We consciously took an incremental approach to redesigning the site for a few reasons:
  • an upcoming College-wide redesign might encourage us to go in a new direction sometime soon
  • we could preserve navigational consistency for our users
  • sticking with existing colors and fonts wouldn't force us to re-style all of our applications (though this might be a good exercise in consolidation of stylesheets, etc.).
Next up is a facelift of some of the content on the interior pages of the site.

I've been sort of out of the web design arena for a while, and it's amazing how much more you can do with just CSS than you could five years ago.

The articles tab on the search widget is our first foray into federated searching. It needs work.

Our technology stack for the site consists of:
  • RedHat Enterprise Linux on a virtual server
  • Apache
  • PHP 5.x
  • PostgreSQL and MySQL
  • JQuery (for tabs and various ajaxy effects)
  • Dreamweaver/Contribute templates

Friday, January 11, 2008

NITLE workshop on Scholarly Collaboration

I attended a good workshop put on by NITLE about two weeks ago in sunny Claremont, CA at Pomona College. No good excuse for not posting this earlier.

A few particularly interesting things that I've picked up at this workshop:
  • got a look at CommentPress, mentioned in a recent article in the Chronicle of Higher Ed
  • a recent white paper on fair use, published by ARL, suggesting that academics should take a more aggressive stance on what's eligible for fair use protection; could have some implications for some digitization projects.
  • Homer Multitext project: an effort sponsored by the Center for Hellenic Studies to present multiple historical manuscripts of the Iliad and the Odyssey alongside each other. Faculty and students at liberal arts colleges have been heavily involved in the project. The fellow presenting on it made the case that this style of digitization could be contrasted with the Google Books project: careful, thoughtful, interpretive, and scholarly rather than scanning by the dump-truck load.
  • Anarchist Archives: a good example of a thematic digital archive sponsored by a faculty member; the model of a faculty member curating a digital collection is an interesting one; I can see some possibilities for it at Lewis & Clark

Friday, January 4, 2008

warnings on Google

A few interesting posts recently that emphasize the danger that Google's size poses to innovation in the web/digital publishing environment:

Nick Carr makes the case that the massive amount of data Google is accumulating will give it a huge competitive advantage over other firms. Tim O'Reilly points out that Google is hosting more and more of the content that it indexes, rather than indexing others' content for them. The Knol is a move in this direction. This is akin to a financial firm trading for its own benefit rather than for its clients' benefit. Peter Brantley discusses the issue in the context of research libraries.

In the late 1990s I was very anti-Microsoft and avidly favored the anti-trust litigation against them. For some reason, I just don't see Google in the same "Evil Empire" light. I guess I appreciate Google as an innovator. They have done a lot for search and the general move towards cloud computing. Their library project is bold in scale as well as legal approach in a way that never would have happened in the non-profit sector. Google's products are actually really good and cutting edge, whereas Microsoft's, in my experience, were always behind the curve and sort of sucked. (Microsoft, on the other hand, is a company to "admire" from a business perspective, not so much a technology perspective, because they've been able to rake in so much money for their mediocre products.)

Like Microsoft, Google will be unseated in due course. Skrenta is working on it. I do appreciate the skeptics out there keeping an eye on Google.