Wednesday, May 28, 2008

crowdsourcing and history

This piece in the Boston Globe about digital approaches to history really gets across the point that doing digital humanities is about more than just digitizing the printed word. I think it can be hard for scholars to get that, as Dan Cohen has pointed out.

It highlights projects like the 9/11 Archive and Flickr Commons as examples of how crowdsourcing can add to the primary material historians have to work with.
Cohen sees the potential for partnerships between the lone professional historian and crowds of helpers, particularly as the quantity of historical material increases. It's possible, for example, for a historian of Colonial America to read every document written by the founders of the Massachusetts Bay Colony (though such a task would still be time-consuming). It's altogether another thing for a historian of modern America to tackle the vast output of the Bush White House. "One person can't read it," explains Cohen, "but a hundred or thousand could read individual documents and tag them with keywords."
The title of the piece, "everyone's a historian now," is a little misleading, perhaps deliberately so, to provoke a reaction. By the end of the article, the importance of the professional historian is reaffirmed.
"Having the crowd on your side is a good thing at certain stages of the research and publication process," says Cohen. "But at other times, historians will still want to be by themselves, sitting at their computer screen, using their own words to knit things together and make sense of the past."
As someone who did some graduate work in history a while back, I always enjoy reading Dan Cohen's take on digital humanities.

Wednesday, May 14, 2008

trail run 2.0

This past weekend, I ran the Mac Forest 50K, an ultramarathon that winds through the trails of a very hilly research forest managed by Oregon State University in Corvallis, OR. It was my second 50K.

It's a pretty hard run, and finishing it has been my goal for a while. The funny thing is that for many of the serious ultra types around me, this was just a throwaway run, their equivalent of a 10K for a typical marathon runner. They are all gearing up to do 100-milers like the Western States.

One thing I love about Oregon is that people are so hardcore about their recreation.

After I finished, got home, and had a good meal and a couple of beers, it wasn't long before I was searching the web for traces of the run: blog posts, photos, race results. It's funny how quickly that stuff appears, and how hard it can be to find at first.

I got a few pointers to photos on Picasa from a Yahoo Groups running group I'm on. Searching Google Blog Search also yielded a few posts. But there was no easy way to really watch the social web response to this event unfold.

Having a common tag to refer to the event could have helped. I've noticed a trend toward that at conferences recently. It would be cool if there were some way to easily pass that tag from the event web site to social software. And it would be nice if the race website could display the latest commentary on the race from the social web, probably in a somewhat moderated way.

Running results are also a good candidate for a semantic web application. If you'd like to see your results across races put on by different organizations, a centralized database or a merged set of databases could be useful.

Monday, May 5, 2008

how could Google help search in academic libraries?

John Wilkin has an interesting post about various ways Google Scholar could add functionality that would help academic library patrons get to the specialized databases provided by academic libraries. Interestingly, he brings Anurag Acharya, the guy who created Google Scholar, in on the discussion. The ideas generally have to do with learning about the user's needs and then pointing them to the more specialized resources. The post really addresses the problem of metasearch, that is, finding a way to give users a simple, single search box and get them from there to some of the richer, more powerful databases produced for academic research.

But what about once a library patron is in a research database like MLA Bibliography, Historical Abstracts, or PsycINFO? Many of these resources are fairly primitive, both in their search functionality and in the content they cover. Often you can search the citations and abstracts of academic articles, and sometimes the full text. Sure, sometimes less is more, but typically they don't cover the growing amount of scholarly material out there on the open web. They also certainly don't offer the full text of books.

If Google (or another big search vendor) offered a platform that database vendors could mount their systems on, those vendors could make much better products. Services available to the vendor could include:
  • access to Google search software
  • ability to create a continually updated index of portions of the web alongside proprietary data
  • ability to provide advanced search functionality and data analysis specific to the needs of a particular discipline
  • access to Google Books index
Google already sort of offers some of this functionality with its APIs, which could allow mixing results from things like Google Custom Search and Google Books into results from an external resource. But I'm thinking here of an even deeper level of integration. Imagine Historical Abstracts if it also included high quality history websites (including digital archives) and the full text of books in its results.

I suspect that it wouldn't be worth it to Google to design a product for the library research sector. This would need to be an infrastructure product that could span proprietary search needs of multiple industries.

When we got a Search Appliance here at Lewis & Clark, I have to admit I was kind of disappointed, playing around with the admin interface, to find that you couldn't easily mix parts of Google's web index in with your own proprietary content. I guess this is sort of what I'm asking for here.

Scirus, a hybrid of research database and search engine, is sort of a development in this direction. Another related idea: Dan Cohen has called for Google Books to open up its APIs for scholarly inquiry.

Some folks will no doubt be horrified that I'm suggesting putting more of our eggs in Google's basket. But the idea is really about bringing web-scale infrastructure to bear on more specialized, niche needs: not giving ourselves over to Google, but using their data and software as a platform on which to accomplish bigger things.

Friday, May 2, 2008

architecture images in academia: moving into the cloud

The Society of Architectural Historians just received a Mellon grant to build "a dynamic online library of architectural and landscape images for research and teaching."

One thing that's notable in the description of the project is that it aspires to move visual resource collections away from building separate collections at each institution (as has been the case with slides, and initially with digital images) to collaborative creation of a shared collection:
It is the expectation that SAH AVRN will change the way Visual Resources and Art/Architecture Librarians at those institutions conduct their work. Instead of developing separate, independent collections of architectural images for each institution, librarians will contribute images and metadata to SAH AVRN, a shared resource that will be widely available. Initially images will be contributed to SAH AVRN by scholars at the same three institutions who have agreed to share thousands of their own images that were taken for research and pedagogical purposes.
Another intriguing aspect of the proposal is its mention of the development of new technology that will allow the contribution of images by front line people.
Building upon the existing ARTstor platform for storage, retrieval, viewing and presentation of images, ARTstor is going to develop two new tools to be used in conjunction with SAH AVRN. The first is a tool that will enable scholars, practitioners, librarians and others to contribute images to the shared resource of SAH AVRN. The second set of tools will be a content management system that will enable sophisticated processing and management of those images.
Is this the 'academic Flickr' that we've been waiting for?