At a NITLE online seminar, NITLE's Director of Research Bryan Alexander gave a great overview of the semantic web and its potential applications in academia. It was impressive how much he fit into an hour-long session. Today, I was thinking about how this would have made a cool day-long workshop, but then we really would have needed to move from the theoretical to the practical.
I think that as more academic content gets marked up semantically, it should change the way we do research. Searching and organizing content as one does academic research should get more sophisticated.
We could start marking up data in our digital collections using RDF. It looks like there is a good way of doing this for Creative Commons data. Search engines are starting to use this information more.
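To make the RDF idea a bit more concrete, here is a minimal sketch of what describing one digital collection item might look like. It writes N-Triples by hand using the Dublin Core terms and Creative Commons namespaces; the item URL, title, and creator are made-up placeholders, and a real project would more likely use an RDF library or RDFa embedded in the item's web page.

```python
# Real vocabulary namespaces; the item URL and field values below are made up.
DCTERMS = "http://purl.org/dc/terms/"
CC = "http://creativecommons.org/ns#"


def literal(value):
    """Quote a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def describe_item(item_url, title, creator, license_url):
    """Return N-Triples lines describing one digital collection item."""
    triples = [
        (uri(item_url), uri(DCTERMS + "title"), literal(title)),
        (uri(item_url), uri(DCTERMS + "creator"), literal(creator)),
        (uri(item_url), uri(CC + "license"), uri(license_url)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_item(
    "http://example.org/collection/item/42",  # hypothetical item URL
    "Stoneware vessel",                        # hypothetical title
    "Example Artist",                          # hypothetical creator
    "http://creativecommons.org/licenses/by-nc-sa/3.0/",
)
print("\n".join(lines))
```

Each printed line is one subject-predicate-object statement, and the license statement is the part that a license-aware search engine could pick up.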
Wednesday, December 9, 2009
Friday, December 4, 2009
Digital Initiatives at a Liberal Arts College Library
At work, I've been thinking about new digital services that the library could introduce, potentially in partnership with other units at L&C. Here is the draft of a short paper that I've put together on the topic:
Potential Future Digital Initiatives at Watzek Library
This paper is intended to stimulate thinking and discussion about potential digital initiatives involving a variety of constituents at Lewis & Clark including faculty, the Library, IT, New Media, etc.
As the nature of research, scholarship, and learning changes in an increasingly digital environment, academic libraries need to rethink the services that they are providing. This rethinking is even more crucial in a tight budget environment where we need to maximize the impact of the funds expended on library services and resources.
In the past five years, Watzek Library has developed its capacity to support digital initiatives in three main areas: enhancements to the library website and the functions it provides to support research, the Visual Resources Collection, and Special Collections and Archives. Some of the projects that we have completed include the Special Collections and Archives Digital Collection, the MDID image collection, our senior theses collection, accessCeramics, and the William Stafford Archives. Work on these projects has allowed us to develop expertise in information architecture, design, web programming, metadata management, Web 2.0 technologies, and digital scholarship.
As we look towards the future, we would like to broaden the impact of Watzek's digital initiatives and make more connections with academic endeavors across the College. Below is a list of possible digital services that the library could offer in the future. In one form or another, they are being offered by colleges and universities around the United States. We are putting this list forward to gauge interest and applicability at Lewis & Clark.
Thematic Digital Collections: Faculty may have interest in developing an online archive of primary materials, scholarship, or data associated with their scholarship and/or teaching. The Library might partner with faculty on the development of online collections of images, documents, or other media surrounding a particular topic. We could provide the expertise in digitization, software selection, database design, metadata schemas, information architecture, and search engine optimization needed to develop such projects. Our collaboration might take the form of a consultation or a more extensive partnership for larger collections that would fit in with Watzek Library's long-term digital collections. One example of such a project is accessCeramics, a database of contemporary ceramics images developed as a partnership between Assistant Professor of Ceramics Ted Vogel and Watzek Library. accessCeramics has paired Vogel's connections to the ceramics community and interest in curating an online collection of images with Watzek Library's expertise in digital collections. A few other examples of thematic archives arranged around faculty research interests include the Gerald Warner Taiwan image collection, a project of Associate Professor Paul D. Barclay and Digital Initiatives Librarian Eric Luhrs at Lafayette College; the Anarchy Archives, a project of Professor David Ward at Pitzer College; and the Murals of Northern Ireland collection, a project of Tony Crowley, the Hartley Burr Alexander Chair in the Humanities at Scripps College, and one of numerous thematic collections in the Claremont Colleges Digital Library.
Institutional Repository: The library could support an online digital archive devoted to storing and making accessible digital objects associated with the academic life of the College. The content might include faculty and student scholarship, materials from College symposia, and other media. The library already archives student theses, as do some individual departments. For an example of an institutional repository at a liberal arts college, see Macalester's Digital Commons.
Platforms for Collaborative Student Research: In the digital environment, there are growing opportunities for students to work together on research projects. Using social bookmarking software and wikis, students can share resources with each other. Lewis & Clark's Environmental Studies program uses delicious.com to accumulate and organize research resources around particular sites. Software like the History Engine gives students a platform for the publishing of original research using primary sources. The library could serve as a consultant with faculty in the deployment and use of these resources. The library could also act as an agent to preserve the output of these collaborations over time.
Data Curation: Lewis & Clark has several active research laboratories in the sciences and social sciences. The library could serve as a consultant in the organization and long-term storage and preservation of the data produced by this research, whether in local or remote repositories. The library could recommend remote digital archives, storage technologies, metadata schemas, and information architectures that suit the needs of a particular research lab. To our knowledge, this is a relatively new area for liberal arts colleges, and we do not know of successful examples of this type of service.
Expanding Visual Resources: Our Visual Resources Collection currently supports teaching with images of art and culture through a local collection of images (MDID) as well as licensed collections of images such as ARTstor. These images are used primarily by Art and Art History faculty, but are also used by faculty in other humanities disciplines as well as the social sciences. Should we expand our support for images to include scientific images and the scientific disciplines? Currently, our expertise in images is limited largely to still, 2D images. Should we develop expertise in the acquisition and delivery of moving images as well as three-dimensional image technology?
Web Archiving: Content on the web represents a range of activities across Lewis & Clark, both academic and non-academic. Meeting minutes, departmental rosters, symposia programs, syllabi, campus news, etc. all live on the web. But much of this content is ephemeral: it is taken down and disappears after it no longer has currency. Should the library take responsibility for archiving all or part of Lewis & Clark's web output for the needs of future generations? Haverford, Bryn Mawr, and Swarthmore have a web archiving initiative underway using the Archive-It software from the Internet Archive.
Services to Support Scholarly Communication in the Digital Environment: The library could develop a menu of services to support faculty as they publish their research. These services could include consulting and education on copyright and open access, assistance with acquiring rights for digital assets (such as images) for use in publication, advice on publishing research data, and assistance with scholarly reputation management on the web. Oberlin College's Library has an initiative focused on transforming scholarly communication, which includes advising faculty on copyright and open access opportunities.
Mark Dahl
Associate Director for Digital Initiatives and Collection Management
Watzek Library
Lewis & Clark College
Monday, October 26, 2009
flatlands and failures of curation
As a counterpoint to my last post on the rise of the verticals, I've been thinking about the importance of horizontal library collections. On the one hand, if libraries want to make a difference in the web environment, they should develop unique vertical collections that focus on particular subject areas and are of interest globally.
But what of the notion that libraries, particularly college libraries like my own, should provide their users with a strong general collection in line with their institution's curriculum? In the long-tail, hybrid print/digital environment of the early 21st century, this idea of a broad and shallow local collection perhaps doesn't make as much sense. As we try to expand our patrons' information universe with consortial borrowing and large aggregations of e content, not to mention awareness of what's out there on the web, the idea of a limited general book collection seems quaint, like your neighborhood book store.
Somehow, we still want our patrons to be able to identify the most important works in a subject area without getting overloaded with choices. One might argue that Google's success is based on doing something like this for the web as a whole. Google is able to reliably pull up the most popular and trusted websites on a given topic.
Our discovery systems need to do a better job of giving some relief to the information landscape. Our users should be able to tell if some titles are more popular, more widely cited, etc. than others. If a text is a classic work of literature or a classic in the field, it should be obvious in search results.
Ranking search results based partly on the number of holding libraries like WorldCat.org does is a step in the right direction: the collective intelligence of collection development work, if you will. FRBRization is another one. Use of citation analysis could be another. Folksonomies and recommendation engines another. Human curation also has a role.
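As a rough illustration of the holdings idea, here is a small sketch that blends a text relevance score with the number of holding libraries. This is not how WorldCat.org actually computes its rankings; the weight, the log dampening, and the sample records are all assumptions made up for the example.

```python
import math


def combined_score(text_score, holdings, weight=0.3):
    """Blend a text relevance score with a holdings-based popularity signal.

    log1p dampens the holdings count so that a title held by thousands
    of libraries does not completely swamp one held by a few hundred.
    """
    return text_score + weight * math.log1p(holdings)


# Hypothetical search results: (title, text relevance, holding libraries)
results = [
    ("Obscure reprint edition", 1.0, 3),
    ("Standard scholarly edition", 0.9, 1500),
    ("Self-published commentary", 1.1, 0),
]

ranked = sorted(results, key=lambda r: combined_score(r[1], r[2]), reverse=True)
for title, text_score, holdings in ranked:
    print(f"{combined_score(text_score, holdings):.2f}  {title}")
```

With these made-up numbers, the widely held edition rises to the top even though its text match is slightly weaker, which is the kind of relief in the landscape I have in mind.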
The commercial world is getting good at using these techniques. Libraries really have a chance to lead in the FRBRization arena, I think. This is something the commercial world hasn't figured out, as Mike Shatzkin points out here:
Recommendation engines aside (“based on what you bought before, have we got a book for you!”), online book retailers have a long way to go to enable the customized curation that seems both possible and desirable in the digital age. Even as sophisticated a retailer as Barnes & Noble will present multiple duplicate entries of a public domain scan from Google to an ebook search for a Shakespeare play. And even as sophisticated a retailer as Amazon will sell you a Kindle ebook that is a self-published tome in a way that is indistinguishable from a book from a legitimate publisher. These are failures of curation.
Monday, October 5, 2009
the rise of the verticals
Mike Shatzkin, a commentator on the book publishing industry, makes the following observation:
Horizontal aggregation was more efficient in a world of physical delivery. Vertical aggregation makes more sense in a world of digital delivery. And enabling the customer or user to have some control over the curation is possible in the digital world but hardly is in the physical.
Shatzkin sees the future information ecosystem trending towards niches or 'verticals' with global audiences.
He is contrasting this model with traditional bookstores and trade publishers that cover a wide range of subjects. It also seems the opposite of the way that a traditional academic or public library is set up, with books spanning a wide range of subjects and positioned to serve a local audience.
old=local and horizontal
new=global and vertical
I would argue that in the academic repository arena, we can already observe the difference between these two approaches.
Institutional repositories aggregate scholarship that crosses a wide range of subject areas only tied together by affiliation with a single academic institution. They might be described as local and horizontal.
Disciplinary repositories like the Social Science Research Network and arxiv.org concentrate content in certain academic disciplines. They might be described as global and vertical.
Which model is more successful, the disciplinary repositories or the institutional ones? If this ranking is right, it is the disciplinary repositories. They have the most momentum and interest behind them.
Generally, I think that digital initiatives in libraries will be most successful if they are able to build on a vertical community. Projects that are too wide in scope end up being about nothing.
Wednesday, September 23, 2009
Summon 'web scale'? I don't think so.
I think it's strange that Serials Solutions is attempting to apply the "web-scale" adjective to their Summon Service.
As far as I can tell, the library community has really co-opted this term from its original use, which pertained to computing infrastructure that could support web sites that handle huge amounts of traffic. Perhaps Lorcan Dempsey widened the use of the term in January 2007:
'Web-scale' refers to how major web presences architect systems and services to scale as use grows. But it also seems evocative in a broader way of the general attributes of the large gravitational hubs which are such a feature of the current web (eBay, Amazon, Google, WikiPedia, ...).
This reference to 'web scale' is now at the top of Google results for the term, making me think that the library community has just about taken over the term.
I attended a webinar on Summon yesterday, and found out that with Summon, Serials Solutions creates a broad index of content available to your library: books, journals, digital collections, etc. It gets the data from uploads by your library and from the e content vendors with which your library has relations. The data goes into a Solr index, which can then serve as a comprehensive discovery tool for your library's content. Because it is built on local data and tailored for a particular user community, this sounds much more like an 'intranet' type search than anything that is "web scale."
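To illustrate the general idea of one index built from several feeds, here is a toy sketch of an inverted index that merges catalog records and vendor article records. It is not how Summon or Solr work internally; the source names, record ids, and titles are hypothetical.

```python
from collections import defaultdict


def build_index(*sources):
    """Build one inverted index (word -> set of record ids) from several sources."""
    index = defaultdict(set)
    records = {}
    for source_name, source_records in sources:
        for local_id, title in source_records:
            record_id = f"{source_name}:{local_id}"
            records[record_id] = title
            for word in title.lower().split():
                index[word].add(record_id)
    return index, records


def search(index, records, query):
    """Return titles of records containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(records[r] for r in hits)


# Hypothetical feeds: the library's own catalog and one content vendor.
catalog = ("catalog", [("b1", "Walden"), ("b2", "Digital Libraries")])
vendor = ("vendor", [("a1", "Digital preservation in libraries")])

index, records = build_index(catalog, vendor)
print(search(index, records, "digital libraries"))
```

A single query reaches both the book and the article, but only the content this one library has loaded, which is why the result still feels like a local silo.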
WorldCat Local with its upcoming metasearch features does something similar, but I think that it can make a more legitimate claim to the "web scale" designation because it is attached to the WorldCat.org database. In my opinion, WorldCat.org is web scale in the sense that it is used and improved by a global community.
Summon and WorldCat Local are competing in the same discovery interface space. At first glance, it appears that Serials Solutions is ahead of OCLC in the incorporation of article content, perhaps because of their close relations with content vendors. OCLC seems to have the edge in books: they are able to leverage holdings data in relevance rankings and they have a more sophisticated treatment of various editions of the same work (FRBR). OCLC is also endeavoring to provide delivery services in addition to discovery.
It will be interesting to see if OCLC can use its global database and the Web 2.0 principle "it gets better the more people use it" to differentiate its product from competitors like Summon.
I don't think it's obvious, but what OCLC is trying to do with WorldCat is much bolder than what Serials Solutions is doing with Summon. With Summon, libraries are basically throwing all of their content into one index to break down the data silos within an institution. But what you end up with is a big search silo for that institution.
With WorldCat, the vision is to break down not only the silos within institutions but also the silos between institutions. And not just break down those silos in the sense of harvest-and-search. The concept is that libraries and their patrons will be working together to improve a shared database through intentional and professional metadata. This shared database will be big enough to have a real impact on the web. Its records will surface in search engine results. Its interface will be familiar to many, and it will be customizable for a particular audience via the WorldCat Local route.
We'll see if this grand vision takes hold.
Labels:
Serials Solutions,
Summon,
synthesize,
web scale,
WorldCat Local
Wednesday, September 9, 2009
WorldCat Local Review
I've written a fair amount in the abstract about the benefits of WorldCat.org and WorldCat Local.
At Watzek, we launched "L&C WorldCat" around July 1. Here are some thoughts based on my experience with the implementation.
- There is already a sense developing at our school that "everything" is in or should be in WorldCat Local. People expect all articles and books to be there (even though they aren't). I may post more on this later.
- Compared with launching an III OPAC, the process of bringing WCL up is refreshingly simple. They have consciously limited customization to the very basics (logo, colors, etc.).
- Even so, as I've said before in this blog, I'd prefer a greater level of customizability, kind of on the level of Blogger. Give me full access to the stylesheet. Let me add code snippets.
- It's backward that the software pulls in live holdings data for print items from your ILS, but can't pull in links to digital content from your link resolver. When students come upon an article, they want the direct link to it up front, not a click or two away. OCLC should scrape resolvers like they do ILSs to embed link resolver links in records for articles.
- I'm excited about the idea of OCLC partnering with content providers like EBSCO and indexing their content in WC. One thing I speculated on when writing the Digital Libraries book in '06 was that following on the success of search engines, meta indexing services for library content would eventually emerge. We now see that with Serials Solutions Summon and WorldCat.
- The idea of also incorporating traditional real-time meta-searching seems like a backward compromise: OCLC should be firm with content providers and resolve to only incorporate content that they can put into their index.
- The stats module for WCL is basically a commercial web analytics package slapped onto WCL with a few limited custom reports. Basically, you can look at your site traffic and search terms being used.
- I like the idea of using standard web analytics software on WCL, but please let me drop the code snippet in for Google Analytics.
- If they did some URL rewriting so as to map some of the search/browsing activity to clean URL paths (e.g. "/author/", "/title/", "/facet/video/"), web analytics software would become more useful because you could collate like activities based on URL paths.
- For a minute, I was thinking that to provide access to an e book package we purchased through WCL, all we'd need to do is "flip the switch" and activate our holdings for those records in WCL, forget about ILS records. But then I remembered: the URLs to that package need to go through our proxy server so they need to be drawn from our ILS. WCL is not making our lives easier yet.
- A little off the subject, but now that OCLC owns EZproxy, aren't they in a great position to develop some better, more graceful form of remote authentication than proxy? OCLC could act as a trusted third party and provide single sign on to content provider websites.
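The URL rewriting idea from the list above can be sketched roughly as follows. The query parameter names (au, ti, format) are hypothetical stand-ins, not WCL's actual parameters; the point is only that each search URL gets mapped to a clean path that an analytics package can group by prefix.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical query parameters mapped to clean path prefixes.
PARAM_TO_PATH = {"au": "author", "ti": "title", "format": "facet"}


def clean_path(url):
    """Map a query-string search URL to a clean path for analytics."""
    query = parse_qs(urlparse(url).query)
    for param, prefix in PARAM_TO_PATH.items():
        if param in query:
            value = query[param][0].strip().lower().replace(" ", "-")
            return f"/{prefix}/{value}/"
    # Anything we cannot classify is lumped together.
    return "/search/other/"


print(clean_path("http://example.org/search?ti=moby+dick"))
print(clean_path("http://example.org/search?format=video&q=ceramics"))
print(clean_path("http://example.org/search?q=ceramics"))
```

Once every title search reports under "/title/" and every video facet click under "/facet/video/", a standard analytics report by path does the collating for you.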
Tuesday, September 8, 2009
Economist on Google Books
The Economist has a leader supporting the Google Books Deal, and an interview with Paul Courant, Dean of Libraries at Univ. of Michigan.
He talks some about the product that Google will be offering to libraries with this deal.
I have to wonder if this product will be the watershed moment for e books in academic libraries. If Google's library of books is big and broad enough to serve as a general library on its own, Google's platform for e books could become the place to do research in books.
Much of its success will depend on how much current content is in their index, and this is really dependent on Google doing deals with thousands of publishers. If Google's index is largely made up of older scanned books, it'll be a useful research tool, but not compelling as a place for general research.
Google might become the place to do research in books, whereas recreational e book reading will happen through other vendors like Amazon.
