Friday, November 14, 2008

finding full text with Google Scholar

The Google Operating System Blog had a post the other day that alerted me to a relatively new feature in Google Scholar. For each article in a result set, Google Scholar will point you to a free, unrestricted copy of the article on the web (if available), marked with a little green icon.

With many academic journal publishers allowing authors to post copies of their articles on their personal websites, it is now common for scholarly articles in subscription journals to be available for free on the open web. Below is an example of an article, with a copy available from a website in an academic domain (sorry for the tiny image).


This is a good example of Google Scholar leveraging the Google web index to provide something you can't get within the research systems that libraries have built and licensed. It's also yet another reminder that libraries and publishers have lost their role as sole provider and intermediary for academic content.

I've pointed out previously in this blog that creators of research products for libraries do not (or are not able to) take advantage of web indexes as they create their products. I wonder if OpenURL resolver vendors or someone like OCLC could offer this feature by tapping into something like the Alexa Web Search service to mine the web for full copies of a given article. It might be hard to do on the fly with a resolver request.

I'm guessing that Google Scholar will have 90%+ of the scholarly articles in existence in its index at the citation level in the not-too-distant future. It is able to mine so many places for citations: web sites, scanned books and journals, many publishers' archives, and so on.

As OCLC loads article citations into Open WorldCat, I wonder if they have considered a more "brute force" approach to finding citations. They could mine the web for them like Google. Of course, this would introduce all sorts of possibilities for errors and lack of bibliographic control. Google Scholar must have lots of errors in the citations it collects, but it seems to efficiently collate like citations together and recognize which citations are the most referenced.
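
To make the idea of collating like citations concrete, here is a minimal sketch in Python. It is not how Google Scholar actually works (that isn't public); it only assumes that citations harvested from the web arrive as messy (title, year) pairs, and it groups the ones that match after crude normalization. The function names and the sample titles are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict


def citation_key(title, year):
    """Build a crude matching key: lowercase, drop punctuation, collapse spaces."""
    words = re.sub(r"[^a-z0-9 ]", " ", title.lower()).split()
    return (" ".join(words), year)


def collate(citations):
    """Group (title, year) pairs that share a key, largest group first."""
    groups = defaultdict(list)
    for title, year in citations:
        groups[citation_key(title, year)].append((title, year))
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical citations as they might be scraped from different pages.
harvested = [
    ("Finding Full Text with Google Scholar", 2008),
    ("finding full-text with Google Scholar.", 2008),
    ("Finding full text with google scholar", 2008),
    ("Bibliographic Control on the Open Web", 2007),
]

groups = collate(harvested)
for group in groups:
    # Print how many variants were merged and one representative title.
    print(len(group), group[0][0])
```

The size of each group stands in for "most referenced." A real system would need fuzzier matching (author names, misspellings, truncated titles), which is exactly where the errors and loss of bibliographic control would creep in.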
