The first Internet search engine I used, back in about 1990, was Archie. This was an index of content hosted across the internet on ftp servers; mostly software but there were documents and databases too. Archie didn’t feel much like an information tool, but more something for computer specialists. Then came Veronica – an index to content hosted on gopher servers (kind of forerunners of the web). This did feel more like a way to search for information, though its content was still limited – a very small niche. Once the web came along we saw a succession of web search engines. Each came into being in a blaze of superlatives (“bigger and better”), trumpeted as the solution to searching the web, but each lasted just a couple of years and then slowly faded as the next new thing took over (where are you, Hotbot, Lycos, Alta Vista?). I never imagined back then that one of these search tools would grow to become an absolutely key part of the academic information environment with a major presence in every part of the information world.
Google has achieved that position. Beyond its dominant presence as a general internet search engine and software development company, the existence of Google Scholar, the Google book digitization project and the recently launched Google ebook service make it a core part of the library and information landscape. My theme today is Google Scholar but I will come back to the book projects in another post.
A recent article you may have missed, in the International Journal of Cultural Studies, affirms that Google has become an integral part of everyday life, not least in the academic world. But Google’s instincts are not those of the academic world – it has a tendency to secrecy born of its commercial mission. The press release about the article states:
One of the key points about search engines’ ranking and profiling systems is that these are not open to the same rules as traditional library scholarship methods in the public domain. Automated search systems developed by commercial Internet giants like Google tap into public values scaffolding the library system and yet, when looking beneath this surface, core values such as transparency and openness are hard to find.
Inexperienced users tend to trust proprietary engines as neutral knowledge mediators [but] engine operators use meta-data to interpret collective profiles of groups of searchers.
Another article, in Serials Review, is entitled Google Scholar’s Dramatic Coverage Improvement Five Years after Debut. The author finds that over the five years from 2005 to 2010 Google Scholar has improved its coverage of scholarly journals. Coverage varied between subject fields, but in 2005 was between 30% and 88%; in 2010 between 98% and 100%.
Librarians criticised Google Scholar in its early days for its very patchy coverage, and also for its lack of openness – it was very hard to find out exactly what it did cover. It seems Google has overcome the coverage problem, though worries over accuracy remain. For an article in Issues in Science and Technology Librarianship, science researchers at the University of California Santa Cruz were surveyed about their article database use and preferences. Web of Science was the single most used database, selected by 41.6%. Statistically there was no difference between PubMed (21.5%) and Google Scholar (18.7%) as the second most popular database. While Google Scholar is favored for its ease of use and speed, those who prefer Web of Science feel more confident about the quality of their results than do those who prefer Google Scholar. Librarians and faculty alike often assert that “all researchers use Google Scholar.” Based on this study, this is essentially correct. 83% of researchers had used Google Scholar and an additional 13% had not used it but would like to try it. Of those who had used Google Scholar, almost three quarters (73%) found it useful.
In this context I was interested to see that Richard Wintle, one of the guest bloggers on this network, wrote recently about his experience of PubMed, suggesting that sometimes Google Scholar performed better than PubMed. I think every tool has occasional weaknesses, so it is good to have multiple search tools available.
Peter Jacso, who has followed Google Scholar for some years, wrote in Library Journal about “Google Scholar’s ghost authors” and in Online Information Review about the “Metadata mega mess in Google Scholar”. He highlights a key problem:
Google’s algorithms create phantom authors for millions of papers. They derive false names from options listed on the search menu, such as P Login (for Please Login). Very often, the real authors are relegated to ghost authors deprived of their authorship along with publication and citation counts.
Jacso says therefore that Google Scholar is inappropriate for bibliometric searches, for evaluating the publishing performance and impact of researchers and journals. One of the problems is that Google’s secrecy means that we don’t know how many records are in Google Scholar, and can only guess at the frequency of these errors.
Google Scholar is five years old, so it is still a young child when compared to PubMed (fully launched in 1997) or PubMed’s progenitor Index Medicus (started 1879). But Google Scholar no longer has a “beta” label, so clearly Google think it is a finished product or at least “good enough”.
My advice – be a little cautious whichever search tool you are using, but especially so with Google Scholar.