9.1.3 Google Scholar's flaws don't impact citation analysis much

Peter Jacsó, a prominent academic in Information and Library Science, has published several rather critical articles about Google Scholar (see e.g. Jacsó, 2006a/b). When confronted with titles such as “Dubious hit counts and cuckoo's eggs” and “Deflated, inflated and phantom citation counts”, deans, academic administrators and tenure/promotion committees could be excused for assuming that Google Scholar provides unreliable data.

However, the bulk of Jacsó's (2006b) critique is levelled at Google Scholar's inconsistent number of results for keyword searches, which is not at all relevant for the author and journal impact searches that most academics use Publish or Perish for. In addition, most of the metrics used in Publish or Perish are fairly robust to occasional errors: such errors will not generally change the h-index or g-index, and will only have a minor impact on the number of citations per paper.
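The robustness claim can be illustrated with a small Python sketch. It uses the standard definitions of the h-index (the largest h such that h papers have at least h citations each) and the g-index (the largest g such that the top g papers together have at least g² citations). The citation counts and the two "parsing errors" below are invented purely for illustration, and the function names are hypothetical; this is not the code used by Publish or Perish.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


def cites_per_paper(citations):
    """Mean number of citations per paper."""
    return sum(citations) / len(citations)


# Hypothetical citation record for one author (invented numbers).
clean = [25, 18, 14, 10, 8, 6, 5, 4, 3, 2, 1, 1, 0, 0, 0]

# Same record with two made-up parsing errors:
# two phantom citations on the top paper, one missed citation on a minor paper.
noisy = list(clean)
noisy[0] += 2
noisy[7] -= 1

print(h_index(clean), h_index(noisy))   # 6 6
print(g_index(clean), g_index(noisy))   # 9 9
print(round(cites_per_paper(clean), 2), round(cites_per_paper(noisy), 2))  # 6.47 6.53
```

In this example the h-index and g-index are unchanged and citations per paper moves by less than 0.1. This is not guaranteed in every case: an error on a paper sitting exactly at the threshold (here, the sixth paper with 6 citations) could shift the h-index by one.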

Chapter 13 includes a detailed discussion of the limitations of Google Scholar in comparison with ISI and Scopus. It also discusses most of the specific problems that Jacsó (2006a/b) has identified in his articles. There is no doubt that Google Scholar's automatic parsing occasionally provides us with nonsensical results. However, these errors do not appear to be as frequent or as important as implied by Jacsó's articles. They also generally have little or no impact on the results of author or journal queries.

Google Scholar has also significantly improved its parsing since these errors were first pointed out. However, many academics still refer to Jacsó's 2006 articles as convincing arguments against any use of Google Scholar. I would argue this is inappropriate. As academics, we are all too well aware that all of our research results include a certain error margin. We cannot expect citation data to be any different.

What is most important is that errors are random rather than systematic. I have no reason to believe that the Google Scholar errors identified in Jacsó's articles are anything other than random. Hence they will not normally advantage or disadvantage individual academics or journals.

In contrast, commercial databases such as ISI and Scopus have systematic errors: they do not include many journals in the Social Sciences and Humanities, nor do they provide good coverage of conference proceedings, books or book chapters. Therefore, although it is always a good idea to use multiple data sources, rejecting Google Scholar out of hand because of presumed parsing errors is not rational.