13.2.2 Some of Google Scholar's coverage might be problematic

There are several potential problems with coverage in Google Scholar, which we will discuss in some detail below. However, it must be said that these problems were mostly present in the early days and Google Scholar has improved its coverage over the years.

Names with diacritics, apostrophes or ligatures were problematic

Both Google Scholar and Thomson ISI Web of Science have problems with academics whose names include diacritics (e.g. Özbilgin or Olivas-Luján). In Google Scholar, a search for the name with diacritics will usually provide some results, but they are not comprehensive (see below). A search for the name without diacritics returns both the references with and those without diacritics and is hence recommended.
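The recommendation to search without diacritics can be automated when preparing a list of author names. The following is a minimal sketch using Python's standard unicodedata module; the function name strip_diacritics is my own choice for illustration and is not part of any Google Scholar tool.

```python
import unicodedata


def strip_diacritics(name: str) -> str:
    """Return the name with accents and other combining marks removed."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for name in ["Özbilgin", "Olivas-Luján"]:
    print(name, "->", strip_diacritics(name))
# Özbilgin -> Ozbilgin
# Olivas-Luján -> Olivas-Lujan
```

This approach only handles letters that Unicode decomposes into a base letter and a mark. Letters such as "ø" or "ł" have no such decomposition and would need an explicit replacement table.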

Names with apostrophes (e.g. many Irish names) initially also presented problems in Google Scholar. These problems now seem to have been resolved in Google Scholar, but are still present in ISI (see next chapter). In the early days, a search for "KH O'Rourke" in Google Scholar provided very few results, as Google Scholar treated both KH and O as initials and hence searched for KHO Rourke. Adding an additional blank space before O'Rourke solved this problem. More recently, however, searches for "KH O'Rourke" (without the additional blank space) result in more than 3,000 citations.

[Figure panels: With ligatures; Without ligatures]

Ligatured names initially also presented problems in Google Scholar. If an academic's name included a sequence of characters that is ligatured in traditional typesetting ("fi", "ff", "fl", and others in other languages) and he or she prepared papers with LaTeX (as most authors in mathematics and computer science do), Google Scholar did not find those publications under the correctly spelled name.

When searching for "J Bradfield" in 2007, I found only some 190 cites for computer scientist Julian Bradfield, whereas "J Bradeld" resulted in nearly 400 cites for the same person. However, Google Scholar has since resolved this problem: a search for "J Bradfield" now results in some 900 cites, whereas "J Bradeld" yields only 2 results with a total of one citation.
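To illustrate why a ligature can hide a name from a plain text search, the sketch below compares a search string with text in which "fi" is stored as a single ligature character. The extracted string is a hypothetical example, and I do not know how Google Scholar actually resolved the issue; Unicode compatibility normalization (NFKC) is simply one standard way to expand such ligatures.

```python
import unicodedata

# Hypothetical text as it might be extracted from a PDF:
# the "fi" in the surname is one ligature character (U+FB01).
extracted = "J Brad\ufb01eld"
query = "J Bradfield"

# A plain comparison fails because the ligature is a different character.
print(query in extracted)  # False

# NFKC normalization expands the ligature into the two letters "f" and "i".
normalized = unicodedata.normalize("NFKC", extracted)
print(query in normalized)  # True
```

Note that the "J Bradeld" results described above suggest the ligature was dropped altogether during text extraction. Normalization cannot repair that case, as the characters are no longer present in the text.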

Google Scholar includes some non-scholarly citations

Google Scholar does sometimes include non-scholarly citations such as student handbooks, library guides or editorial notes. However, incidental problems in this regard are unlikely to distort citation metrics, especially robust ones such as the h-index.
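The robustness of the h-index to a handful of stray citations can be shown with a small calculation. The sketch below uses the standard definition of the h-index (the largest h such that h papers have at least h citations each); the citation counts are invented purely for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Invented citation counts for eight papers by one academic.
clean = [25, 18, 12, 9, 7, 5, 3, 1]
# The same papers with three non-scholarly citations added:
# two to the most cited paper and one to the least cited paper.
inflated = [27, 18, 12, 9, 7, 5, 3, 2]

print(sum(clean), h_index(clean))        # 80 5
print(sum(inflated), h_index(inflated))  # 83 5
```

The total citation count rises, but the h-index does not move. A stray citation changes the h-index only if it happens to fall on a paper that sits just below the threshold, such as the sixth paper in this example, which has five citations.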

An inspection of my own papers shows that more than 75% of the citations are in academic journals, with the remainder appearing in books, conference and working papers and student theses. Few non-scholarly citations were found. Moreover, I would argue that even citations in student handbooks, library guides or editorials show that the academic has an impact on the field.

In a similar vein, Vaughan and Shaw (2008) found that 92% of the citations identified by Google Scholar in the field of library and information science represented intellectual impact, primarily citations from journal articles.

Not all scholarly journals are indexed in Google Scholar

Not all scholarly journals seem to be indexed or indexed comprehensively in Google Scholar. Unfortunately, Google Scholar is not very open about its coverage and hence it is unclear what its sources are.

It is generally believed that Elsevier journals are not included (Meho & Yang, 2007), because Elsevier has a competing commercial product in Scopus. However, I was able to find all Elsevier journals I have published in. It is quite possible that Google Scholar coverage for Elsevier journals has increased in the last three years.

On the other hand, Meho & Yang (2007) did find that Google Scholar missed 40.4% of the citations found by the union of Web of Science and Scopus, suggesting that Google Scholar does miss some important refereed citations. However, it must also be said that the union of Web of Science and Scopus misses 61.04% of the citations in Google Scholar. Further, Meho & Yang (2007) found that most of the citations uniquely found by Google Scholar are from refereed sources.
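Both percentages can be large at the same time because each is calculated against a different base. The sketch below shows the arithmetic with sets of citing documents. The identifiers and set sizes are made up for illustration and do not reproduce Meho & Yang's data; share_missed is a hypothetical helper name.

```python
def share_missed(reference: set, other: set) -> float:
    """Fraction of the citations in `reference` that `other` does not contain."""
    return len(reference - other) / len(reference)


# Made-up identifiers of citing documents found by each source.
wos_scopus = {"c1", "c2", "c3", "c4", "c5"}
google_scholar = {"c4", "c5", "c6", "c7", "c8", "c9"}

# Share of Web of Science/Scopus citations that Google Scholar misses.
print(f"{share_missed(wos_scopus, google_scholar):.1%}")  # 60.0%
# Share of Google Scholar citations that Web of Science/Scopus miss.
print(f"{share_missed(google_scholar, wos_scopus):.1%}")  # 66.7%
```

In this toy example each source misses well over half of what the other finds, even though the two sources overlap. The same logic explains how the two figures reported above can coexist.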

Google Scholar coverage might be uneven across fields of study

Although Google Scholar generally provides a higher citation count than ISI, this might not be true for all fields of study.

In a systematic comparison of 64 articles in different disciplines, Bosman et al. (2006) found overall coverage of Google Scholar to be comparable with both Web of Science and Scopus and slightly better for articles published in 2000 than in 1995. However, huge variations were apparent between disciplines, with Chemistry and Physics in particular showing very low Google Scholar coverage and Science and Medicine also showing lower coverage than in Web of Science.

Based on a sample of 1650 articles, Kousha & Thelwall (2007, 2008) found Google Scholar coverage to be less comprehensive than ISI in the three Science disciplines included in their study (Biology, Chemistry and Physics), with Google Scholar showing particularly low coverage for Chemistry. Google Scholar coverage for the four Social Sciences included in their study (Education, Economics, Sociology and Psychology) as well as Computing was significantly higher than ISI coverage. Similarly, Bar-Ilan (2008) found the number of Google Scholar citations to be substantially higher than those in WoS and Scopus for mathematicians and computer scientists, but lower for high-energy physicists.

On the other hand, my own recent comparison of coverage across three different databases (see Chapter 16 for details) for a small sample of academics across a large variety of disciplines showed that Google Scholar had higher citation counts than both ISI and Scopus for 9 out of 10 academics. For the 10th academic the citation count in Google Scholar was 10% lower than in ISI, but this might well be caused by the fact that many of his publications were old (see next section).

More detailed comparisons by academics working in the respective areas would be necessary before we can draw general conclusions. However, as a general rule of thumb, I would suggest that using Google Scholar might be most beneficial for three of the Google Scholar categories: Business, Administration, Finance & Economics; Engineering, Computer Science & Mathematics; Social Sciences, Arts & Humanities.

Although broad comparative searches can be done for other disciplines (Biology, Life Sciences & Environmental Science; Chemistry & Materials Science; Medicine, Pharmacology & Veterinary Science; Physics, Astronomy & Planetary Science), I would not encourage heavy reliance on Google Scholar for individual academics working in these areas without doing spot checks with either Scopus or Web of Science.

Google Scholar does not perform as well for older publications

Google Scholar does not perform as well for older publications as these publications and the publications that cite them have not (yet) been posted on the web.

Pauly & Stergiou (2005) found that Google Scholar had less than half of the citations of the Web of Science for a specific set of papers published in a variety of disciplines (mostly in the Sciences) between 1925 and 1989. However, for papers published in the 1990-2004 period both sources gave similar citation counts. The authors expect Google Scholar's performance to improve for old articles as journals' back issues are posted on the web.

Meho & Yang (2007) found the majority of the citations from journals and conference papers in Google Scholar to be from after 1993. Belew (2005) found Google Scholar to be competitive in terms of coverage for references published in the last 20 years, but the Web of Science superior before then. This means that Google Scholar might underestimate the impact of scholars who have mainly published before 1990.

However, these studies are fairly old, and with many journals posting back issues on the web, it is expected that this limitation of Google Scholar's coverage will become less important over time.