Sacrifice a little accuracy for a lot more comprehensive coverage
© Copyright 2016 Anne-Wil Harzing. All rights reserved.
First version, 29 January 2016
[This is a prologue written for La revolución Google Scholar: Destapando la caja de Pandora académica by Enrique Orduña-Malea, Alberto Martín-Martín, Juan M. Ayllón, Emilio Delgado López-Cózar, to be published by the Association of Spanish University Presses-UNE, 2016.]
If I were asked to summarise my “verdict” of Google Scholar in just 10 words, the title above would be a likely candidate. I have been following Google Scholar since its introduction in 2004, but my level of interest increased in 2006 when I needed to make a case for research impact in my application for promotion to full professor. Working in International Business, I found that very few of the journals I had published in were listed in the Web of Science. Those that were listed had low journal impact factors, largely because many of the journals that were citing them were not (yet) included in the Web of Science.
As a result, ten years ago it was very difficult for an International Business scholar to convince a promotion panel that they had achieved just as much impact as, for instance, academics working in the neighbouring disciplines of Economics and Psychology. In both these disciplines a far higher proportion of academic journals was ISI listed.
Enter Google Scholar! When searching for my work in Google Scholar, my case immediately looked much brighter. It turned out my work was actually very highly cited and even my non-traditional publications such as books, book chapters, and a journal ranking list had a strong impact in terms of citations. Unfortunately, the Google Scholar interface didn’t make it very easy to aggregate citation metrics in an accessible form that could be compared across academics.
Thus – in October 2006 – Publish or Perish (PoP) was born. PoP is a software program that retrieves and analyses academic citations. It uses Google Scholar to obtain the raw citations, then analyses these and presents a very wide range of metrics. For me, it was also the start of a new “research hobby”, doing research in the field of bibliometrics. Since then I have published about a dozen articles in journals such as Scientometrics, Journal of Informetrics, and Journal of the Association for Information Science & Technology. Most of these articles have used Google Scholar as a data source.
Google Scholar has come a long way since the early days. Its coverage has improved dramatically and it is now better for all disciplines than either the Web of Science or Scopus (Harzing & Alakangas, 2016). As a result there are now many bibliometric studies that rely on Google Scholar (usually with Publish or Perish) to do their research. A Google Scholar "all of the words" search for: Harzing "Publish or Perish" returns nearly 2,000 hits.
|Discipline|Scopus citations as % of Google Scholar citations|Web of Science citations as % of Google Scholar citations|
The table above summarises part of the results of our recent study of 146 academics in the Life Sciences, Sciences, Engineering, Social Sciences and Humanities (Harzing & Alakangas, 2016). As is immediately apparent, both the Web of Science and Scopus miss a huge number of citations in the Humanities and Social Sciences, mostly because they do not include book publications, and in some disciplines cover only a fraction of the journals. However, even in Engineering, the Sciences and the Life Sciences, Google Scholar reports between one-and-a-half and twice as many citations as the Web of Science and Scopus.
I would be the first to acknowledge that Google Scholar does have some important drawbacks, the most important of which are summarised in my Publish or Perish tutorial. There is also the non-negligible danger that - with easy access to bibliometric tools - comes a certain level of inexpert or plain ignorant use.
PoP used by academics, librarians, governments, grant agencies, and laboratories
However, in my view Google Scholar has played a major role in “democratising” citation analysis (see also Harzing & Mijnhardt, 2015). With Publish or Perish everyone with a computer and Internet access can run bibliometric searches. Not surprisingly, the software is used all over the world, from individual academics and librarians in more than 100 countries to government departments (e.g. US Dept of Energy, US Environmental Protection Agency, US Agency for International Development), from grant-giving agencies (e.g. SSHRC in Canada, CNRS in France) to research laboratories (e.g. Microsoft, Hewlett Packard, IBM).
PoP used at both highly ranked and emerging countries' universities
It is particularly gratifying to note that the software is widely used at highly ranked universities such as Harvard, Stanford, MIT, Oxford, and Cambridge, universities that have comprehensive access to commercial alternatives. However, it is even more satisfying to see its equally high use at under-resourced universities in countries such as Armenia, Botswana, Mongolia, Paraguay, Tajikistan, and Uruguay. More generally, there are over a thousand libraries worldwide that list the software as a free alternative to Scopus and the Web of Science. Google Scholar and Publish or Perish clearly fill a need!
PoP used to improve transparency and meritocracy
Closer to (my) home, the software and Google Scholar are particularly popular in Italy, Greece, and Poland, countries in which many universities do not have access to Scopus or the Web of Science either. What I find particularly pleasing is that in both Italy and Greece, the software has been used regularly to promote transparency and meritocracy in university appointments.
PoP used to cover non-English language publications
Moreover, in these and other countries, academics that publish in their own languages rather than in English need to rely on Google Scholar if they want more than incidental coverage of their work. Just like East Asians, they might also find that Thomson Reuters Web of Knowledge isn’t particularly well versed in accurately distinguishing academics with non-English names (see Harzing, 2015).
Using Google Scholar means sacrificing a certain level of accuracy. In the majority of bibliometric analyses, especially those at higher levels of aggregation and those focusing on robust metrics such as the h-index, Google Scholar’s inevitable slip-ups will not significantly influence the results. However, this is little solace for the (rare) individuals that are “robbed” of their most cited publication, because Google Scholar doesn’t parse their name correctly. If 99-100% accuracy is required, Google Scholar will always need to be triangulated with other data sources.
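To illustrate why a metric such as the h-index is fairly robust to this kind of slip-up, here is a minimal Python sketch. The citation counts are invented purely for illustration, and the function is a plain textbook h-index calculation; it is not the code used by Publish or Perish.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one academic (illustrative, not real data).
full_record = [250, 90, 60, 45, 30, 22, 15, 9, 7, 4, 2, 1]

# The same record with the most cited paper lost, e.g. to a name-parsing error.
missing_top = full_record[1:]

print("h-index, full record:     ", h_index(full_record))
print("h-index, top paper missing:", h_index(missing_top))
print("total citations:", sum(full_record), "vs", sum(missing_top))
```

In this made-up record, losing the top paper removes almost half of the total citations, yet the h-index falls by only one point. Dropping a single publication can never lower the h-index by more than one, which is why the metric tolerates occasional parsing errors better than raw citation counts do.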
However, as a scholar in the Social Sciences, I am more than happy to accept the occasional Google Scholar lapse in return for coverage that is vastly superior to Scopus and the Web of Science and does not discriminate against scientific publication practices outside the (Life) Sciences (see also Harzing, 2013). Hence my opening quote: “Sacrifice a little accuracy for a lot more comprehensive coverage”. I might add to that: … “and a lightning-fast response”. By the time I have started up the Web of Knowledge or Scopus website, remembered my password and logged in, I have already finished my searches in Publish or Perish!
Although many researchers have used Google Scholar as a data source, none of them have been so diligent in their efforts to provide a better picture of Google Scholar as the EC3 Research Group. Since 2008 Emilio and his team have worked tirelessly to explore the inner workings of Google Scholar. I am therefore delighted that their experience to date has now been integrated into a monograph. My only regret is that it is not (yet) available in English!
- Harzing, A.W. (2013) Document categories in the ISI Web of Knowledge: Misunderstanding the Social Sciences?, Scientometrics, vol. 93, no. 1, pp. 23-34. Available online...
- Harzing, A.W. (2015) Health warning: Might contain multiple personalities. The problem of homonyms in Thomson Reuters Essential Science Indicators, Scientometrics, vol. 105, no. 3, pp. 2259-2270. Available online... [Press coverage in The Times and the Times Higher Education].
- Harzing, A.W.; Mijnhardt, W. (2015) Proof over promise: Towards a more inclusive ranking of Dutch academics in Economics & Business, Scientometrics, vol. 102, no. 1, pp. 727-749. Available online...
- Harzing, A.W. (2016) Publish or Perish tutorial, a collection of tips to introduce users to the program's main functions in 80 easy chunks.
- Harzing, A.W.; Alakangas, S. (2016) Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison, Scientometrics, vol. 106, no. 2, pp. 787-804. Available online...
Anne-Wil Harzing is Professor of International Management at Middlesex University, London and visiting professor of International Management at Tilburg University. She is a Fellow of the Academy of International Business, a select group of distinguished AIB members who are recognized for their outstanding contributions to the scholarly development of the field of international business. In addition to her academic duties, she also maintains the Journal Quality List and is the driving force behind the popular Publish or Perish software program.