Looking for John Smith? About author disambiguation

Details how users of Publish or Perish can disambiguate authors

Judging by the email support requests I receive and the responses to the Publish or Perish survey, the most common challenge that users experience is getting a "clean" publication record for their author of interest. Now this isn't so much of a problem if you are called Anne-Wil Harzing, Michael Zyphur or Marcel Wissenburg, but it is a problem if you are called Michelle Brown, Prakash Singh or John Smith. You are likely to have many namesakes in academia.

Author disambiguation in ISI

Please note, however, that this problem is not limited to Google Scholar and Publish or Perish. Even commercial databases with very high subscription fees, such as Thomson Reuters Web of Science (ISI), have problems with this.

Smart searching avoids many problems

You can avoid many of these namesake problems by smart searching. There are five simple steps (linked to separate pages of the PoP tutorial) that will cover the majority of problematic searches:

  1. First of all ensure you put quotes around your search, e.g. "J Smith", not J Smith. If you don't, Google will match the initial anywhere in the author record, so you might get publications by A Smith and J Jones.
  2. Second, if your author has normally published with multiple initials, e.g. "JK Smith", then use multiple initials.
  3. Third, if your author has only ever published with one initial, you can exclude namesakes with multiple initials in one fell swoop by excluding "J* Smith", "J** Smith", "J*** Smith".
  4. Fourth, if your author works in a field where journals typically list full given names, you can simply search for "John Smith".
  5. Fifth, if after these steps you are left with only a few publications that are not relevant, you can simply use selective exclusion to remove them.

Obviously, you can combine several of these steps for the best result.
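To make steps 1 and 3 concrete, here is a minimal Python sketch that builds the quoted name and the wildcard exclusion patterns as plain strings. The function names `quoted_name` and `initial_exclusions` are hypothetical helpers invented for this illustration; they are not part of Publish or Perish or Google Scholar, and you would still type or paste the resulting strings into the relevant search fields yourself.

```python
def quoted_name(initials: str, surname: str) -> str:
    """Wrap the author name in double quotes, as in step 1."""
    return f'"{initials} {surname}"'


def initial_exclusions(initial: str, surname: str, max_extra: int = 3) -> list[str]:
    """Build the step 3 patterns: one asterisk per additional initial.

    max_extra=3 mirrors the three patterns in the text; it is an
    assumption of this sketch, not a limit imposed by any tool.
    """
    return [f'"{initial}{"*" * n} {surname}"' for n in range(1, max_extra + 1)]


# Strings for an author who has only ever published as "J Smith"
include = quoted_name("J", "Smith")
exclude = initial_exclusions("J", "Smith")
print(include)
print(" ".join(exclude))
```

Generating the patterns this way is only a convenience if you need to run the same kind of search for many authors; for a single author it is just as easy to type them by hand.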

What if this still doesn't give you the result you want?

The above will give you a good result for many authors, but for some you will still get many irrelevant hits. Hence, you need stronger armory. Below I have listed five more strategies that can be used, each linking to a detailed example in the PoP tutorial.

  1. Use year restrictions: Useful if you know your author has, for instance, only published since 2002.
  2. Use multiple names: Useful if your author has published under multiple names (e.g. maiden/married name, original/anglicized name).
  3. Exclude co-authors: Useful if your author has only published with a limited number of co-authors; you can then exclude namesakes' co-authors.
  4. Use research field: Useful if your author has published in designated research fields that are likely to appear in their articles.
  5. Use affiliation: Useful if your author has only worked at a limited number of institutions.
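The logic behind the year restriction and the co-author exclusion can be sketched as a simple filter over a list of results. This is only an illustration of the reasoning: the `Record` class, the `keep_record` function and the three sample records are all invented for this example and do not reflect how Publish or Perish stores or processes its data.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical publication record used only for this sketch."""
    title: str
    authors: list[str]
    year: int


def keep_record(rec: Record, since_year=None, excluded_coauthors=frozenset()) -> bool:
    """Return True if the record passes the year and co-author checks."""
    # Year restriction: drop anything published before the first known year
    if since_year is not None and rec.year < since_year:
        return False
    # Co-author exclusion: drop records that list a namesake's co-author
    listed = {a.lower() for a in rec.authors}
    excluded = {c.lower() for c in excluded_coauthors}
    return not (listed & excluded)


# Made-up sample data: our "J Smith" started publishing in 2002,
# and "C Brown" is known to be a co-author of a namesake.
records = [
    Record("Paper A", ["J Smith", "A Jones"], 2005),
    Record("Paper B", ["J Smith", "B Lee"], 1998),
    Record("Paper C", ["J Smith", "C Brown"], 2010),
]
kept = [r for r in records
        if keep_record(r, since_year=2002, excluded_coauthors={"C Brown"})]
print([r.title for r in kept])  # ['Paper A']
```

The same pattern extends to the other strategies: a check on research field or affiliation would be one more condition in `keep_record`, assuming that information is available for each record.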

New: Google Scholar Profile searches in PoP version 5 and later

Since Publish or Perish version 5 you can search GS Profiles as well as the "raw" Google Scholar data. This means that author disambiguation has been conducted by the author themselves. Thus it is much easier to get a "clean" publication and citation record for authors with common names. GS Profile searching was further expanded with Publish or Perish version 7 and a dedicated blogpost about GS Profiles is available.

Please note that Google Scholar and GS Profiles are two distinct data sources. Google Scholar contains the "raw" data, whereas a Google Scholar Profile (GSP) is created and curated by the author themselves. In cases where authors do not actively maintain their profile and have opted for automatic updates, GSP might contain publications that are not authored by the author in question. This is especially common for authors with East Asian names, a problem Google Scholar shares with other sources of bibliometric information.