The citation analysis in Publish or Perish is based on the results returned by Google Scholar and Microsoft Academic. Both providers crawl the web for academic content and their processing is automatic. Unlike Scopus or the Web of Science, they are not manually curated bibliometric databases.

Therefore, errors or omissions in Google Scholar and Microsoft Academic are more frequent than in Scopus and the Web of Science, which incidentally are by no means error-free either. Fortunately, citation metrics such as the h-index are fairly robust and usually not unduly influenced by occasional errors or omissions. For more details, please see my white paper: Sacrifice a little accuracy for a lot more comprehensive coverage.
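The robustness claim can be illustrated with a small sketch. The citation counts below are invented for illustration only, and `h_index` is a plain textbook implementation, not Publish or Perish's own code. It compares a "clean" publication record with a "noisy" one in which one paper is missing and one stray zero-citation entry has been added.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Hypothetical citation counts for one author's papers (illustration only).
clean = [45, 30, 22, 15, 9, 7, 6, 3, 1, 0]
# Same record with two data errors: the 6-citation paper is missing
# and a stray entry with 0 citations has been added.
noisy = [45, 30, 22, 15, 9, 7, 3, 1, 0, 0]

print(h_index(clean), h_index(noisy))  # prints "6 6"
print(sum(clean), sum(noisy))          # prints "138 132"
```

In this invented example the total citation count changes while the h-index does not. Errors that affect papers close to the h-index threshold can of course still shift the result.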

If 99-100% accuracy is required, you (or your employer) will need to pay the very substantial subscription fees needed to gain access to Scopus or the Web of Science. If you are willing to accept that bibliometrics is an inexact science and that any data source has its own flaws, you might find the accuracy of Google Scholar and Microsoft Academic perfectly adequate for your purpose.

Please note: Publish or Perish can also import Scopus or Web of Science data, so if you do have access to these databases you can use Publish or Perish to calculate a wide variety of citation metrics.

Accuracy in Microsoft Academic

As this data source is still fairly new and in constant development, a detailed analysis of accuracy in Microsoft Academic is still in progress. Microsoft Academic seems to follow a more "controlled" crawling process than Google Scholar and hence does not provide as many "stray" references. However, its automatic processing inevitably results in duplicate results for some articles, with the "stray" references generally showing no or very few citations.

In addition, at present Microsoft Academic has several known limitations and problems:

  1. Duplicate versions of (some) articles whose titles have a semicolon and subtitle: one version with and one without the subtitle, both with substantive citation scores. This problem is easily resolved in Publish or Perish by merging the two articles. [Please note: this limitation appears to have been largely, though not entirely, addressed since late January 2017. Please let me know if you still come across problems with this.]
  2. Wrong year allocation for some articles. This seems to happen mostly with journals published by Emerald and Taylor & Francis, so it might indicate a parsing problem with their websites. There is currently no work-around for this. [Please note: this limitation appears to have been addressed since early February 2017. Please let me know if you still come across problems with this.]
  3. Very occasionally, Microsoft Academic parses two versions of the same publication from different sources whilst allocating full or nearly full citation scores to both. This leads to a fairly serious overestimation of the citation counts for the article in question. [Please note: this limitation appears to have been addressed since late January 2017. Please let me know if you still come across problems with this.]
  4. Microsoft Academic does not have a publisher record. This means that when exporting bibliographic details the reference for books and book chapters will be incomplete.
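For readers who work with exported results outside Publish or Perish, the duplicate-title problem in item 1 can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`title`, `year`, `citations`), the sample titles, and the helper names `title_key` and `merge_duplicates` are all hypothetical, and the sketch does not describe how Publish or Perish merges records internally.

```python
import re
from collections import defaultdict


def title_key(title):
    """Normalise a title: drop any subtitle after ':' or ';', lower-case, strip punctuation."""
    main = re.split(r"[:;]", title, maxsplit=1)[0]
    return re.sub(r"[^a-z0-9]+", " ", main.lower()).strip()


def merge_duplicates(records):
    """Group records sharing a main title and year, and add up their citations."""
    groups = defaultdict(list)
    for rec in records:
        groups[(title_key(rec["title"]), rec["year"])].append(rec)
    merged = []
    for recs in groups.values():
        longest = max(recs, key=lambda r: len(r["title"]))  # keep the fullest title
        merged.append({
            "title": longest["title"],
            "year": longest["year"],
            # Assumption: the two versions attracted separate citations (item 1).
            "citations": sum(r["citations"] for r in recs),
        })
    return merged


# Hypothetical records (illustration only).
records = [
    {"title": "Knowledge transfer in multinationals; a review", "year": 2010, "citations": 40},
    {"title": "Knowledge transfer in multinationals", "year": 2010, "citations": 25},
    {"title": "Language barriers in teams", "year": 2012, "citations": 12},
]
merged = merge_duplicates(records)
print(len(merged), merged[0]["citations"])  # prints "2 65"
```

Adding up citations only makes sense when the two versions were cited separately. In the situation described in item 3, where both versions already carry full or nearly full citation counts, a sum would reproduce the overestimation, and taking the larger count would be the safer choice. Matching on the main title alone can also merge distinct papers that happen to share it, so the groups should be checked by hand.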

Accuracy in Google Scholar

There are a number of issues to be aware of for Google Scholar, all discussed in detail in the Publish or Perish tutorial.

Please also note that Google Scholar limits its results to 1000. The results are ranked by number of citations, so the 1000 shown are the most-cited results. For further issues specific to Google Scholar read on.

Publish or Perish results that differ from Google Scholar

If the Publish or Perish results differ from the ones you get by using Google Scholar directly, this is typically because Publish or Perish uses the Advanced Scholar Search capabilities of Google Scholar, whereas your manual search probably used the standard Google Scholar search. The latter is equivalent to an All of the words search, which matches the search terms anywhere in the searched documents (author, title, source, abstract, references etc.) and usually provides too many irrelevant results for an effective citation analysis.
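The difference between the two kinds of matching can be sketched in a few lines. This is a toy model, not Google Scholar's actual matching logic: the records, field names, and the helpers `matches_anywhere` and `matches_field` are hypothetical and exist only to show why an "anywhere" match returns more results than a field-restricted one.

```python
def words(text):
    """Split text into a set of lower-case words."""
    return set(text.lower().split())


def matches_anywhere(record, query):
    """All query words appear somewhere in the record (like All of the words)."""
    return words(query) <= words(" ".join(record.values()))


def matches_field(record, field, query):
    """All query words appear in one specific field (like a field-restricted search)."""
    return words(query) <= words(record[field])


# Hypothetical documents (illustration only).
docs = [
    {"author": "A Smith", "title": "Networks in trade", "references": "Jones 1995"},
    {"author": "B Jones", "title": "On markets", "references": "Smith 1999"},
]

anywhere = [d["title"] for d in docs if matches_anywhere(d, "smith")]
by_author = [d["title"] for d in docs if matches_field(d, "author", "smith")]
print(anywhere)   # prints "['Networks in trade', 'On markets']"
print(by_author)  # prints "['Networks in trade']"
```

The second document is returned by the "anywhere" match only because it cites Smith, which is the kind of irrelevant result that inflates a standard search.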

However, if you do want to get the same results in Publish or Perish as with a standard Google Scholar search, do the following.

  1. Go to the General search page.
  2. Empty all text fields except All of the words.
  3. Enter your query terms in the All of the words field.
  4. Set the Year of publication fields both to 0.
  5. Clear the Title words only field.
  6. Click on Lookup.
  7. When the results appear, click on the Rank column header to sort the results in the order in which Google Scholar returned them.

Ineffective queries in Google Scholar

Not all queries return the results you would expect. You might have to refine your queries to get the most accurate results. See Author search, Journal search, and General search for tips to refine your query.

Some tell-tale signs of ineffective queries are:

  • Too many results that you are not interested in: This is typically caused by criteria or search terms that are too broad.
  • Mixed up ranking order: Sometimes results appear plausible, but the Rank column appears strangely jumbled. In the screen shot below, note that the most cited works rank very low, and that the #1 and #2 ranked works only appear some way down the list:

This indicates that the most cited works are not the most relevant to the query. It is caused by search terms that are inappropriate or too wide-ranging and that therefore also catch highly cited but mostly irrelevant works. Microsoft Academic doesn't typically suffer from the same problem, as it matches search terms only in the title and abstract.
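If you have exported a result list, a quick check for this symptom is to look at the Rank values of the most cited works. The sketch below is a rough heuristic under stated assumptions: the `rank` and `citations` keys, the sample values, and the function name `top_cited_ranks` are hypothetical and not part of Publish or Perish.

```python
def top_cited_ranks(results, n=3):
    """Return the relevance ranks of the n most-cited results."""
    by_citations = sorted(results, key=lambda r: r["citations"], reverse=True)
    return [r["rank"] for r in by_citations[:n]]


# Hypothetical result list (illustration only).
results = [
    {"rank": 1, "citations": 12},
    {"rank": 2, "citations": 8},
    {"rank": 3, "citations": 950},
    {"rank": 40, "citations": 4100},
    {"rank": 57, "citations": 2300},
]

print(top_cited_ranks(results))  # prints "[40, 57, 3]"
```

When the most cited works sit at ranks such as 40 and 57, as in this invented example, the query is probably catching highly cited works that are only loosely related to the search terms.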

Matching of journal titles in Google Scholar is sometimes too broad

The journal impact analysis uses the Advanced Scholar Search option "return articles published in". Google Scholar interprets this search broadly and returns matches for both the publication and the publisher. This means that a search for a relatively generic journal title such as Information Systems Research might get additional matches in the publisher field, such as Center for Information Systems Research or even the eventual co-sponsor/co-publisher of MIS Quarterly (Management Information Systems Research Center, University of Minnesota), even though this is not directly visible in the Google Scholar output.
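One way to deal with such over-broad matches in exported data is to keep only results whose publication name equals the journal title. The following is a minimal sketch, assuming each exported result carries a `publication` value; that key, the sample data, and the helper names are hypothetical.

```python
def normalise(name):
    """Lower-case a name, treat '&' as 'and', and collapse whitespace."""
    return " ".join(name.lower().replace("&", "and").split())


def exact_journal_matches(results, journal):
    """Keep only results whose publication name equals the journal title."""
    target = normalise(journal)
    return [r for r in results if normalise(r["publication"]) == target]


# Hypothetical search results (illustration only).
results = [
    {"title": "Paper A", "publication": "Information Systems Research"},
    {"title": "Paper B", "publication": "MIS Quarterly"},
    {"title": "Paper C", "publication": "information  systems research"},
]

kept = exact_journal_matches(results, "Information Systems Research")
print([r["title"] for r in kept])  # prints "['Paper A', 'Paper C']"
```

An exact comparison is deliberately strict. It would also drop genuine articles whose publication name is abbreviated or truncated in the results, so the discarded entries are worth a manual look.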

