Accuracy of the results
The citation analysis is based on the results returned by Google Scholar. These results are not always accurate, and this page describes the issues to be aware of. Note also that Google Scholar limits its results to 1000. The results are ranked by number of citations, so the 1000 shown are the most-cited results.
Note: See More about citation analysis for an in-depth discussion of the validity, assumptions, and limitations of the underlying sources and methods used by Publish or Perish.
Results that differ from Google Scholar
If the Publish or Perish results differ from the ones you get by using Google Scholar directly, this is typically because Publish or Perish uses the Advanced Scholar Search capabilities of Google Scholar, whereas your manual search probably used the standard Google Scholar search. The standard search is equivalent to an All of the words search, which matches the search terms anywhere in the searched documents (author, title, source, abstract, references, etc.) and usually returns too many irrelevant results for an effective citation analysis.
However, if you do want to get the same results in Publish or Perish as with a standard Google Scholar search, do the following.
- Go to the General citation search page.
- Empty all text fields except All of the words.
- Enter your query terms in the All of the words field.
- Set the Year of publication fields both to 0.
- Clear the Title words only field.
- Click on Lookup.
- When the results appear, click on the Rank column header to sort the results in the order in which Google Scholar returned them.
Ineffective queries
Not all queries return the results you would expect. You might have to refine your queries to get the most accurate results. See Author impact analysis, Journal impact analysis, and General citation search for tips to refine your query.
Some tell-tale signs of ineffective queries are:
- Too many results that you are not interested in
  - This is typically caused by criteria or search terms that are too broad.
- Mixed up ranking order
  - Sometimes results appear plausible, but the Rank column appears strangely jumbled. In the screen shot below, note that the most cited works rank very low, and that the #1 and #2 ranked works only appear some way down the list:
This indicates that the most cited works are not the most relevant to the query. It is caused by search terms that are inappropriate, too wide-ranging, and that therefore catch highly-cited but mostly irrelevant works as well.
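One rough way to spot this pattern in a set of exported results is to compare the best-ranked entries with the most-cited ones. The sketch below is not part of Publish or Perish; it assumes each result is a dictionary with hypothetical `title`, `rank`, and `cites` keys, and the sample data is invented purely for illustration.

```python
def top_overlap(results, k=3):
    """Return the share of the k best-ranked results that are also
    among the k most-cited results (1.0 = full agreement, 0.0 = none)."""
    k = min(k, len(results))
    if k == 0:
        return 0.0
    best_ranked = sorted(results, key=lambda r: r["rank"])[:k]
    most_cited = sorted(results, key=lambda r: r["cites"], reverse=True)[:k]
    ranked_titles = {r["title"] for r in best_ranked}
    cited_titles = {r["title"] for r in most_cited}
    return len(ranked_titles & cited_titles) / k


# Invented sample: the most-cited works have very poor ranks.
jumbled = [
    {"title": "A", "rank": 1, "cites": 12},
    {"title": "B", "rank": 2, "cites": 8},
    {"title": "C", "rank": 3, "cites": 5},
    {"title": "D", "rank": 57, "cites": 950},
    {"title": "E", "rank": 88, "cites": 700},
    {"title": "F", "rank": 120, "cites": 400},
]

print(top_overlap(jumbled))
```

A value near zero suggests that the highly cited works are only loosely related to the query, so the search terms are worth narrowing. The cut-off `k` is an arbitrary choice.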
For other potential problem areas, read on.
Mixed-up title and source fields
Some references contain mixed-up fields as illustrated in the second reference below:
This is typically caused by garbled information returned by Google Scholar, presumably because its sources were inaccurate or difficult for Google's web crawler to parse automatically.
The effect on the citation analysis is similar to having duplicates (see Duplicate results below), because some works end up as separate entries instead of being included with the correct title.
Author of publication listed under title
For some references Google Scholar lists the author's name as part of the title instead of in the author field, presumably because its sources were inaccurate or difficult for Google's web crawler to parse automatically.
As a result the reference in question does not show up if you search using the author’s name in an author query, since this query only lists results where the name is listed in the author field. For example, the following highly cited paper by Anbulagan does not show up when searching for his name in the author field.
[CITATION] Anbulagan. Heuristics based on unit propagation for satisfiability problems CM Li - Proceedings of the International Joint Conference on …, 1997 Cited by 237
If you have reason to believe that certain publications of a particular author are missing, you might want to repeat the search using a general query with the author’s name in the Any of the words field. This query searches for the author’s name in all parts of the database. Whilst it normally provides a large range of irrelevant results (especially if an author’s name is also a common noun, e.g. Robert Wood), it does allow you to find publications where Google Scholar has accidentally misplaced the author’s name.
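After such a general query, the misplaced entries still have to be picked out of the irrelevant ones. The following sketch shows one way to do that, assuming the results are available as dictionaries with hypothetical `title` and `authors` keys; it is an illustration only, not a Publish or Perish feature. The first sample record mirrors the reference quoted above, and the second is invented.

```python
def misplaced_author_hits(results, surname):
    """Return results whose title mentions `surname` although the
    author field does not (plain case-insensitive substring test)."""
    needle = surname.casefold()
    return [
        r for r in results
        if needle in r["title"].casefold()
        and needle not in r["authors"].casefold()
    ]


results = [
    {
        "title": "Anbulagan. Heuristics based on unit propagation "
                 "for satisfiability problems",
        "authors": "CM Li",
    },
    {
        # Invented record in which the name sits in the correct field.
        "title": "An example paper",
        "authors": "CM Li, Anbulagan",
    },
]

for hit in misplaced_author_hits(results, "Anbulagan"):
    print(hit["title"])
```

A substring test like this is deliberately crude: for a surname that is also a common word it will flag many titles that merely contain that word, so the hits still need a manual check.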
Matching of journal titles
The journal impact analysis uses the Advanced Scholar Search option "Return articles published in". Google Scholar interprets this search broadly and returns matches on both the publication and the publisher. A search for a relatively generic journal title such as Information Systems Research might therefore also match the publisher field, for example Center for Information Systems Research or the co-sponsor/co-publisher of MIS Quarterly (Management Information Systems Research Center, University of Minnesota), even though the publisher is not directly visible in the Google Scholar output.
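Because these extra matches carry a different publication name, one possible check is to separate results whose visible source equals the journal title from those whose source does not. The sketch below assumes each result is a dictionary with a hypothetical `source` key and uses invented sample records; it is not how Publish or Perish itself filters results.

```python
def split_by_source(results, journal):
    """Split results into (matching, other) by comparing the visible
    source name with `journal`, ignoring case and outer whitespace."""
    wanted = journal.casefold().strip()
    matching, other = [], []
    for r in results:
        if r["source"].casefold().strip() == wanted:
            matching.append(r)
        else:
            other.append(r)
    return matching, other


# Invented sample records.
results = [
    {"title": "Paper 1", "source": "Information Systems Research"},
    {"title": "Paper 2", "source": "MIS Quarterly"},
    {"title": "Paper 3", "source": "information systems research "},
]

matching, other = split_by_source(results, "Information Systems Research")
print(len(matching), "matching;", len(other), "to review by hand")
```

Source names that Google Scholar abbreviates or truncates will land in the second list even when they belong to the journal, so that list is best treated as a set of candidates to review rather than as entries to discard.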
Duplicate results
Occasionally you might notice duplicate or near-duplicate articles in the Results list. These duplicates may be due to one or more of the following:
- Sloppy referencing. Not all references to an author's work are perfectly accurate and small differences in the names of the authors, the article's title, or its sources may cause the same article to appear more than once.
- Other anomalies. Google Scholar occasionally appears to return duplicate citations or just different results for the same query. This seems to happen particularly when the name you are looking for appears both as a given name and a surname in the returned results, for example Martin, Neal, or Tania. If this happens, a second Lookup (with the same parameters) may return more accurate results.
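Near-duplicates that differ only in capitalisation or punctuation can often be found by comparing simplified titles. The sketch below is one assumed approach, not a description of how Publish or Perish handles duplicates; the `title` key and the sample records are hypothetical.

```python
import re
from collections import defaultdict


def title_key(title):
    """Lower-case the title and collapse every run of characters other
    than a-z and 0-9 into one space (accented letters are dropped too)."""
    return re.sub(r"[^a-z0-9]+", " ", title.casefold()).strip()


def group_possible_duplicates(results):
    """Return the groups of results that share the same simplified title."""
    groups = defaultdict(list)
    for r in results:
        groups[title_key(r["title"])].append(r)
    return [g for g in groups.values() if len(g) > 1]


# Invented sample records.
results = [
    {"title": "An Example Study of Citation-Analysis", "cites": 40},
    {"title": "An example study of citation analysis.", "cites": 7},
    {"title": "A different paper", "cites": 3},
]

for group in group_possible_duplicates(results):
    print([r["title"] for r in group])
```

Exact matching on a simplified title catches only the simplest cases. Differences in wording, author names, or sources need a manual comparison before entries are treated as the same work.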
The effect on the citation analysis is that:
- The total number of articles may come out higher than the actual number, because duplicates are counted separately.
- The citations per paper may come out lower, for the same reason.
- The h-index and g-index may come out differently, because citations are spread over the duplicates.
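A small worked example makes these effects concrete. It uses the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the g most-cited papers together have at least g² citations (capped here at the number of papers). The citation counts are invented, and the code is an illustrative sketch rather than Publish or Perish's own implementation.

```python
def h_index(cites):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(cites, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)


def g_index(cites):
    """Largest g such that the g most-cited papers together have at
    least g*g citations (g cannot exceed the number of papers)."""
    ranked = sorted(cites, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(ranked, start=1):
        total += c
        if total >= i * i:
            g = i
    return g


def cites_per_paper(cites):
    """Average number of citations per listed entry."""
    return sum(cites) / len(cites) if cites else 0.0


# Invented data: five distinct papers.
merged = [10, 8, 5, 4, 3]
# Same papers, but the 4-citation paper appears as two entries (3 + 1)
# and the 3-citation paper also appears as two entries (2 + 1)... kept
# simple here: the 4-citation paper is listed as 2 + 2 citations.
split = [10, 8, 5, 3, 2, 2]

for label, data in (("merged", merged), ("with duplicate", split)):
    print(label, len(data), cites_per_paper(data), h_index(data), g_index(data))
```

In this invented case the duplicate raises the paper count from 5 to 6, lowers the citations per paper from 6.0 to 5.0, and lowers the h-index from 4 to 3, while the g-index stays at 5. Whether an index changes at all depends on where the duplicated paper sits in the citation ranking.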