14.2.2 Accurate self-citation counts are difficult to achieve in ISI

As I discussed in Section 9.2, many university administrators seem to be obsessed with the presumed need to exclude self-citations from someone's citation record. They often assume ISI's Web of Science offers an easy and foolproof way to do so. In the process, Google Scholar is often discarded as a data source, because they think it doesn't offer an easy option to exclude self-citations.

I have already argued in Chapter 1 that excluding self-citations is almost always a waste of time. I have also shown in Section 9.2.2 that excluding self-citations with Google Scholar is in fact fairly easy, especially for academics with a modest citation record. Here I will discuss the enormous difficulty in getting accurate self-citation scores from ISI that are comparable across candidates.

ISI's citation report: an easy option?

Most users of the ISI Web of Science list the possibility to exclude self-citations as one of its big advantages. The ISI Web of Science offers the possibility to create a citation report (see screenshot) where one can subsequently exclude self-citations. Understandably, to most people, this sounds like a really easy way to extract a “clean” citation record.

Harzing WoS search

However, in reality this is a very cumbersome and error-prone process that, unless carried out by someone with substantial expertise in using the ISI database, is likely to lead to highly diverging interpretations by different applicants and evaluators.

The first problem is that the citation report can only be created in ISI's General Search, which only reports citations to ISI-listed journal articles, not to books, book chapters, conference proceedings, or non-ISI-listed journals (see Section 13.1.1 for details).

A General Search for my own publication record lists only 389 citations, even though in the Cited Reference Search my work has gathered a total of 914 citations, an increase of no less than 135%. As Chapter 16 shows, differences can be much more dramatic in Social Sciences and Humanities.
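The size of this coverage gap can be checked with a few lines of arithmetic. The following is a minimal sketch using only the two citation counts quoted above; the function and variable names are illustrative assumptions, not part of any ISI tool.

```python
def percent_increase(base: int, total: int) -> float:
    """Return how much larger `total` is than `base`, as a percentage of `base`."""
    return (total - base) / base * 100


# Counts taken from the text above.
general_search_citations = 389      # ISI General Search
cited_reference_citations = 914     # ISI Cited Reference Search

increase = percent_increase(general_search_citations, cited_reference_citations)
print(f"Increase: {increase:.0f}%")  # Increase: 135%
```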

ISI uses citing articles, not citations for most of its analyses

The second problem is that, even if one were to accept this limitation in coverage, the proportion of self-citations is likely to be drastically overestimated with this function. The reason is that ISI performs its analyses only on citing articles, not on citations. As a single article can, and often will, contain more than one citation to an academic's work, this again underestimates the academic's impact.

In my own case, the 914 Cited Reference Search citations come from only 667 articles. My 389 citations in the General Search come from 341 articles. If one subsequently clicks the “view without self-citations” link (left-hand picture), only 331 articles remain (right-hand picture below). This means that 10 of the 341 citing articles were written by myself, a self-citation rate of 3%. However, many searchers will conclude that 58 of my 389 citations are self-citations (i.e. 15%), as they see the number of “citations” going down from 389 to 331.
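The difference between the two readings is easiest to see when the numbers are put side by side. This is a small sketch based solely on the figures quoted above; the variable names are my own and purely illustrative.

```python
# Figures quoted in the text (ISI General Search).
citations = 389                    # total citations reported
citing_articles = 341              # articles those citations come from
citing_articles_without_self = 331 # articles left after "view without self-citations"

# Correct reading: compare citing articles with citing articles.
self_citing_articles = citing_articles - citing_articles_without_self
article_rate = self_citing_articles / citing_articles

# Naive reading: compare the citation count with the remaining article count,
# which mixes two different units.
naive_drop = citations - citing_articles_without_self
naive_rate = naive_drop / citations

print(f"Self-citing articles: {self_citing_articles} ({article_rate:.1%})")  # 10 (2.9%)
print(f"Naive 'self-citations': {naive_drop} ({naive_rate:.1%})")            # 58 (14.9%)
```

The naive figure is roughly five times the article-level rate, simply because 389 counts citations whereas 331 counts articles.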

WoS results #1 WoS results #2

Why are we still not there?

The third problem is that even the 3% self-citation rate could be incorrect. Why is this so? Because this is the proportion of self-citing articles, not the proportion of self-citations. Of course my articles could contain more than one reference to my own work.

Hence in order to find out my actual self-citation rate, one would need to check the citations of every single self-citing article. To do so, one would need to go back again to the previous page, click on “view citing articles” (instead of “view without self-citations”), and then refine the citing articles by author (see screenshot on the left).

WoS refine WoS refined results

One would then be presented with a list of articles by the academic in question that cite his or her own work, the first three of which are reproduced above. In order to establish the actual number of self-citations contained in these articles, one would need to go through the reference list of each article and count the references to the academic's work.

This is a very time-consuming process, even if your library, like that of the University of Melbourne, has an integrated electronic database. For most articles I would only need to click on the full-text button to access the article electronically. Although this would still take several minutes per paper, it would be relatively quick. However, for journals where full text is not available (e.g. the first), I would need to go to the University library to access the article in question, which could easily take an hour if the library is not in the same building.

The fourth problem is that when counting the number of self-citations in these ten articles, I would need to remember to count only self-citations to ISI-listed journal articles, as citations to non-ISI publications were not included in the initial count anyway. I would also need to remember to include all cited articles of which I am not the first author. Even if someone got this far, it is likely that they would mistakenly count all self-citations, hence again overestimating the proportion of self-citations, or forget some of the articles that they have not first-authored.

Fortunately, I do keep electronic copies of all my publications, so in my case I could fairly quickly verify that I had cited fifteen ISI-listed articles that were (co-)authored by myself. This means that my accurate self-citation rate in this data set would be nearly 4% (15 of 389 citations), slightly higher than one would conclude from looking at the proportion of self-citing articles, but nowhere near as high as a naive interpretation would assume.
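The counting rule described above (include co-authored work, and restrict to ISI-listed items when the base count comes from the General Search) can be sketched in code. This is only an illustration under stated assumptions: the `Reference` structure, the `count_self_citations` function, and the toy reference lists are all hypothetical, and in practice the reference lists have to be checked by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One entry in an article's reference list (hypothetical structure)."""
    authors: tuple[str, ...]  # all authors, not only the first author
    isi_listed: bool          # True if the cited item is an ISI-listed journal article


def count_self_citations(reference_lists, author: str, isi_only: bool) -> int:
    """Count references (co-)authored by `author` across all reference lists.

    With isi_only=True, only ISI-listed items are counted, which matches a
    base rate taken from the General Search.
    """
    count = 0
    for references in reference_lists:
        for ref in references:
            if author in ref.authors and (ref.isi_listed or not isi_only):
                count += 1
    return count


# Hypothetical toy data: reference lists of two self-citing articles.
toy_lists = [
    [
        Reference(("Author A",), True),
        Reference(("Author B", "Author A"), False),  # co-authored, not ISI-listed
        Reference(("Author C",), True),
    ],
    [Reference(("Author D", "Author A"), True)],     # not first author, ISI-listed
]

isi_count = count_self_citations(toy_lists, "Author A", isi_only=True)
all_count = count_self_citations(toy_lists, "Author A", isi_only=False)
print(isi_count, all_count)  # 2 3

# The rate reported in the text: fifteen self-citations out of 389 citations.
print(f"{15 / 389:.1%}")  # 3.9%
```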

No, we are still not there: going back to the Cited Reference Search

As I indicated above, the first problem is that the ISI General Search substantially underestimates an academic's citation record, especially in the Social Sciences and Humanities (see Chapter 16 for more details). Therefore, if one wanted to get a really accurate estimate of someone's self-citations, one would need to use the Cited Reference Search.

To continue the example of my own citation record, I have 914 ISI citations in ISI's Cited Reference Search. We thus need to establish how many of these citations were self-citations. In order to do so, we need to go through a five-step procedure.

First, we need to search for my name in the Cited Reference Search. Second, we need to establish which of the resulting publications are mine and which were written by other academics with the same name. In my case this is not a problem, but for academics with a more common name, this can take a long time. We select all relevant articles and then click “Finish Search” (see screenshot).

This presents us with all citing articles (please note, this is not the same as citations). In order to identify the self-citing articles from this set, as a third step we need to again refine the search by author in the same way as above. In my case, this identifies 21 self-citing articles (or just over 3% of the total number of 667 citing articles).

WoS finish search

In order to find out the number of self-citations, as a fourth step we need to go through the reference lists of each of these 21 articles. Obviously, this is a fairly tedious procedure. This time, we need to remember to count all citations, not just citations to ISI-listed articles, as our base rate is now all citations. We also still need to remember to include all articles of which we are not the first author. Doing so for all 21 articles resulted in a total of 47 self-citations, i.e. just over 5% of the total number of citations.
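The two rates for the Cited Reference Search can be verified in the same way. This is a brief sketch using only the counts given in the text; the variable names are illustrative.

```python
# Figures quoted in the text (ISI Cited Reference Search).
total_citations = 914
citing_articles = 667
self_citing_articles = 21
self_citations = 47

article_rate = self_citing_articles / citing_articles  # share of self-citing articles
citation_rate = self_citations / total_citations       # share of self-citations
citations_excluding_self = total_citations - self_citations

print(f"Self-citing articles: {article_rate:.1%}")   # 3.1%
print(f"Self-citations: {citation_rate:.1%}")        # 5.1%
print(f"Citations excluding self-citations: {citations_excluding_self}")  # 867
```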

Conclusion

Even if someone were willing to go through this extremely time-consuming and cumbersome process, it is quite likely that most applicants will not really understand how to do so. More importantly, it is very unlikely that university administrators or research officers will understand these procedures, and hence candidates will most likely be instructed to simply use the “view without self-citations” link. This almost certainly leads to an underestimation of their real number of citations (as it only counts citing articles), and most likely to a substantial overestimation of their self-citations.

So why bother with removing self-citations in the first place? It is a time-consuming and error-prone process. In the vast majority of cases, the error margin introduced by removing self-citations will be larger than the error margin introduced by including them. It is likely that different applicants will interpret the instructions differently, leading to inequitable comparisons. For most applicants and administrators, the difference between ISI's General Search and Cited Reference Search is difficult enough as it is. Why add even more complexity?

In the example above, my self-citation rate ranged from 3% to 5%, or 15% in the very naive and inaccurate interpretation. Would it have made any difference in evaluating my ISI citation record whether I had 867 or 914 citations, or even 331 or 389 citations? Most probably not. So why bother?