14.2.1 Thomson ISI underestimates citation impact

The major disadvantage of the Web of Science is that it often underestimates an academic's citation impact. For example, the current (August 2010) number of citations to my own work is around 400 with ISI's “general search” function, around 900 with ISI's “cited reference” function and 2,500 with Google Scholar.

Differences will not be as dramatic for all scholars, but many academics show a substantially higher number of citations in Google Scholar than in the Web of Science. For instance, Nisonger (2004) found that (excluding self-citations) Web of Science captured only 28.8% of his total citations, 42.2% of his print citations, 20.3% of his citations from outside the United States, and a mere 2.3% of his non-English citations.

At the same time, both sources (Web of Science and Google Scholar) have been shown to rank specific groups of scholars in a relatively similar way. Saad (2006) found that for his subset of 55 scientists in consumer research, the correlation between the two h-indices was 0.82. Please note that this does not invalidate the earlier argument: it simply means that Web of Science underestimates most academics' h-indices by a similar magnitude.
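To see why a high correlation and a systematic underestimation can coexist, consider the following minimal sketch. The citation counts are made up for four hypothetical scholars, and the assumption that the narrower database captures 40% of each paper's citations is purely illustrative; neither comes from Saad (2006) or from any real record.

```python
from math import sqrt

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up per-paper citation counts for four hypothetical scholars.
full = {
    "A": [50, 40, 30, 20, 10, 5],
    "B": [25, 20, 15, 10, 5, 2],
    "C": [12, 10, 8, 6, 3, 1],
    "D": [6, 5, 4, 2, 1, 0],
}

# Illustrative assumption: the narrower source sees 40% of each paper's citations.
partial = {name: [c * 40 // 100 for c in counts] for name, counts in full.items()}

h_full = [h_index(full[name]) for name in full]
h_partial = [h_index(partial[name]) for name in full]
r = pearson(h_full, h_partial)

print("h-index, full coverage:   ", h_full)
print("h-index, partial coverage:", h_partial)
print("correlation:", round(r, 2))
```

In this toy data every scholar's h-index drops under partial coverage, yet the two sets of h-indices remain perfectly correlated, because the ranking of the scholars is unchanged.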

There are a number of specific reasons that all contribute to a greater or lesser extent to the underestimation of citation impact by Thomson ISI Web of Science. I will discuss each of them in some detail below.

Web of Science General Search is limited to ISI-listed journals

In the General Search, the Web of Science only includes citations to journal articles published in ISI-listed journals. Citations to books, book chapters, dissertations, theses, working and conference papers, reports, and journal articles published in non-ISI journals are not included.

Whilst in the Sciences this may give a fairly comprehensive picture of an academic's total output, in the Social Sciences and Humanities (SSH) only a limited number of journals are ISI-listed. Also, in both the Social Sciences and the Humanities, books and book chapters are very important publication outlets. Google Scholar includes citations to all academic publications regardless of whether they appeared in ISI-listed journals.

ISI Conference proceedings provide limited coverage

Since 2008 ISI has integrated its database of conference proceedings into the Web of Science. This partially addresses the problem of not counting conference proceedings papers. However, ISI does not provide an overview of the conferences that are covered beyond a generic list of topics, so it is unclear which conferences are included. I was not able to find any of my own proceedings papers. Also, only conferences from 1990 onwards are covered.

In general, conference coverage seems to be much more comprehensive in the Sciences than in the Social Sciences and Humanities. A search for English-language conferences in the Sciences for 2009 alone resulted in more than 100,000 hits (i.e. hitting the maximum number of results), mostly in Engineering and Computer Science.

In the Social Sciences and Humanities the same search results in some 25,000 conference papers. Nearly half of these are in Management or Business, with the bulk of the remainder made up of Education, Operations Research, Economics and Computer Science. However, many of the results are not actually conference proceedings, but journal special issues with papers that were initially presented at conferences or articles misclassified as proceedings papers (see Section 13.2.4 for details).

A search for common author names allows us to better gauge comparative coverage. Zhang results in 1,433 hits in the Social Sciences and Humanities and 16,037 hits in the Sciences for English-language publications in 2009. Smith results in 53 hits in the Social Sciences and Humanities and 2,462 hits in the Sciences for English-language publications in 2009. Hence, it appears that the Social Sciences and Humanities might be even more underrepresented in ISI-listed conference proceedings than they are in ISI-listed journals.
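The comparison can be made explicit by expressing the Social Sciences and Humanities hits as a percentage of the Sciences hits. The short calculation below uses only the four hit counts quoted above; the function and variable names are mine, and the ratio is a rough indicator rather than a formal coverage measure.

```python
# Hit counts quoted in the text (English-language proceedings papers, 2009).
hits = {
    "Zhang": {"ssh": 1433, "sciences": 16037},
    "Smith": {"ssh": 53, "sciences": 2462},
}

def ssh_share(counts):
    """SSH hits expressed as a percentage of Sciences hits."""
    return 100 * counts["ssh"] / counts["sciences"]

shares = {name: round(ssh_share(counts), 1) for name, counts in hits.items()}

for name, share in shares.items():
    print(f"{name}: SSH hits are {share}% of Sciences hits")
```

For Zhang the Social Sciences and Humanities hits come to roughly 9% of the Sciences hits, and for Smith to roughly 2%.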

Web of Science Cited Reference limited to citations from ISI-listed journals

In the Cited Reference function, Web of Science does include citations to non-ISI publications. However, it only includes citations from journals that are ISI-listed (Meho & Yang, 2007). As indicated before, in SSH only a limited number of journals are ISI-listed.

Butler (2006) analysed the distribution of publication output by field for Australian universities between 1999 and 2001. She found that whereas for the Chemical, Biological, Physical and Medical/Health sciences 69.3%-84.6% of the publications are in ISI-listed journals, for Social Sciences such as Management, History, Education and Arts only 4.4%-18.7% of the publications are in ISI-listed journals. ISI estimates that of the 2,000 new journals reviewed annually only 10-12% are selected for inclusion in the Web of Science (Testa, 2004).

Archambault and Gagné (2004) found that US and UK-based journals are both significantly over-represented in the Web of Science in comparison to Ulrich's journal database. This over-representation was stronger for the Social Sciences and Humanities than for the Natural Sciences. Further, in many areas of engineering, conference proceedings are very important publication outlets. For example, in a search conducted in 2007, one of the most cited computer scientists (Hector Garcia-Molina) gathered more than 20,000 citations in Google Scholar, with most of his papers being published and cited in conference proceedings. In Web of Science he had a mere 240 citations to his name!

In contrast to the Web of Science, Google Scholar includes citations from all academic publications regardless of where they appeared. As a result, Google Scholar provides a more comprehensive picture of recent impact, especially for the Social Sciences and Humanities, where more than five years can elapse between research appearing as a working or conference paper and research being published in a journal.

This also means that Google Scholar usually gives a more accurate picture of impact for junior academics. However, it must be acknowledged that although Google Scholar captures more citations in books and book chapters than the Web of Science (which captures none), it is by no means comprehensive in this respect. Google Book Search may provide a better alternative for book searches.

Web of Science Cited Reference counts citations to non-ISI journals only towards first author

Whilst the Cited Reference function of Web of Science does include citations to non-ISI journals, it only includes these publications for the first author. Hence any publications in non-ISI journals where the academic in question is the second or further author are not included.

Google Scholar includes these publications for all listed authors. For instance, my 2003 publication with Alan Feely in Cross Cultural Management shows no citations in the Web of Science for my name, whilst it shows 58 citations in Google Scholar. A more disturbing case is discussed in Chapter 16, where our computer scientist is “robbed” of some 700-odd ISI citations to a book for which he is the second author.

Citation records for academics with “foreign” names are underestimated

Thomson ISI seems to have some difficulty with names that deviate from traditional English names. Below I describe five variants of this problem: names with diacritics, names with apostrophes, hyphenated names, names with prefixes and names with Asian characters.

Names with diacritics

Thomson ISI's Web of Science has problems with names including diacritics (e.g. Özbilgin or Olivas-Luján). A search with diacritics provides an error message (Search Error: Invalid query. Please check syntax) and no results (see below). A search without diacritics is the only way to get results.

WoS invalid query

Names with apostrophes

Thomson ISI's Web of Science has problems with names with apostrophes, such as many Irish names. A search for "O'Rourke K*" in the Web of Science Cited Reference function results in only eleven citations to the work of the economic historian Kevin H O'Rourke, whereas a search for "ORourke K*" gives more than 500 citations.

Strangely enough, the search for "O'Rourke K*" in the General Search seems to provide a comprehensive record, although of course his influential books and book chapters are not included. As we saw earlier, Google Scholar has fixed its problems with names with apostrophes and would hence be a better option for this type of search.

Hyphenated names

Thomson ISI's Web of Science has problems with hyphenated names. Even though most academics refer to these names correctly, ISI data entry staff apparently prefer to enter these names without hyphens. As a result, citation scores for academics with hyphenated names can be seriously underestimated.

A naïve user would search for Charles Baden-Fuller as “Baden-Fuller C*” and would find only about 200 citations. However, searching for Badenfuller would unearth another 800-odd citations. Google Scholar doesn't have any problems with hyphenated names and finds nearly 4,000 citations for Baden-Fuller and only two citations for Badenfuller, presumably caused by referencing errors by the citing authors.

Names with prefixes

There are many languages in which family names are preceded by prefixes. For instance, in Dutch common examples are “van” (as in van Raan) and “van der” (as in van der Wal). In both French and Spanish “de la” is common. In Dutch, the correct way of listing these prefixes in a list of references is behind the family name, i.e. when ordering alphabetically the prefix is ignored, such as “Wal, R. van der” and “Raan, F. van”.

However, Thomson ISI's Web of Science has difficulty with these prefixes. More than 90% of the citations to the work of bibliometrist Anthony van Raan are incorrectly listed as Vanraan, even though in most cases the referring author listed the name correctly as van Raan.

Prefixed names

Again, Google Scholar doesn't have any problems with names with prefixes. A search for “A van Raan” results in nearly 4,000 citations, whilst a search for “A Vanraan” only finds 50 citations (see screenshot above), 47 of which are to an article in Nature that indeed incorrectly lists the last name as Vanraan.
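The four workarounds described so far (dropping diacritics, apostrophes and hyphens, and fusing prefixes with the family name) can be combined into a small helper that lists the spellings worth trying for a surname. This is only a sketch: the function names are hypothetical, it says nothing about how ISI actually stores names, and because the text does not state whether searches are case-sensitive it emits both the fused spelling and a capitalised version of it.

```python
import unicodedata

def strip_diacritics(name):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search_variants(surname):
    """Return spellings to try for one surname, original spelling first."""
    plain = strip_diacritics(surname)
    # Keep letters and digits only: removes apostrophes, hyphens and spaces.
    fused = "".join(ch for ch in plain if ch.isalnum())
    candidates = [
        surname,                                       # as the author writes it
        plain,                                         # without diacritics
        plain.replace("'", "").replace("\u2019", ""),  # without apostrophes
        plain.replace("-", ""),                        # without hyphens
        plain.replace(" ", ""),                        # prefix fused to the name
        fused,                                         # all of the above at once
        fused.capitalize(),                            # e.g. Vanraan, Badenfuller
    ]
    unique = []
    for candidate in candidates:
        if candidate not in unique:
            unique.append(candidate)
    return unique

for surname in ["Özbilgin", "Olivas-Luján", "O'Rourke", "Baden-Fuller", "van Raan"]:
    print(surname, "->", search_variants(surname))
```

Citation counts found under the different spellings still have to be checked and added up by hand, since a variant may also match a different author.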

Names with Asian characters

In Google Scholar it is possible to do a search in any language, including character-based languages such as Chinese, Japanese and Korean. A search for Wang Ying (王英), for instance, results in a large number of hits in Google Scholar (see screenshot below).

Asian author names

In ISI's Web of Science this same search results in the same error message as for names with diacritics, as information is only stored in English (see screenshot below).

WoS invalid query #2

Web of Science has very limited coverage of non-English sources

The Web of Science includes only a very limited number of journals in languages other than English (LOTE). Hence citations in non-English journals are generally not included in any Web of Science citation analysis. Whilst Google Scholar's LOTE coverage is far from comprehensive, it does include a larger number of publications in other languages and indexes documents in French, German, Spanish, Italian and Portuguese (Noruzi, 2005).

Meho and Yang (2007) found that 6.94% of Google Scholar citations were from LOTE, while this was true for only 1.14% of Web of Science citations and 0.70% of Scopus citations. Archambault and Gagné (2004) found that Thomson ISI's journal selection favours English, a situation attributable to ISI's inability to analyse the content of journals in LOTE.

As an example, Gérard Charreaux, a French accounting scholar, accumulated a grand total of 15 ISI citations in his lifetime. As the screenshot below shows, he has a very respectable number of citations in French-language journals and books that are indexed in Google Scholar.

Charreaux results

Web of Science has poor aggregation of minor variations of the same title

In the General Search function, Web of Science does not include citations to the same work that have small mistakes in their referencing (which occurs very frequently, especially for books and book chapters). In the Cited Reference function, Web of Science does include these citations, but they are not aggregated with the other citations.

For a rather amusing example, refer to the screenshot below. It shows the many different variants of Geert Hofstede's highly cited book Culture's Consequences. There are still dozens of other variations included in the ISI database that are not shown below. For further details on this see Section 13.2.3.

CC #1 CC #2 CC #3

In many cases these errors were caused by data entry errors and not by mistakes in the original reference. Of course, given the large number of entries that ISI has to cope with, occasional errors are inevitable. However, in a commercial product I would have hoped for a more active quality control system.

Google Scholar appears to have a better aggregation mechanism than Web of Science. Even though duplicate publications that are referenced in a (slightly) different way still occur regularly, Google Scholar has a grouping function that resolves the worst ambiguities.

Belew (2005) confirms that Google Scholar has lower citation noise than Web of Science. In the Web of Science only 60% of the articles were listed as unique entries (i.e. no citation variations), while for Google Scholar this was 85%. None of the articles in his sample had more than five separate listings within Google Scholar, while 13% had five or more entries in the Web of Science.
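As a rough illustration of what aggregating title variants involves, the sketch below groups cited-reference entries whose normalised titles are nearly identical and sums their citation counts. It is not a description of how Google Scholar or ISI group records: the entries and counts are invented for illustration, and the 0.85 similarity threshold is an arbitrary assumption that would need tuning on real data.

```python
import re
from difflib import SequenceMatcher

def normalise(title):
    """Lower-case the title and collapse punctuation into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def aggregate(entries, threshold=0.85):
    """Greedily merge entries whose normalised titles are near-duplicates."""
    groups = []
    for title, cites in entries:
        key = normalise(title)
        for group in groups:
            if SequenceMatcher(None, key, group["key"]).ratio() >= threshold:
                group["citations"] += cites
                group["variants"].append(title)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append({"key": key, "citations": cites, "variants": [title]})
    return groups

# Invented cited-reference entries: (title as entered, citation count).
entries = [
    ("CULTURES CONSEQUENCES", 100),
    ("CULTURES CONSEQUENCE", 7),
    ("CULTURE'S CONSEQUENCES", 5),
    ("CULTURE CONSEQUENCES", 3),
    ("CULTURAL CONSEQUENCES", 2),
    ("SOFTWARE OF THE MIND", 4),
]

groups = aggregate(entries)
for group in groups:
    print(group["citations"], group["variants"])
```

Fuzzy matching of this kind trades one error for another: a loose threshold can merge genuinely different works (for example, different editions or similarly titled books), so merged groups still deserve a manual check.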