Microsoft Academic: Is the Phoenix getting wings?

After a whole string of publications on Google Scholar as a source of citation data, I have now written up what is the first large-scale analysis of the new Microsoft Academic.

In this article, we compare publication and citation coverage of the new Microsoft Academic with the three other major sources of bibliometric data (Google Scholar, Scopus, and the Web of Science), using a sample of 145 academics in five broad disciplinary areas: Life Sciences, Sciences, Engineering, Social Sciences, and Humanities.

Overall, just like my first small-scale study on this topic, our large-scale comparative study suggests that the new incarnation of Microsoft Academic presents us with an excellent alternative for citation analysis. We therefore conclude that the Microsoft Academic Phoenix is undeniably growing wings; it might be ready to fly off and start its adult life in the field of research evaluation soon.

Comparing publications, citations, h-index and hIa

The easiest way to summarise the article is probably through its five figures. Figure 1 compares the average number of papers and citations across the four databases. On average, Microsoft Academic reports more papers per academic than Scopus and Web of Science, but fewer than Google Scholar. However, in addition to covering a wider range of research outputs (books, for instance), both Google Scholar and Microsoft Academic also include so-called “stray” publications, i.e. duplicates of other publications that appear with a slightly different title or author variant. Hence, a comparison of paper counts across databases is probably not very informative.
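
To illustrate why stray publications inflate paper counts while barely affecting citation totals, here is a minimal sketch that flags near-duplicate titles with fuzzy string matching (Python's difflib.SequenceMatcher). This is an assumption-laden toy, not how Google Scholar or Microsoft Academic actually detect duplicates: the function names, the 0.9 similarity threshold and the records are all hypothetical.

```python
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace to single spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def find_strays(records, threshold=0.9):
    """Split (title, citations) records into masters and near-duplicate strays.

    Records are processed from most to least cited, so the most-cited version
    of each title becomes the master and lower-cited variants become strays.
    The 0.9 similarity threshold is an arbitrary, hypothetical choice.
    """
    ordered = sorted(records, key=lambda r: r[1], reverse=True)
    masters, strays, master_keys = [], [], []
    for title, cites in ordered:
        key = normalise(title)
        if any(SequenceMatcher(None, key, k).ratio() >= threshold for k in master_keys):
            strays.append((title, cites))
        else:
            masters.append((title, cites))
            master_keys.append(key)
    return masters, strays


# Hypothetical records for one academic: the second is a stray variant of the first.
records = [
    ("Cross-national comparisons of citation metrics", 120),
    ("Cross national comparison of citation metrics", 2),
    ("A longitudinal study of research evaluation", 45),
]
masters, strays = find_strays(records)
total = sum(c for _, c in records)
stray_cites = sum(c for _, c in strays)
print(f"papers: {len(records)} listed, {len(masters)} after merging")
print(f"citations in strays: {stray_cites} of {total}")
```

In this toy case the paper count drops from 3 to 2, while only 2 of 167 citations sit in the stray record, which is the pattern that makes citation totals the more robust basis for comparison.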

Citations, by contrast, can be compared more reliably across databases, as stray publications typically attract few citations. As Figure 1 shows, Microsoft Academic citation counts are on average very similar to Scopus and Web of Science citation counts and substantially lower only than Google Scholar citation counts. On average, Microsoft Academic provides 59% of the Google Scholar citations, 97% of the Scopus citations and 108% of the Web of Science citations.

The aforementioned differences in citation patterns are also reflected in the average h-index and hIa (individual annual h-index) for our sample (see Figure 2). On average, the Microsoft Academic h-index is 77% of the Google Scholar h-index, equal to the Scopus h-index, and 108% of the Web of Science h-index. The Microsoft Academic hIa is on average 71% of the Google Scholar hIa, equal to the Scopus hIa, and 113% of the Web of Science hIa. Again, Microsoft Academic, Scopus and Web of Science present very similar metrics.
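
For readers less familiar with these metrics, the following minimal sketch computes an h-index and an hIa from a publication list. It assumes that hI,norm is the h-index calculated after dividing each paper's citations by its number of authors, and that hIa is hI,norm divided by academic age, taken here as the number of years since first publication (other conventions for academic age exist). The Paper class, function names and publication record are hypothetical illustrations, not data from our sample.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    year: int        # publication year
    citations: int   # citation count in a given database
    n_authors: int   # number of authors, including the academic


def h_index(counts):
    """Largest h such that h items each have at least h citations."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def hi_norm(papers):
    """h-index computed on citations divided by each paper's number of authors."""
    return h_index([p.citations / p.n_authors for p in papers])


def hia(papers, current_year):
    """hI,norm divided by academic age.

    Assumption: academic age = current_year minus year of first publication,
    floored at 1 to avoid division by zero.
    """
    age = max(1, current_year - min(p.year for p in papers))
    return hi_norm(papers) / age


# Hypothetical publication record, not taken from the study's sample.
papers = [
    Paper(2006, 40, 2),
    Paper(2008, 25, 1),
    Paper(2010, 12, 3),
    Paper(2012, 9, 3),
    Paper(2014, 3, 2),
]
print(h_index([p.citations for p in papers]))  # 4
print(hi_norm(papers))                         # 3
print(hia(papers, current_year=2016))          # 0.3
```

Because the same function is applied to each database's citation counts, differences in coverage feed directly into the h-index and hIa, which is why the ratios above track the citation ratios.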

Cross-disciplinary comparisons

For the Life Sciences and Sciences, Microsoft Academic has fewer citations than Scopus and, marginally, fewer than Web of Science (see Figure 3). However, overall citation levels for these two disciplines are fairly similar across three of the four databases, and to a lesser extent this is true for Engineering as well. For three of our five disciplines, Microsoft Academic thus differs substantially in citation counts only from Google Scholar, providing between 57% and 67% of Google Scholar citations.

Confirming our earlier study based on the same sample of academics (Harzing & Alakangas, 2016), the differences between disciplines are much smaller when considering the hIa, which was specifically designed to adjust for career length and disciplinary differences (see Figure 4). Again we see that Microsoft Academic provides metrics that are very similar to Scopus and Web of Science for the Life Sciences and the Sciences.

For Engineering and the Humanities, the Microsoft Academic hIa is very similar to the Scopus hIa, whereas it is 1.2 (Engineering) to 1.5 (Humanities) times as high as the Web of Science hIa. Only for the Social Sciences is the Microsoft Academic hIa substantially higher than both the Scopus and the Web of Science hIa. For all disciplines, the Google Scholar hIa is higher than the Microsoft Academic hIa, ranging from 1.3 times as high for Engineering to 1.9 times as high for the Humanities.

Microsoft Academic estimated citation counts

Microsoft Academic only includes citation records if it can validate both the citing and the cited papers as credible. Credibility is established through a sophisticated machine-learning-based system, and citations that are not credible are dropped. The number of dropped citations, however, is used to estimate “true” citation counts. These estimated citation counts were added to the Microsoft Academic database in July/August 2016.

Taking Microsoft Academic estimated citation counts rather than linked citation counts as the basis for our comparison with Scopus, Web of Science, and Google Scholar changes the comparative picture quite dramatically. Across our overall sample of 145 academics, Microsoft Academic’s average estimated citation count (3873) is much higher than both the Scopus (2413) and the Web of Science (2168) averages. At the same time, it is very close to Google Scholar’s average (3982), a difference of less than 3%.
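
The arithmetic behind this comparison can be checked with a few lines of code. This sketch uses only the sample averages quoted above; the dictionary and the function name as_percentage_of are illustrative. Note that a ratio of sample averages need not equal the average of per-academic ratios, so these figures are a rough check rather than a reproduction of the study's calculations.

```python
# Average citation counts per academic (n = 145), as reported in the text.
averages = {
    "Microsoft Academic (estimated)": 3873,
    "Google Scholar": 3982,
    "Scopus": 2413,
    "Web of Science": 2168,
}


def as_percentage_of(a: float, b: float) -> float:
    """Return a expressed as a percentage of b."""
    return 100 * a / b


ma = averages["Microsoft Academic (estimated)"]
for name in ("Google Scholar", "Scopus", "Web of Science"):
    print(f"{name}: {as_percentage_of(ma, averages[name]):.1f}%")

gap = 100 - as_percentage_of(ma, averages["Google Scholar"])
print(f"Gap to Google Scholar: {gap:.1f}%")
```

This prints 97.3%, 160.5% and 178.6% respectively, and a gap of 2.7% to Google Scholar: estimated counts are roughly 1.6 times the Scopus average and 1.8 times the Web of Science average, while staying within 3% of Google Scholar.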

With regard to disciplines, Figure 5 shows that although Microsoft Academic estimated citation counts bring it closer to Google Scholar for all disciplines, the gap narrows more for some disciplines than for others. For the Life Sciences, Microsoft Academic estimated citation counts are in fact 12% higher than Google Scholar counts, whereas for the Sciences they are almost identical. For Engineering, Microsoft Academic estimated citation counts are 14% lower than Google Scholar citations, and for the Social Sciences 23% lower. Only for the Humanities are they substantially (69%) lower than Google Scholar citations.
