The phantom reference strikes again

Update of a 2017 white paper that shows that - with over 500 citations - the phantom reference is alive and kicking

Anne-Wil Harzing - Thu 28 Nov 2019 08:13 (updated Thu 2 Jun 2022 13:03)

Picture: Paris, Musée de Cluny, by Pieter Kroonenberg

Two years ago I published The mystery of the phantom reference: a detective story. It went viral, as far as academic research ever goes viral. It received press coverage in The Times in the UK (Professor and paper that don’t actually exist are copybook way to win a citation), the Dutch NRC (Spookartikel wordt ‘toevallig’ 400 keer geciteerd [Phantom article 'accidentally' cited 400 times]) and the Austrian Der Standard (Nie geschriebenes Paper wurde in 400 Fachartikeln zitiert [Never-written paper cited in 400 academic articles]), and it was picked up by numerous blogs and aggregators.

To cut a long story short: the “phantom reference” had been created to illustrate Elsevier's desired reference format, but, through a combination of sloppy writing, sloppy quality control and possibly a bug in the referencing or proofing system, it came to be cited nearly 400 times in journal articles and conference proceedings published by Elsevier. So has anything changed in the last two years? Has the phantom finally disappeared, or does it keep rearing its head? A Web of Science search with the same search parameters as before quickly established that the problem is still very much alive. The number of citations to the non-existent reference now exceeds 500.

Granted, the number of citations per year in 2018 and 2019 [to date] was lower than in the six preceding years. This can largely be explained by the fact that Elsevier stopped publishing the Procedia Social and Behavioral Sciences and Procedia Engineering conference proceedings, the two publications that had been the major source of citations for the phantom reference. However, as the screenshot below shows, the number of journal articles citing the phantom reference has remained fairly stable. In fact, it was higher in the last two years than in most earlier years. So, as in the original white paper, I continued my detective work to verify how the phantom reference was used in these articles.

How is the phantom reference used in journal articles?

Of the 19 articles citing the phantom reference in 2018 and 2019, I was only able to access 11 in full text. Three of these articles, all of which used author name/year referencing in the text, listed the phantom reference in their list of references but didn't actually cite it in the article. The remaining 8 articles not only included the phantom reference in their list of references, but also used it to substantiate a particular statement in the article. However, as the screenshots below clearly show, the phantom reference had absolutely nothing to do with any of the statements it was purported to support. In the first six articles the phantom reference was listed as [1] in the list of references. In the final two articles it was listed alphabetically.

Article 1: Article in Corrosion Science

Article 2: Article in Spectrochimica Acta Part A-Molecular & Biomolecular Spectroscopy

Article 3: Article in Journal of Photochemistry and Photobiology A-Chemistry

Article 4: Article in Journal of Scientometric Research

Article 5: Article in the journal Life Sciences

Article 6: Article in University Politehnica of Bucharest Scientific Bulletin Series B-Chemistry and Materials Science

Article 7: Article in Pattern Recognition Letters

In this article, the references were listed alphabetically even though a numbered referencing style was used; the phantom reference was listed as [13] in the list of references.

Article 8: Article in International Journal of Integrated Engineering

In this article, the references were listed alphabetically even though a numbered referencing style was used; the phantom reference was listed as [18] in the list of references.

The verdict

As far as I have been able to verify, the phantom reference was included in the list of references of all nineteen journal articles; even for the eight articles for which I could not access the full text, I could access the references. Of the eleven articles I could access in full, three did not actually cite the reference in the main text. This is fairly easy to explain: naive or sloppy authors might simply leave the example reference in their list of references by accident. Without proper quality control, this would not be spotted in the copy-editing process.

What continues to baffle me is how the phantom reference came to be used to support statements that were completely unrelated to it. My hunch is that it might be a combination of “anonymised” referencing through the system of numbered referencing, which makes errors harder for authors and editors to spot, and possibly a bug in the typesetting or proofing software used.

In sum

As I mentioned in the original white paper, the mystery of the phantom reference ultimately had a very simple explanation: sloppy writing and sloppy quality control. An academic incentive system that makes publication in Web of Science-listed conference proceedings popular brought the law of large numbers into play. As a result, the absolute number of mistakes became high enough to be noticeable, even though only a small fraction of authors made the mistake.

In a way, we can be glad that our phantom reference IS a phantom reference. Had it been an existing publication, the mistakes might have had far more serious consequences. Five hundred inaccurate citations might be a drop in the ocean among hundreds of thousands of publications. However, for many individual authors, five hundred citations might make the difference between a mediocre and a good citation record, or between getting a job and not getting one.

Hence, the key conclusion I would draw is: be careful before taking unusual citation levels at face value. Do some due diligence, or let someone with bibliometric knowledge do so. If something looks fishy, it probably IS fishy!