Should metrics be used to assess research performance? A submission to HEFCE

Published June 18, 2014

The Higher Education Funding Council for England (HEFCE) gives money to universities. The allocation that a university gets depends strongly on periodic assessments of the quality of its research. Enormous amounts of time, energy and money go into preparing submissions for these assessments, and the assessment procedure distorts the behaviour of universities in undesirable ways. In the last assessment, each principal investigator submitted four papers, and the papers were read.

In an effort to reduce the cost of the operation, HEFCE has been asked to reconsider the use of metrics to measure the performance of academics. The committee that is doing this job has asked for submissions from any interested person, by June 20th.

This post is a draft for my submission. I’m publishing it here for comments before producing a final version for submission.

Draft submission to HEFCE concerning the use of metrics.

I’ll consider a number of different metrics that have been proposed for the assessment of the quality of an academic’s work.

Impact factors

The first thing to note is that HEFCE is one of the original signatories of DORA (http://am.ascb.org/dora/). The first recommendation of that document is:

“Do not use journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist’s contributions, or in hiring, promotion, or funding decisions.”

Impact factors have been found, time after time, to be utterly inadequate as a way of assessing individuals, e.g. [1], [2]. Even their inventor, Eugene Garfield, says so. There should be no need to rehearse the details yet again. If HEFCE were to allow their use, it would have to withdraw from the DORA agreement, and I presume it would not wish to do this.

Article citations

Citation counting has several problems. Most of them apply equally to the H-index.

Citations may be high because a paper is good and useful. They may equally be high because the paper is bad. No commercial supplier makes any distinction between these possibilities. It would not be in their commercial interests to spend time on that, but it’s critical for the person who is being judged. For example, Andrew Wakefield’s notorious 1998 paper, which gave a huge boost to the anti-vaccine movement, had had 758 citations by 2012 (it was subsequently shown to be fraudulent).
Citations take far too long to appear to be a useful way to judge recent work, as is needed for judging grant applications or promotions. This is especially damaging to young researchers, and to people (particularly women) who have taken a career break. The counts also don’t take into account citation half-life. A paper that’s still being cited 20 years after it was written clearly had influence, but that takes 20 years to discover.
The citation rate is very field-dependent. Very mathematical papers are much less likely to be cited, especially by biologists, than more qualitative papers. For example, the solution of the missed event problem in single ion channel analysis [3,4] was the sine qua non for all our subsequent experimental work, but the two papers have only about a tenth of the number of citations of subsequent work that depended on them.
Most suppliers of citation statistics don’t count citations of books or book chapters. This is bad for me because my only work with over 1000 citations is my 105-page chapter on methods for the analysis of single ion channels [5], which contained quite a lot of original work. It has had 1273 citations according to Google Scholar but doesn’t appear at all in Scopus or Web of Science. Neither do the 954 citations of my statistics textbook [6].
There are often big differences between the numbers of citations reported by different commercial suppliers. Even for papers (as opposed to book articles) there can be a two-fold difference between the number of citations reported by Scopus, Web of Science and Google Scholar. The raw data are unreliable and commercial suppliers of metrics are apparently not willing to put in the work to ensure that their products are consistent or complete.
Citation counts can be (and already are being) manipulated. The easiest way to get a large number of citations is to do no original research at all, but to write reviews in popular areas. Another good way to have ‘impact’ is to write indecisive papers about nutritional epidemiology. That is not behaviour that should command respect.
Some branches of science are already facing something of a crisis in reproducibility [7]. One reason for this is the perverse incentives which are imposed on scientists. These perverse incentives include the assessment of their work by crude numerical indices.
“Gaming” of citations is easy. (If students do it, it’s called cheating; if academics do it, it’s called gaming.) If HEFCE makes money dependent on citations, then this sort of cheating is likely to take place on an industrial scale. Of course that should not happen, but it would (disguised, no doubt, by some ingenious bureaucratic euphemisms).
For example, SCIgen is a program that generates spoof papers in computer science by stringing together plausible phrases. Over 100 such papers have been accepted for publication. By submitting many such papers, the authors managed to fool Google Scholar into awarding the fictitious author an H-index greater than that of Albert Einstein (http://en.wikipedia.org/wiki/SCIgen).
The use of citation counts has already encouraged guest authorships and similar marginally honest behaviour. There is no way to tell whether an author on a paper has actually made any substantial contribution to the work, despite the fact that some journals ask for a statement about contributions.
It has been known for 17 years that citation counts for individual papers are not detectably correlated with the impact factor of the journal in which the paper appears [1]. That doesn’t seem to have deterred metrics enthusiasts from using both. It should have done.

Given all these problems, it’s hard to see how citation counts could be useful to the REF, except perhaps in really extreme cases such as papers that get next to no citations over 5 or 10 years.

The H-index

This has all the disadvantages of citation counting, but in addition it is strongly biased against young scientists, and against women. This makes it not worth consideration by HEFCE.
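The H-index is the largest number h such that h of a person’s papers have each been cited at least h times. A minimal Python sketch makes one source of the bias against young scientists explicit: the index can never exceed the number of papers published, however highly cited they are. The two publication records below are invented purely for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only fall from here on, so h cannot grow
    return h


# Hypothetical records, made up for illustration only.
early_career = [900, 450, 300]   # three papers, all heavily cited
long_career = [25] * 20          # twenty papers, each modestly cited

print(h_index(early_career))  # 3: capped by the number of papers
print(h_index(long_career))   # 20
```

The young researcher’s three papers have, between them, more than three times as many citations as the twenty papers in the longer record, yet the index ranks them far lower.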

Altmetrics

Given the role assigned to “impact” in the REF, the fact that altmetrics claim to measure impact might make them seem worthy of consideration at first sight. One problem is that the REF failed to make a clear distinction between impact on other scientists in the field and impact on the public.

Altmetrics measure an undefined mixture of both sorts of impact, with totally arbitrary weighting for tweets, Facebook mentions and so on. But the score seems to be related primarily to the trendiness of the title of the paper. Any paper about diet and health, however poor, is guaranteed to feature well on Twitter, as will any paper that has ‘penis’ in the title.

It’s very clear from the examples that I’ve looked at that few people who tweet about a paper have read more than the title. See Why you should ignore altmetrics and other bibliometric nightmares [8].

In most cases, papers were promoted by retweeting the press release or tweet from the journal itself. Only too often the press release is hyped-up. Metrics not only corrupt the behaviour of academics, but also the behaviour of journals. In the cases I’ve examined, reading the papers revealed that they were particularly poor (despite being in glamour journals): they just had trendy titles [8].

There could even be a negative correlation between the number of tweets and the quality of the work. Those who sell altmetrics have never examined this critical question because they ignore the contents of the papers. It would not be in their commercial interests to test their claims if the result was to show a negative correlation. Perhaps the reason why they have never tested their claims is the fear that to do so would reduce their income.

Furthermore, you can buy 1000 retweets for $8.00 (http://followers-and-likes.com/twitter/buy-twitter-retweets/). That’s outright cheating, of course, and not many people would go that far. But authors, and journals, can do a lot of self-promotion on Twitter that is totally unrelated to the quality of the work.

It’s worth noting that much good engagement with the public now appears on blogs that are written by scientists themselves, but the 3.6 million views of my blog do not feature in altmetrics scores, never mind Scopus or Web of Science. Altmetrics don’t even measure public engagement very well, never mind academic merit.

Evidence that metrics measure quality

Any metric would be acceptable only if it measured the quality of a person’s work. How could that proposition be tested? In order to judge this, one would have to take a random sample of papers, and look at their metrics 10 or 20 years after publication. The scores would have to be compared with the consensus view of experts in the field. Even then one would have to be careful about the choice of experts (in fields like alternative medicine for example, it would be important to exclude people whose living depended on believing in it). I don’t believe that proper tests have ever been done (and it isn’t in the interests of those who sell metrics to do it).
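The post does not say which statistic such a test should use. One plausible choice, and it is only my assumption here, is Spearman’s rank correlation between the long-term metric scores and the experts’ ranking of the same papers: near +1 would support the metric, near 0 (or negative) would not. A minimal Python sketch follows; the six papers and all their numbers are invented.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks.
    Raises ZeroDivisionError if either sample is entirely tied."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least 2 items")
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical data: citations 10-20 years after publication, and an
# expert panel's quality score (higher = better). Both lists are invented.
citations = [1200, 40, 310, 75, 900, 15]
expert_score = [2, 5, 3, 6, 1, 4]
print(round(spearman(citations, expert_score), 2))
```

Ranks are used rather than raw counts because citation counts are very skewed, and expert judgements are more naturally orderings than measurements.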

The great mistake made by almost all bibliometricians is that they ignore what matters most, the contents of papers. They try to make inferences from correlations of metric scores with other, equally dubious, measures of merit. They can’t afford the time to do the right experiment if only because it would harm their own “productivity”.

The evidence that metrics do what’s claimed for them is almost non-existent. For example, in six of the ten years leading up to the 1991 Nobel prize, Bert Sakmann failed to meet the metrics-based publication target set by Imperial College London, and these failures included the years in which the original single channel paper was published [9] and also the year, 1985, when he published a paper [10] that was subsequently named as a classic in the field [11]. In two of these ten years he had no publications whatsoever. See also [12].

Applying metrics in the way that has been done at Imperial and at Queen Mary College London would result in the firing of the most original minds.

Gaming and the public perception of science

Every form of metric alters behaviour in such a way that it becomes useless for its stated purpose. This is already well known in economics, where it’s known as Goodhart’s law (http://en.wikipedia.org/wiki/Goodhart’s_law): “When a measure becomes a target, it ceases to be a good measure”. That alone is a sufficient reason not to extend metrics to science. Metrics have already become one of several perverse incentives that control scientists’ behaviour. They have encouraged gaming, hype, guest authorships and, increasingly, outright fraud [13].

The general public has become aware of this behaviour and it is starting to do serious harm to perceptions of all science. As long ago as 1999, Haerlin & Parr [14] wrote in Nature, under the title “How to restore public trust in science”:

“Scientists are no longer perceived exclusively as guardians of objective truth, but also as smart promoters of their own interests in a media-driven marketplace.”

And on January 17, 2006, a vicious spoof of a Science paper appeared, not in a scientific journal, but in the New York Times. See https://www.dcscience.net/?p=156

The use of metrics would provide a direct incentive to this sort of behaviour. It would be a tragedy not only for people who are misjudged by crude numerical indices, but also a tragedy for the reputation of science as a whole.

Conclusion

There is no good evidence that any metric measures quality, at least over the short time span that’s needed for them to be useful for giving grants or deciding on promotions. On the other hand, there is good evidence that the use of metrics provides a strong incentive to bad behaviour, both by scientists and by journals. They have already started to damage public perception of the honesty of science.

The conclusion is obvious. Metrics should not be used to judge academic performance.

What should be done?

If metrics aren’t used, how should assessment be done? Roderick Floud was president of Universities UK from 2001 to 2003. He is nothing if not an establishment person. He said recently:

“Each assessment costs somewhere between £20 million and £100 million, yet 75 per cent of the funding goes every time to the top 25 universities. Moreover, the share that each receives has hardly changed during the past 20 years.
It is an expensive charade. Far better to distribute all of the money through the research councils in a properly competitive system.”

The obvious danger of giving all the money to the Research Councils is that people might be fired solely because they didn’t have big enough grants. That’s serious: it’s already happened at King’s College London, Queen Mary London and Imperial College. This problem might be ameliorated if there were a maximum on the size of grants and/or on the number of papers a person could publish, as I suggested at the open data debate. And it would help if universities appointed vice-chancellors with a better long-term view than most seem to have at the moment.

Aggregate metrics? It’s been suggested that the problems are smaller if one looks at aggregated metrics for a whole department, rather than the metrics for individual people. Clearly looking at departments would average out anomalies. The snag is that it wouldn’t circumvent Goodhart’s law. If the money depended on the aggregate score, it would still put great pressure on universities to recruit people with high citations, regardless of the quality of their work, just as it would if individuals were being assessed. That would weigh against thoughtful people (and not least women).

The best solution would be to abolish the REF and give the money to research councils, with precautions to prevent people being fired because their research wasn’t expensive enough. If politicians insist that the "expensive charade" is to be repeated, then I see no option but to continue with a system that’s similar to the present one: that would waste money and distract us from our job.

1. Seglen PO (1997) Why the impact factor of journals should not be used for evaluating research. British Medical Journal 314: 498-502.

2. Colquhoun D (2003) Challenging the tyranny of impact factors. Nature 423: 479.

3. Hawkes AG, Jalali A, Colquhoun D (1990) The distributions of the apparent open times and shut times in a single channel record when brief events can not be detected. Philosophical Transactions of the Royal Society London A 332: 511-538.

4. Hawkes AG, Jalali A, Colquhoun D (1992) Asymptotic distributions of apparent open times and shut times in a single channel record allowing for the omission of brief events. Philosophical Transactions of the Royal Society London B 337: 383-404.

5. Colquhoun D, Sigworth FJ (1995) Fitting and statistical analysis of single-channel records. In: Sakmann B, Neher E, editors. Single Channel Recording. New York: Plenum Press. pp. 483-587.

6. David Colquhoun on Google Scholar. Available: http://scholar.google.co.uk/citations?user=JXQ2kXoAAAAJ&hl=en (accessed 17-6-2014).

7. Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124.

8. Colquhoun D, Plested AJ. Why you should ignore altmetrics and other bibliometric nightmares. Available: https://www.dcscience.net/?p=6369

9. Neher E, Sakmann B (1976) Single channel currents recorded from membrane of denervated frog muscle fibres. Nature 260: 799-802.

10. Colquhoun D, Sakmann B (1985) Fast events in single-channel currents activated by acetylcholine and its analogues at the frog muscle end-plate. J Physiol (Lond) 369: 501-557.

11. Colquhoun D (2007) What have we learned from single ion channels? J Physiol 581: 425-427.

12. Colquhoun D (2007) How to get good science. Physiology News 69: 12-14. See also https://www.dcscience.net/?p=182

13. Oransky I. Retraction Watch. Available: http://retractionwatch.com/ (accessed 18-6-2014).

14. Haerlin B, Parr D (1999) How to restore public trust in science. Nature 400: 499. doi: 10.1038/22867

Follow-up

Some other posts on this topic

Why Metrics Cannot Measure Research Quality: A Response to the HEFCE Consultation

Gaming Google Scholar Citations, Made Simple and Easy

Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting

Driving Altmetrics Performance Through Marketing

Death by Metrics (October 30, 2013)

Not everything that counts can be counted

Using metrics to assess research quality By David Spiegelhalter “I am strongly against the suggestion that peer–review can in any way be replaced by bibliometrics”

1 July 2014

My brilliant statistical colleague, Alan Hawkes, laid the foundations for single molecule analysis (and made a career for me). Before he got into that, he wrote a paper, “Spectra of some self-exciting and mutually exciting point processes” (Biometrika, 1971). In that paper he described a sort of stochastic process now known as a Hawkes process. In the simplest sort of stochastic process, the Poisson process, events are independent of each other. In a Hawkes process, the occurrence of an event affects the probability of another event occurring, so, for example, events may occur in clusters. Such processes were used for many years to describe the occurrence of earthquakes. More recently, it’s been noticed that such models are useful in finance, marketing, terrorism, burglary, social media, DNA analysis, and to describe invasive banana trees. The 1971 paper languished in relative obscurity for 30 years. Now the citation rate has shot through the roof.
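To make the clustering concrete, here is a minimal Python sketch. It assumes the commonest textbook form of the process, in which each past event at time s adds alpha·exp(−beta·(t − s)) to a constant background rate mu (the names mu, alpha and beta are my labels, not necessarily the notation of the 1971 paper), and it simulates event times by Ogata’s thinning method. Setting alpha = 0 switches off the self-excitation and leaves an ordinary Poisson process, so the two can be compared at the same long-run rate. The parameter values are arbitrary.

```python
import math
import random
from statistics import mean, variance


def simulate_hawkes(mu, alpha, beta, t_max, seed=None):
    """Event times on [0, t_max) for a Hawkes process with intensity
    mu + sum over past events s of alpha * exp(-beta * (t - s)),
    simulated by Ogata's thinning method. alpha = 0 gives a Poisson process."""
    if mu <= 0 or alpha < 0 or alpha >= beta:
        raise ValueError("need mu > 0 and 0 <= alpha < beta")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-excitation from past events, evaluated at time t
    while True:
        # Between events the intensity can only decay, so its value now
        # is an upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        t += wait
        if t >= t_max:
            return events
        excitation *= math.exp(-beta * wait)  # decay to the candidate time
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each event raises the intensity by alpha


def dispersion_index(events, t_max, window):
    """Variance/mean of event counts in consecutive windows:
    about 1 for a Poisson process, larger when events cluster."""
    counts = [0] * int(t_max // window)
    for s in events:
        b = int(s // window)
        if b < len(counts):
            counts[b] += 1
    return variance(counts) / mean(counts)


# Both processes average about 2 events per unit time:
# mu / (1 - alpha / beta) = 0.4 / 0.2 = 2 for the Hawkes process.
poisson = simulate_hawkes(mu=2.0, alpha=0.0, beta=1.0, t_max=5000, seed=1)
hawkes = simulate_hawkes(mu=0.4, alpha=0.8, beta=1.0, t_max=5000, seed=1)
print(len(poisson), dispersion_index(poisson, 5000, 10))
print(len(hawkes), dispersion_index(hawkes, 5000, 10))
```

The Poisson run gives a dispersion index close to 1; the Hawkes run, with about the same number of events, gives a much larger value, because its events bunch together instead of being scattered independently.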


The papers about Hawkes processes are mostly highly mathematical. They are not the sort of thing that features on Twitter. They are serious science, not just another ghastly epidemiological survey of diet and health. Anybody who cites papers of this sort is likely to be a real scientist. The surge in citations suggests to me that the 1971 paper was indeed an important bit of work (because the citations will be made by serious people). How does this affect my views about the use of citations? It shows that even highly mathematical work can achieve respectable citation rates, but it may take a long time before its importance is realised. If Hawkes had been judged by citation counting while he was applying for jobs and promotions, he’d probably have been fired. If his department had been judged by citations of this paper, it would not have scored well. It takes a long time to judge the importance of a paper, and that makes citation counting almost useless for decisions about funding and promotion.


Nonsense about “research impact”. The Research Councils are as much a problem as the government

Published December 5, 2010


“Research quangos lead to mediocrity” is the headline of a letter to The Times that appeared on 6 December 2010. It is reproduced below for those who can’t (or won’t) pay Rupert Murdoch to see it.

The letter is about the current buzzword, "research impact", a term that trips off the lips of every administrator and politician daily. Since much research is funded by the taxpayer, it seems reasonable to ask if it gives value for money. The best answer can be found in St Paul’s cathedral.

The plaque for Christopher Wren bears the epitaph

LECTOR, SI MONUMENTUM REQUIRIS, CIRCUMSPICE.

Reader, if you seek his memorial – look around you.

Much the same could be said for the impact of any science. Look at your refrigerator, your mobile phone, your computer, your central heating boiler, your house. Look at the X-ray machine and MRI machines in your hospital. Look at the aircraft that takes you on holiday. Look at your DVD player and laser surgery. Look, even, at the way you can turn a switch and light your room. Look at almost anything that you take for granted in your everyday life. They are all products of science; products, eventually, of the enlightenment.

But remember also that these wonderful products did not appear overnight. They evolved slowly over many decades or even centuries, and they evolved from work that, at the time, appeared to be mere idle curiosity. Electricity lies at the heart of everyday life. It took almost 200 years to get from Michael Faraday’s coils to your mobile phone. At the time, Faraday’s work seemed to politicians to be useless. Michael Faraday was made a fellow of the Royal Society in 1824.

. . . after Faraday was made a fellow of the Royal Society[,] the prime minister of the day asked what good this invention could be, and Faraday answered: “Why, Prime Minister, someday you can tax it.”

Whether this was really said is doubtful, but that hardly matters. It is the sort of remark made by politicians every day.

In May 2008, I read a review of “The Myths of Innovation” by Scott Berkun. The review seems to have vanished from the web, but I noted it in my diary. These words should be framed on the wall of every politician and administrator. Here are some quotations.

“One myth that will disappoint most businesses is the idea that innovation can be managed. Actually, Berkun calls this one ‘Your boss knows more about innovation than you’. After all, he says, many people get their best ideas while they’re wandering in their bathrobes, filled coffee mug in hand, from the kitchen to their home PC on a day off rather than sitting in a cubicle in a suit during working hours. But professional managers can’t help it: their job is to control every variable as much as possible, and that includes innovation.”

“Creation is sloppy; discovery is messy; exploration is dangerous. What’s a manager to do?
The answer in general is to encourage curiosity and accept failure. Lots of failure.”

I commented at the time "What a pity that university managers are so far behind those of modern businesses. They seem to be totally incapable of understanding these simple truths. That is what happens when power is removed from people who know about research and put into the hands of lawyers, HR people, MBAs and failed researchers."

That is even more true two years later. The people who actually do research have been progressively disempowered. We are run by men in dark suits who mistake meetings for work. You have only to look at history to see that great discoveries arise from the curiosity of creative people, and that, rarely, these ideas turn out to be of huge economic importance, many decades later.

The research impact plan has now been renamed "Pathways to Impact". It means that scientists are being asked to explain the economic impact of their research before they have even got any results.

All this shows that science is being run by dimwits who simply don’t understand how science works. It amounts to nothing less than being compelled to lie if you want any research funding. And, worse still, the pressure to lie comes not primarily from government, but from that curious breed of ex-scientists, failed scientists and non-scientists who control the Research Councils.

How much did RCUK pay for the silly logo?

We are being run by people who would have told Michael Faraday to stop messing about with wires and coils and to do something really useful, like inventing better leather washers for steam pumps.

Welcome to the third division. Brought to you by Research Councils and politicians.

Here is the letter in The Times. It is worded slightly more diplomatically than my commentary, but will, no doubt, have just as little effect. What would the signatories know about science? Several of them don’t even wear black suits.

Sir,

The governance of UK academic research today is delegated to a quangocracy comprising 11 funding and research councils, and to an additional body – Research Councils UK. Ill considered changes over the past few decades have transformed what was arguably the world’s most creative academic sector into one often described nowadays as merely competitive.

In their latest change, research councils introduce a new criterion for judging proposals – “Pathways to Impact” – against which individual researchers applying for funds must identify who might benefit from their proposed research and how they might benefit. Furthermore, the funding councils are planning to begin judging researchers’ departments in 2014 on the actual benefits achieved and to adjust their funding accordingly, thereby increasing pressure on researchers to deliver short-term benefits. However, we cannot understand why the quangocracy has ignored abundant evidence showing that the outcomes of high-quality research are impossible to predict.

We are mindful of the need to justify investment in academic research, but “Pathways to Impact” focuses on the predictable, leads to mediocrity, and reduces returns to the taxpayer. In our opinion as experienced researchers, few if any of the 20th century’s great discoveries and their huge economic stimuli could have happened if a policy of focussing on attractive short-term benefits had applied because great discoveries are always unpredicted. We therefore have an acutely serious problem.

Abolishing “Pathways to Impact” would not only save the expense of its burgeoning bureaucracy; it would also be a step towards liberating creativity and indicate that policy-makers have at last regained their capacity for world-class thinking.

Donald W Braben
University College London,
And the following scientists who also sign in a personal capacity:

John F Allen, Queen Mary, University of London;
William Amos, University of Cambridge;
Michael Ashburner FRS, University of Cambridge;
Jonathan Ashmore FRS, University College London;
Tim Birkhead FRS, University of Sheffield;
Mark S Bretscher FRS, MRC Laboratory of Molecular Biology, Cambridge;
Peter Cameron, Queen Mary, University of London;
Richard S Clymo, Queen Mary, University of London;
Richard Cogdell FRS, University of Glasgow;
David Colquhoun FRS, University College London;

Adam Curtis, Glasgow University;
John Dainton FRS, University of Liverpool;
Felipe Fernandez-Armesto, University of Notre Dame;
Pat Heslop-Harrison, University of Leicester;
Dudley Herschbach, Harvard University, Nobel Laureate;
Herbert Huppert FRS, University of Cambridge;
H Jeff Kimble, Caltech, US National Academy of Sciences;
Sir Harry Kroto FRS, Florida State University, Tallahassee, Nobel Laureate;
James Ladyman, University of Bristol;
Michael F Land FRS, University of Sussex;

Peter Lawrence FRS, University of Cambridge;
Sir Anthony Leggett FRS, University of Illinois at Urbana-Champaign, Nobel Laureate;
Angus MacIntyre FRS, Queen Mary, University of London;
Sotiris Missailidis, Open University;
Philip Moriarty, University of Nottingham;
Andrew Oswald, University of Warwick;
Lawrence Paulson, University of Cambridge;
Iain Pears, Oxford;
Beatrice Pelloni, University of Reading;
Douglas Randall, University of Missouri, US National Science Board member;

David Ray, BioAstral Limited;
Sir Richard J Roberts FRS, New England Biolabs, Nobel Laureate;
Ian Russell FRS, University of Sussex;
Ken Seddon, Queen’s University of Belfast;
Steve Sparks FRS, University of Bristol;
Harry Swinney, University of Texas, US National Academy of Sciences;
Iain Stewart, University of Durham;
Claudio Vita-Finzi, Natural History Museum;
David Walker FRS, University of Sheffield;
Glynn Winskel, University of Cambridge;

Lewis Wolpert FRS, University College London;
Phil Woodruff FRS, University of Warwick.

Now cheer yourself up by reading Captain Cook’s Grant Application.

Follow-up

Scientists should sign the petition to help humanities too. See the Humanities and Social Sciences Matter web site.

Nobel view. 1. Andre Geim’s speech at the Nobel banquet, 2010

"Human progress has always been driven by a sense of adventure and unconventional thinking. But amidst calls for “bread and circuses”, these virtues are often forgotten for the sake of cautiousness and political correctness that now rule the world. And we sink deeper and deeper from democracy into a state of mediocrity and even idiocracy. If you need an example, look no further than at research funding by the European Commission."

Nobel view. 2. Ahmed Zewail won the 1999 Nobel Prize in Chemistry. He serves on Barack Obama’s Council of Advisors on Science and Technology. He wrote in Nature

“Beware the urge to direct research too closely, says Nobel laureate Ahmed Zewail. History teaches us the value of free scientific inquisitiveness.”

“I have emphasized that without solid investment in science education and a fundamental science base, nations will not acquire the ground-breaking knowledge required to make discoveries and innovations that will shape their future.”

“Preserving knowledge is easy. Transferring knowledge is also easy. But making new knowledge is neither easy nor profitable in the short term. Fundamental research proves profitable in the long run, and, as importantly, it is a force that enriches the culture of any society with reason and basic truth.”

How many more people have to say this before the Research Councils take some notice?
