There are powerful currents whipping up the metric tide. The HEFCE metrics report

Published July 9, 2015

This is a very quick synopsis of the 500 pages of a report on the use of metrics in the assessment of research. It’s by far the most thorough bit of work I’ve seen on the topic. It was written by a group chaired by James Wilsdon.

The report starts with a bang. The foreword says

"Too often, poorly designed evaluation criteria are “dominating minds, distorting behaviour and determining careers.” At their worst, metrics can contribute to what Rowan Williams, the former Archbishop of Canterbury, calls a “new barbarity” in our universities."

"The tragic case of Stefan Grimm, whose suicide in September 2014 led Imperial College to launch a review of its use of performance metrics, is a jolting reminder that what’s at stake in these debates is more than just the design of effective management systems."

"Metrics hold real power: they are constitutive of values, identities and livelihoods"

And the conclusions (page 12 and Chapter 9.5) are clear that metrics alone can measure neither the quality of research, nor its impact.

"no set of numbers, however broad, is likely to be able to capture the multifaceted and nuanced judgements on the quality of research outputs that the REF process currently provides"

"Similarly, for the impact component of the REF, it is not currently feasible to use quantitative indicators in place of narrative impact case studies, or the impact template"

These conclusions are justified in great detail in 179 pages of the main report, 200 pages of the literature review, and 87 pages of Correlation analysis of REF2014 scores and metrics.

The correlation analysis shows clearly that, contrary to some earlier reports, all of the many metrics that are considered predict the outcome of the 2014 REF far too poorly to be used as a substitute for reading the papers.

There is the inevitable bit of talk about the "judicious" use of metrics to support peer review (with no guidance about what judicious use means in real life) but this doesn’t detract much from an excellent and thorough job.

Needless to say, I like these conclusions since they are quite similar to those recommended in my submission to the report committee, over a year ago.

Of course peer review is itself fallible. Every year about 8 million researchers publish 2.5 million articles in 28,000 peer-reviewed English language journals (STM report 2015 and graphic, here). It’s pretty obvious that there are not nearly enough people to review carefully such vast outputs. That’s why I’ve said that any paper, however bad, can now be printed in a journal that claims to be peer-reviewed. Nonetheless, nobody has come up with a better system, so we are stuck with it.

It’s certainly possible to judge that some papers are bad. It’s possible, if you have enough expertise, to guess whether or not the conclusions are justified. But no method exists that can judge what the importance of a paper will be in 10 or 20 years’ time. I’d like to have seen a frank admission of that.

If the purpose of research assessment is to single out papers that will be considered important in the future, that job is essentially impossible. From that point of view, the cost of research assessment could be reduced to zero by trusting people to appoint the best people they can find, and just giving the same amount of money to each of them. I’m willing to bet that the outcome would be little different. Departments have every incentive to pick good people, and scientists’ vanity is quite sufficient motive for them to do their best.

Such a radical proposal wasn’t even considered in the report, which is a pity. Perhaps they were just being realistic about what’s possible in the present climate of managerialism.

Other recommendations include

"HEIs should consider signing up to the San Francisco Declaration on Research Assessment (DORA)"

4. "Journal-level metrics, such as the Journal Impact Factor (JIF), should not be used."

It’s astonishing that it should still be necessary to deplore the JIF almost 20 years after it was totally discredited. Yet it still mesmerizes many scientists. I guess that shows just how stupid scientists can be outside their own specialist fields.

DORA has over 570 organisational and 12,300 individual signatories, BUT only three universities in the UK have signed (Sussex, UCL and Manchester). That’s a shocking indictment of the way (all the other) universities are run.

One of the signatories of DORA is the Royal Society.

"The RS makes limited use of research metrics in its work. In its publishing activities, ever since it signed DORA, the RS has removed the JIF from its journal home pages and marketing materials, and no longer uses them as part of its publishing strategy. As authors still frequently ask about JIFs, however, the RS does provide them, but only as one of a number of metrics".

That’s a start. I’ve advocated making it a condition of any grant or fellowship that the university has signed up to DORA and Athena Swan (with checks to make sure they are actually obeyed).

And that leads on naturally to one of the most novel and appealing recommendations in the report.

"A blog will be set up at http://www.ResponsibleMetrics.org. The site will celebrate responsible practices, but also name and shame bad practices when they occur"

"every year we will award a “Bad Metric” prize to the most egregious example of an inappropriate use of quantitative indicators in research management."

This should be really interesting. Perhaps I should open a book on which university will be the first to win the "Bad Metric" prize.

The report covers just about every aspect of research assessment: perverse incentives, whether to include author self-citations, normalisation of citation impact indicators across fields and what to do about the order of authors on multi-author papers.

It’s concluded that there are no satisfactory ways of doing any of these things. Those conclusions are sometimes couched in diplomatic language which may, uh, reduce their impact, but they are clear enough.

The perverse incentives that are imposed by university rankings are considered too. Rankings are commercial products and if universities simply ignored them, they’d vanish. One important problem with rankings is that they never come with any assessment of their errors. It’s been known how to do this at least since Goldstein & Spiegelhalter (1996, League Tables and Their Limitations: Statistical Issues in Comparisons of Institutional Performance). Commercial producers of rankings don’t do it, because to do so would reduce the totally spurious impression of precision in the numbers they sell. Vice-chancellors might bully staff less if they knew that the changes in rank that they see from year to year are mere random errors.
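To make the point about errors concrete, here is a minimal sketch of one way to put an uncertainty interval on a rank. It is not Goldstein & Spiegelhalter's actual procedure, just a simple simulation under stated assumptions: each institution's score is taken to have a known standard error with Gaussian noise, and the function name `rank_intervals`, the scores and the standard errors are all hypothetical.

```python
import random


def rank_intervals(scores, std_errors, n_sim=2000, level=0.95, seed=0):
    """Simulate plausible rankings given each score's standard error.

    Returns one (lowest_rank, highest_rank) pair per institution,
    where rank 1 is the best (highest) score.
    Assumption: independent Gaussian noise on every score.
    """
    rng = random.Random(seed)
    n = len(scores)
    simulated_ranks = [[] for _ in range(n)]

    for _ in range(n_sim):
        # Draw one plausible set of scores and rank the institutions on it.
        draw = [rng.gauss(s, se) for s, se in zip(scores, std_errors)]
        order = sorted(range(n), key=lambda i: draw[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            simulated_ranks[i].append(rank)

    # Read the interval off the sorted simulated ranks.
    tail = (1 - level) / 2
    low_index = int(tail * n_sim)
    high_index = min(n_sim - 1, int((1 - tail) * n_sim))
    intervals = []
    for ranks in simulated_ranks:
        ranks.sort()
        intervals.append((ranks[low_index], ranks[high_index]))
    return intervals


if __name__ == "__main__":
    # Hypothetical league table: five institutions one point apart,
    # each score uncertain by about five points.
    scores = [54, 53, 52, 51, 50]
    for score, (low, high) in zip(scores, rank_intervals(scores, [5] * 5)):
        print(f"score {score}: rank somewhere between {low} and {high}")
```

When the standard errors are comparable to the gaps between scores, the intervals cover much of the table, which is exactly the information that published rankings leave out.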

Metrics, and still more altmetrics, are far too crude to measure the quality of science. To hope to do that without reading the paper is pie in the sky (even reading it, it’s often impossible to tell).

The only bit of the report that I’m not entirely happy about is the recommendation to spend more money investigating the metrics that the report has just debunked. It seems to me that there will never be a way of measuring the quality of work without reading it. To spend money on a futile search for new metrics would take money away from science itself. I’m not convinced that it would be money well spent.


3 Responses to There are powerful currents whipping up the metric tide. The HEFCE metrics report

robbo says:

July 9, 2015 at 08:02

The beginning of the second enlightenment?

nebuer says:

July 10, 2015 at 21:02

Athena Swan suffers from being a very weak scheme, at least when compared to an institution’s obligations under the Equality Act (2010). This would apply to both research councils (and possibly other organisations, given the “match funding” mechanism for charities) and HEIs, who are obliged to eliminate unlawful discrimination and advance equality of opportunity. Most Athena awards simply involve collecting some statistics involving gender, rather than addressing the needs of other protected groups (e.g. those with disabilities, including mental health problems in many cases), or even coming close to meeting the standards under that Act (in reality, a gold award under Athena would be roughly equivalent in so far as gender is concerned).

Athena is not the answer. As well as advocating a very low standard, it is based upon self-assessment, rather than enforcement and assessment by someone qualified to do so. Instead, rigorously enforcing the standards under the Equality Act, which are on the whole very sensible, seems to be an appropriate route forwards. This would mean removing funding from HEIs and departments that cannot prove they meet the standards, as well as an appropriate independent complaints process (much like the First Tier Tribunal) that can remove academics, no matter how senior, from having actual power if they choose not to meet these standards. Finally, academics should be assessed as the public officials that they are, and on how they direct their efforts towards advancing society, rather than on how many points they accrue in one form or other. Those who can contribute outside such a framework belong in industry, rather than wasting a valuable academic post that could go to a genuine independent thinker (which all academics ought to be, but perhaps few actually are).

David Colquhoun says:

July 11, 2015 at 06:58

Thanks for that excellent comment. I’m under no illusions about the value of signing bits of paper. But it would be a start. It is shocking that so few universities have signed DORA, never mind actually complied with it.
