Chalkdust is a magazine published by students from the UCL Mathematics department. Judging by its first issue, it’s an excellent vehicle for the popularisation of maths. I have a piece in the second issue.
The piece started out as another exposition of the interpretation of P values, but the whole of the first part turned into an explanation of the principles of randomisation tests. It beats me why anybody still does a Student’s t test. The idea of randomisation tests is very old. They are as powerful as t tests when the assumptions of the latter are fulfilled, and a lot better when those assumptions are wrong (in the jargon, they are uniformly most powerful tests).
Not only that, but you need no mathematics to do a randomisation test, whereas you need a good deal of mathematics to follow Student’s 1908 paper. And the randomisation test makes it transparently clear that random allocation of treatments is a basic and essential assumption for the validity of any test of statistical significance.
I made a short video that explains the principles behind the randomisation tests, to go with the printed article (a bit of animation always helps).
When I first came across the principles of randomisation tests, I was entranced by the simplicity of the idea. Chapters 6 – 9 of my old textbook were written to popularise them. You can find much more detail there.
In fact it’s only towards the end that I reiterate the idea that P values don’t answer the question that experimenters want to ask, namely: if I claim I have made a discovery because P is small, what’s the chance that I’ll be wrong?
If you want the full story on that, read my paper. The story it tells is not very original, but it still isn’t known to most experimenters (because most statisticians still don’t teach it on elementary courses). The paper must have struck a chord because it’s had over 80,000 full text views and more than 10,000 pdf downloads. It reached an altmetric score of 975 (since when it has been mysteriously declining). That’s gratifying, but it is also a condemnation of the use of metrics. The paper is not original and it’s quite simple, yet it’s had far more "impact" than anything to do with my real work.
The R code for doing 2-sample randomisation tests
There are two different R scripts, and you can download a pdf file that describes them both.
One resamples at random a specified number of times (the default is 100,000 resamples, but you can use any number). Download two_sample_rantest.R
The other uses every possible sample: in the case of two samples of 10 observations, it gives the distribution for all 184,756 ways of selecting 10 observations from 20. Download 2-sample-rantest-exact.R
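The scripts themselves are the downloads above; for anyone who prefers to see the logic spelled out, here is a sketch of both approaches in Python rather than R (an illustration only, not the downloadable code; the function names are mine):

```python
import itertools
import random

def randomisation_test(a, b, n_resamples=100_000, seed=1):
    """Monte Carlo two-sample randomisation test on the difference between
    means: reallocate the pooled observations to two groups at random, and
    count how often the absolute difference between group means is at least
    as large as the one observed (two-tailed P)."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            count += 1
    return count / n_resamples

def exact_randomisation_test(a, b):
    """The same test, but enumerating every possible allocation."""
    pooled = list(a) + list(b)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    count = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        new_a = [pooled[i] for i in chosen]
        new_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        total += 1
        if diff >= observed:
            count += 1
    return count / total
```

With small samples the two functions agree closely: the Monte Carlo version converges on the exact P as the number of resamples grows.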
The launch party
Today the people who organise Chalkdust magazine held a party in the mathematics department at UCL. The editorial director is a graduate student in maths, Rafael Prieto Curiel. He was, at one time, in the Mexican police force (he said he’d suffered more crime in London than in Mexico City). He, and the rest of the team, are deeply impressive. They’ve done a terrific job. Support them.
The party cakes
Rafael Prieto doing the introduction
Rafael Prieto doing the introduction
Rafael Prieto and me
I got the T shirt
Decoding the T shirt
The top line is "i" because that’s the usual symbol for the square root of -1.
The second line is one of many equations that describe a heart shape. It can be plotted by calculating a matrix of values of the left-hand side for a range of values of x and y, and then plotting the contour along which the left-hand side is equal to 1. Download R script for this. (Method suggested by Rafael Prieto Curiel.)
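The T-shirt equation itself isn’t reproduced in this post, but to give the flavour, one well-known heart curve is (x² + y² − 1)³ = x²y³. Here is a rough Python sketch (my own illustration, not the R script linked above) that evaluates the left-hand side minus the right on a grid and marks the points inside the curve:

```python
def heart(x, y):
    """Implicit form of one well-known heart curve:
    (x^2 + y^2 - 1)^3 - x^2 * y^3.
    Negative inside the heart, positive outside, zero on the curve."""
    return (x**2 + y**2 - 1) ** 3 - x**2 * y**3

def ascii_heart(width=40, height=20):
    """Crude text 'contour plot': mark each grid point that lies inside
    (or on) the curve with a '#'."""
    rows = []
    for j in range(height):
        y = 1.5 - 3.0 * j / (height - 1)   # y runs from 1.5 down to -1.5
        row = "".join(
            "#" if heart(-1.5 + 3.0 * i / (width - 1), y) <= 0 else " "
            for i in range(width)
        )
        rows.append(row)
    return "\n".join(rows)
```

A proper plot would use a contour function (as in the R script), but the sign of the implicit function is all you need to see the shape.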
5 November 2015
The Mann-Whitney test
I was stimulated to write this follow-up because yesterday I was asked by a friend to comment on the fact that five different tests all gave identical P values, P = 0.0079. The paper in question was in Science magazine (see Fig. 1), so it wouldn’t surprise me if the statistics were done badly, but in this case there is an innocent explanation.
The Chalkdust article, and the video, are about randomisation tests done using the original observed numbers, so look at them before reading on. There is a more detailed explanation in Chapter 9 of Lectures on Biostatistics. Before it became feasible to do this sort of test, there was a simpler, and less efficient, version in which the observations were ranked in ascending order and the observed values were replaced by their ranks. This was known as the Mann-Whitney test. It had the virtue that, because all the ‘observations’ were now integers, the number of possible results of resampling was limited, so it was possible to construct tables from which one could get a rough P value. Of course, replacing observations by their ranks throws away some information, and now that we have computers there is no need ever to use a Mann-Whitney test. But that’s what was used in this paper.
In the paper (Fig 1) comparisons are made between two groups (assumed to be independent) with 5 observations in each group. The 10 observations are just the ranks, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
To do the randomisation test we select 5 of these numbers at random for sample A, and the other 5 are sample B. (Of course this supposes that the treatments were allocated randomly in the real experiment, which is unlikely to be true.) In fact there are only 10!/(5!·5!) = 252 possible ways to select a sample of 5 from 10, so it’s easy to list all of them. In the case where there is no overlap between the groups, one group will contain the smallest observations (ranks 1, 2, 3, 4, 5) and the other group will contain the largest observations (ranks 6, 7, 8, 9, 10).
In this case, the sum of the ‘observations’ in group A is 15, and the sum for group B is 40. These add up to the sum of the first 10 integers, 10(10+1)/2 = 55. The mean of the distribution of rank sums (which corresponds to a difference between means of zero) is 55/2 = 27.5.
There are two ways of getting an allocation as extreme as this (first group low, as above, or second group low, the other tail of the distribution). The two tailed P value is therefore 2/252 = 0.0079. This will be the result whenever the two groups don’t overlap, regardless of the numerical values of the observations. It’s the smallest P value the test can produce with 5 observations in each group.
The whole randomisation distribution looks like this
In this case, the abscissa is the sum of the ranks in sample A, rather than the difference between the means of the two groups (the latter is easily calculated from the former). The red line shows the observed value, 15. There is only one way to get a total of 15 for group A: it must contain the five lowest ranks (group A = 1, 2, 3, 4, 5). There is also only one way to get a total of 16 (group A = 1, 2, 3, 4, 6), and there are two ways of getting a total of 17 (group A = 1, 2, 3, 4, 7 or 1, 2, 3, 5, 6). But there are 20 different ways of getting a sum of 27, and another 20 of getting 28 (the two values that straddle the mean, 27.5). The printout (.txt file) from the R program that was used to generate the distribution is as follows.
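All the counts quoted above are easy to check by brute force. Here is a Python sketch (the printout itself came from the R script) that enumerates all 252 allocations:

```python
from itertools import combinations
from collections import Counter

ranks = range(1, 11)   # the ten 'observations' are just the ranks 1..10

# Distribution of the sum of ranks in sample A, over every possible
# way of choosing 5 of the 10 ranks for sample A
sums = Counter(sum(sample_a) for sample_a in combinations(ranks, 5))

total = sum(sums.values())   # 252 = 10!/(5!*5!) possible allocations

# No overlap between the groups: sample A is {1..5} (sum 15)
# or {6..10} (sum 40); two allocations out of 252
p_two_tailed = (sums[15] + sums[40]) / total   # 2/252, about 0.0079
```

Running this confirms one way of getting 15, one of getting 16, two of getting 17, twenty each for 27 and 28, and a two-tailed P of 2/252 = 0.0079.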
[Printout: “Randomisation test: exact calculation, all possible samples”, giving the inputs, the full distribution and, for comparison, the result of a t test.]
Some problems. Figure 1 alone shows 16 two-sample comparisons, but no correction for multiple comparisons seems to have been made. A crude Bonferroni correction would require replacement of a P = 0.05 threshold with P = 0.05/16 = 0.003. None of the 5 tests that gave P = 0.0079 reaches this level (of course the whole idea of a threshold level is absurd anyway).
Furthermore, even a single test that gave P = 0.0079 would be expected to have a false positive rate of around 10 percent.
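The Bonferroni arithmetic is worth making explicit; a trivial Python check (an illustration, with variable names of my own choosing):

```python
alpha = 0.05
n_comparisons = 16                 # two-sample comparisons in Fig. 1
bonferroni_threshold = alpha / n_comparisons   # 0.05/16 = 0.003125

# The smallest two-tailed P that a 5 vs 5 rank test can produce
smallest_possible_p = 2 / 252      # about 0.0079

# 0.0079 > 0.0031, so even the most extreme possible result fails
# the corrected threshold
survives_correction = smallest_possible_p < bonferroni_threshold
```

In other words, with this design no Mann-Whitney result, however extreme, could pass a Bonferroni-corrected threshold.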
Today, 25 September, is the first anniversary of the needless death of Stefan Grimm. This post is intended as a memorial.
He should be remembered, in the hope that some good can come from his death.
On 1 December 2014, I published the last email from Stefan Grimm, under the title “Publish and perish at Imperial College London: the death of Stefan Grimm”. Since then it’s been viewed 196,000 times. The day after it was posted, the server failed under the load.
Since then, I have posted two follow-up pieces. On December 23, 2014: “Some experiences of life at Imperial College London. An external inquiry is needed after the death of Stefan Grimm”. Of course there was no external inquiry.
And on April 9, 2015, after the coroner’s report, and after Imperial’s internal inquiry, "The death of Stefan Grimm was “needless”. And Imperial has done nothing to prevent it happening again".
The tragedy featured in the introduction of the HEFCE report on the use of metrics.
“The tragic case of Stefan Grimm, whose suicide in September 2014 led Imperial College to launch a review of its use of performance metrics, is a jolting reminder that what’s at stake in these debates is more than just the design of effective management systems.”
“Metrics hold real power: they are constitutive of values, identities and livelihoods”
I had made no attempt to contact Grimm’s family, because I had no wish to intrude on their grief. But in July 2015, I received, out of the blue, a hand-written letter from Stefan Grimm’s mother. She is now 80 and living in Munich. I was told that his father, Dieter Grimm, had died of cancer when he was only 59. I also learned that Stefan Grimm was distantly related to Wilhelm Grimm, one of the Gebrüder Grimm.
The letter was very moving indeed. It said "Most of the infos about what happened in London, we got from you, what you wrote in the internet".
I responded as sympathetically as I could, and got a reply which included several of Stefan’s drawings, and then more from his sister. The drawings were done while he was young. They show amazing talent, but by the age of 25 he was too busy with science to exploit his artistic gifts.
With his mother’s permission, I reproduce ten of his drawings here, as a memorial to a man whose needless death was attributable to the very worst of the UK university system. He was killed by mindless and cruel "performance management", imposed by Imperial College London. The initial reaction of Imperial gave little hint of an improvement. I hope that their review of the metrics used to assess people will be a bit more sensible.
His real memorial lies in his published work, which continues to be cited regularly after his death.
His drawings are a reminder that there is more to human beings than getting grants. And that there is more to human beings than science.
Click the picture for an album of ten of his drawings. In the album there are also pictures of two books that were written for children by Stefan’s father, Dieter Grimm.
Dated Christmas Eve, 1979 (age 16)
Well well. It seems that Imperial are having an "HR Showcase: Supporting our people" on 15 October. And the introduction is being given by none other than Professor Martin Wilkins, the very person whose letter to Grimm must bear some responsibility for his death. I’ll be interested to hear whether he shows any contrition. I doubt whether any employees will dare to ask pointed questions at this meeting, but let’s hope they do.
This is a very quick synopsis of the 500 pages of a report on the use of metrics in the assessment of research. It’s by far the most thorough piece of work I’ve seen on the topic. It was written by a group chaired by James Wilsdon to investigate the possible role of metrics in research assessment.
The report starts with a bang. The foreword says
"Too often, poorly designed evaluation criteria are “dominating minds, distorting behaviour and determining careers.”1 At their worst, metrics can contribute to what Rowan Williams, the former Archbishop of Canterbury, calls a “new barbarity” in our universities."
"The tragic case of Stefan Grimm, whose suicide in September 2014 led Imperial College to launch a review of its use of performance metrics, is a jolting reminder that what’s at stake in these debates is more than just the design of effective management systems."
"Metrics hold real power: they are constitutive of values, identities and livelihoods"
And the conclusions (page 12 and Chapter 9.5) are clear that metrics alone can measure neither the quality of research, nor its impact.
"no set of numbers, however broad, is likely to be able to capture the multifaceted and nuanced judgements on the quality of research outputs that the REF process currently provides"
"Similarly, for the impact component of the REF, it is not currently feasible to use quantitative indicators in place of narrative impact case studies, or the impact template"
These conclusions are justified in great detail in the 179 pages of the main report, the 200 pages of the literature review, and the 87 pages of the Correlation analysis of REF2014 scores and metrics.
The correlation analysis shows clearly that, contrary to some earlier reports, all of the many metrics that are considered predict the outcome of the 2014 REF far too poorly to be used as a substitute for reading the papers.
There is the inevitable bit of talk about the "judicious" use of metrics to support peer review (with no guidance about what judicious use means in real life), but this doesn’t detract much from an excellent and thorough job.
Needless to say, I like these conclusions since they are quite similar to those recommended in my submission to the report committee, over a year ago.
Of course peer review is itself fallible. Every year about 8 million researchers publish 2.5 million articles in 28,000 peer-reviewed English-language journals (STM report 2015 and graphic, here). It’s pretty obvious that there are not nearly enough people to review such a vast output carefully. That’s why I’ve said that any paper, however bad, can now be printed in a journal that claims to be peer-reviewed. Nonetheless, nobody has come up with a better system, so we are stuck with it.
It’s certainly possible to judge that some papers are bad. It’s possible, if you have enough expertise, to guess whether or not the conclusions are justified. But no method exists that can judge what the importance of a paper will be in 10 or 20 years’ time. I’d like to have seen a frank admission of that.
If the purpose of research assessment is to single out papers that will be considered important in the future, that job is essentially impossible. From that point of view, the cost of research assessment could be reduced to zero by trusting people to appoint the best people they can find, and just give the same amount of money to each of them. I’m willing to bet that the outcome would be little different. Departments have every incentive to pick good people, and scientists’ vanity is quite sufficient motive for them to do their best.
Such a radical proposal wasn’t even considered in the report, which is a pity. Perhaps they were just being realistic about what’s possible in the present climate of managerialism.
Other recommendations include
"HEIs should consider signing up to the San Francisco Declaration on Research Assessment (DORA)"
4. "Journal-level metrics, such as the Journal Impact Factor (JIF), should not be used."
It’s astonishing that it should still be necessary to deplore the JIF almost 20 years after it was totally discredited. Yet it still mesmerizes many scientists. I guess that shows just how stupid scientists can be outside their own specialist fields.
DORA has over 570 organisational and 12,300 individual signatories, BUT only three universities in the UK have signed (Sussex, UCL and Manchester). That’s a shocking indictment of the way (all the other) universities are run.
One of the signatories of DORA is the Royal Society.
"The RS makes limited use of research metrics in its work. In its publishing activities, ever since it signed DORA, the RS has removed the JIF from its journal home pages and marketing materials, and no longer uses them as part of its publishing strategy. As authors still frequently ask about JIFs, however, the RS does provide them, but only as one of a number of metrics".
That’s a start. I’ve advocated making it a condition to get any grant or fellowship, that the university should have signed up to DORA and Athena Swan (with checks to make sure they are actually obeyed).
And that leads on naturally to one of the most novel and appealing recommendations in the report.
"A blog will be set up at http://www.ResponsibleMetrics.org
"every year we will award a “Bad Metric” prize to the most
This should be really interesting. Perhaps I should open a book on which university will be the first to win the "Bad Metric" prize.
The report covers just about every aspect of research assessment: perverse incentives, whether to include author self-citations, normalisation of citation impact indicators across fields and what to do about the order of authors on multi-author papers.
It’s concluded that there are no satisfactory ways of doing any of these things. Those conclusions are sometimes couched in diplomatic language which may, uh, reduce their impact, but they are clear enough.
The perverse incentives that are imposed by university rankings are considered too. Rankings are commercial products, and if universities simply ignored them they’d vanish. One important problem with rankings is that they never come with any assessment of their errors. It’s been known how to do this at least since Goldstein & Spiegelhalter (1996, League Tables and Their Limitations: Statistical Issues in Comparisons of Institutional Performance). Commercial producers of rankings don’t do it, because to do so would reduce the totally spurious impression of precision in the numbers they sell. Vice-chancellors might bully staff less if they knew that changes in rank are mostly mere random error.
Metrics, and still more altmetrics, are far too crude to measure the quality of science. To hope to do that without reading the paper is pie in the sky (even reading it, it’s often impossible to tell).
The only bit of the report that I’m not entirely happy about is the recommendation to spend more money investigating the metrics that the report has just debunked. It seems to me that there will never be a way of measuring the quality of work without reading it. To spend money on a futile search for new metrics would take money away from science itself. I’m not convinced that it would be money well-spent.
There is a widespread belief that science is going through a crisis of reproducibility. A meeting was held to discuss the problem. It was organised by the Academy of Medical Sciences, the Wellcome Trust, the MRC and the BBSRC, and it was chaired by Dorothy Bishop (of whose blog I’m a huge fan). It’s good to see that the scientific establishment is beginning to take notice. Up to now it’s been bloggers who’ve been making the running. I hadn’t intended to write a whole post about it, but some sufficiently interesting points arose that I’ll have a go.
The first point to make is that, as far as I know, the “crisis” is limited to, or at least concentrated in, quite restricted areas of science. In particular, it doesn’t apply to the harder end of science. Nobody in physics, maths or chemistry talks about a crisis of reproducibility. I’ve heard very little about irreproducibility in electrophysiology (unless you include EEG work). I’ve spent most of my life working on single-molecule biophysics and I’ve never encountered serious problems with irreproducibility. It’s a small and specialist field, so I think I would have noticed if it were there. I’ve always posted our analysis programs on the web, and if anyone wants to spend a year re-analysing the data they are very welcome to do so (though I have been asked only once).
The areas that seem to have suffered most from irreproducibility are experimental psychology, some areas of cell biology, imaging studies (fMRI) and genome studies. Clinical medicine and epidemiology have been bad too. Imaging and genome studies seem to be in a slightly different category from the others. They are largely statistical problems that arise from the huge number of comparisons that need to be done. Epidemiology problems stem largely from a casual approach to causality. The rest have no such excuses.
The meeting was biased towards psychology, perhaps because that’s an area that has had many problems. The solutions that were suggested were also biased towards that area. It’s hard to see how some of them could be applied to electrophysiology, for example.
There were, it has to be said, a lot more good intentions than hard suggestions. Pre-registration of experiments might help a bit in a few areas. I’m all for open access and open data, but doubt they will solve the problem either, though I hope they’ll become the norm (they always have been for me).
All the tweets from the meeting have been collected as a Storify. The most retweeted comment was from Liz Wager
This, I think, cuts to the heart of the problem. Perverse incentives, if sufficiently harsh, will inevitably lead to bad behaviour. Occasionally it will lead to fraud. It’s even led to (at least) two suicides. If you threaten people in their forties and fifties with being fired, and losing their house, because they don’t meet some silly metric, then of course people will cut corners. Curing that is very much more important than pre-registration, data-sharing and concordats, though the latter occupied far more of the time at the meeting.
The primary source of the problem is that there is not enough money for the number of people who want to do research (a matter that was barely mentioned). That leads to the unpalatable conclusion that the only way to cure the problem is to have fewer people competing for the money. That’s part of the reason that I suggested recently a two-stage university system. That’s unlikely to happen soon. So what else can be done in the meantime?
The responsibility for perverse incentives has to rest squarely on the shoulders of the senior academics and administrators who impose them. It is at this level that the solutions must be found. That was said, but not firmly enough. The problems are mostly created by the older generation. It’s our fault.
Incidentally, I was not impressed by the fact that the Academy of Medical Sciences listed attendees with initials after people’s names. There were eight FRSs, but I find it a bit embarrassing to be identified as one, as though it made any difference to the value of what I said.
It was suggested that courses in research ethics for young scientists would help. I disagree. In my experience, young scientists are honest and idealistic. The problems arise when their idealism is shattered by the bad example set by their elders. I’ve had a stream of young people in my office who want advice and support because they feel they are being pressured by their elders into behaviour that worries them. More than one of them has burst into tears because they feel they have been bullied by PIs.
One talk that I found impressive was by Ottoline Leyser, who chaired the recent report on The Culture of Scientific Research in the UK, from the Nuffield Council on Bioethics. But I found that report bland, and its recommendations, though well-meaning, unlikely to result in much change. The report was based on a relatively small, self-selected sample of 970 responses to a web survey, and on 15 discussion events. Relatively few people seem to have spent time filling in the text boxes. For example
“Of the survey respondents who provided a negative comment on the effects of competition in science, 24 out of 179 respondents (13 per cent) believe that high levels of competition between individuals discourage research collaboration and the sharing of data and methodologies.”
Such numbers are too small to reach many conclusions, especially since the respondents were self-selected rather than selected at random (poor experimental design!). Nevertheless, the main concerns were all voiced. I was struck by
“Almost twice as many female survey respondents as male respondents raise issues related to career progression and the short term culture within UK research when asked which features of the research environment are having the most negative effect on scientists”
But no conclusions or remedies were put forward to remedy this problem. It was all put rather better, and much more frankly, some time ago by Peter Lawrence. I do have the impression that bloggers (including Dorothy Bishop) get to the heart of the problems much more directly than any official reports.
The Nuffield report seemed to me to put excessive trust in paper exercises, such as the “Concordat to Support the Career Development of Researchers”. The word “bullying” does not occur anywhere in the Nuffield document, despite the fact that it’s a problem that’s been very widely discussed and one that’s critical for the problems of reproducibility. The Concordat (unlike the Nuffield report) does mention bullying.
"All managers of research should ensure that measures exist at every institution through which discrimination, bullying or harassment can be reported and addressed without adversely affecting the careers of innocent parties. "
That sounds good, but it’s very obvious that many places simply ignore it. All universities subscribe to the Concordat, but in too many places signing is as far as it goes. It was signed by Imperial College London, the institution with perhaps the worst record for pressurising its employees, but official reports would not dream of naming names or looking at the publicly available documentation concerning bullying tactics. For that, you need bloggers.
On the first day, the (soon-to-depart) Dean of Medicine at Imperial, Dermot Kelleher, was there. He seemed a genial man, but he would say nothing about the death of Stefan Grimm. I find that attitude incomprehensible. He didn’t reappear on the second day of the meeting.
The San Francisco Declaration on Research Assessment (DORA) is a stronger statement than the Concordat, but its aims are more limited. DORA states that the impact factor is not to be used as a substitute “measure of the quality of individual research articles, or in hiring, promotion, or funding decisions”. That’s something that I wrote about in 2003, in Nature. In 2007 it was still rampant, including at Imperial College. It still is in many places. The Nuffield Council report says that DORA has been signed by “over 12,000 individuals and 500 organisations”, but fails to mention that only three UK universities have signed up to DORA (one of them, I’m happy to say, is UCL). That’s a pretty miserable record. And, of course, it remains to be seen whether the signatories really abide by the agreement. Most such worthy agreements are ignored on the shop floor.
The recommendations of the Nuffield Council report are all worthy, but they are bland and we’ll be lucky if they have much effect. For example
“Ensure that the track record of researchers is assessed broadly, without undue reliance on journal impact factors”
What on earth is “undue reliance”? That’s a far weaker statement than DORA. Why?
“Ensure researchers, particularly early career researchers, have a thorough grounding in research ethics”
In my opinion, what we should say to early career researchers is “avoid the bad example that’s set by your elders (but not always betters)”. It’s the older generation which has produced the problems and it’s unbecoming to put the blame on the young. It’s the late career researchers who are far more in need of a thorough grounding in research ethics than early-career researchers.
Although every talk was more or less interesting, the one I enjoyed most was the first, by Marcus Munafò. He assessed the scale of the problem (though with a strong emphasis on psychology, plus some genetics and epidemiology), and he had good data on under-powered studies. He also made fleeting mention of the problem of the false discovery rate. Since the meeting was essentially about the publication of results that aren’t true, I would have expected the statistical problem of the false discovery rate to have been given much more prominence than it was. Although Ioannidis’ now-famous paper “Why Most Published Research Findings Are False” got the occasional mention, very little attention (apart from Munafò and Button) was given to the problems which he pointed out.
I’ve recently convinced myself that, if you declare that you’ve made a discovery when you observe P = 0.047 (as is almost universal in the biomedical literature), you’ll be wrong 30 – 70% of the time (see the full paper, "An investigation of the false discovery rate and the misinterpretation of p-values", and simplified versions on YouTube and on this blog). If that’s right, then surely an important way to reduce the publication of false results is for journal editors to give better advice about statistics. This is a topic that was almost absent from the meeting. It’s also absent from the Nuffield Council report (the word “statistics” does not occur anywhere in it).
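The simplest version of the argument in that paper is a tree calculation. Here is a Python sketch using illustrative assumptions (that 10% of tested hypotheses are real effects, power 0.8 and a threshold of P < 0.05); conditioning on P values close to 0.047, rather than on everything below 0.05, is what pushes the rate up to the 30 – 70% range quoted above:

```python
def false_discovery_rate(prevalence, power, alpha):
    """Fraction of 'significant' results that are false positives, when a
    fraction `prevalence` of tested hypotheses are real effects, the tests
    have the given power, and 'significant' means P below `alpha`."""
    true_positives = prevalence * power          # real effects, detected
    false_positives = (1 - prevalence) * alpha   # null effects, 'detected'
    return false_positives / (true_positives + false_positives)

# Illustrative numbers: 10% of tested hypotheses are real effects,
# power 0.8, significance threshold P < 0.05
fdr = false_discovery_rate(prevalence=0.1, power=0.8, alpha=0.05)
# fdr = 0.045 / 0.125 = 0.36: about a third of 'discoveries' are false
```

Even on these fairly generous assumptions, roughly a third of claimed discoveries would be wrong.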
In summary, the meeting was very timely, and it was fun. But I ended up thinking it had a bit too much of preaching good intentions to the converted. It failed to grasp some of the nettles firmly enough. There was no mention of what’s happening at Imperial, or Warwick, or Queen Mary, or at Kings College London. Let’s hope that when it’s written up, the conclusion will be a bit less bland than those of most official reports.
"Modern scientists are doing too much trusting and not enough verifying – to the detriment of the whole of science, and of humanity.
Too many of the findings that fill the academic ether are the result of shoddy experiments or poor analysis"
"Careerism also encourages exaggeration and the cherrypicking of results."
This is what the public think of us. It’s time that vice-chancellors did something about it, rather than willy-waving about rankings.
After criticism of the conclusions of official reports, I guess that I have to make an attempt at recommendations myself. Here’s a first attempt.
- The heart of the problem is money. Since the total amount of money is not likely to increase in the short term, the only solution is to decrease the number of applicants. This is a real political hot potato, but unless it’s tackled the problem will persist. The gentlest way that I can think of to do this is to restrict research to a subset of universities. My proposal for a two-stage university system might go some way to achieving this. It would result in better postgraduate education, and it would be more egalitarian for students. But of course universities that became “teaching only” would see this (wrongly) as a demotion, and it seems that UUK is unlikely to support any change to the status quo (except, of course, for increasing fees).
- Smaller grants, smaller groups and fewer papers would benefit science.
- Ban completely the use of impact factors and discourage use of all metrics. None has been shown to measure future quality. All increase the temptation to “game the system” (that’s the usual academic euphemism for what’s called cheating if an undergraduate does it).
- “Performance management” is the method of choice for bullying academics. Don’t allow people to be fired because they don’t achieve arbitrary targets for publications or grant income. The criteria used at Queen Mary London, at Imperial, at Warwick and at Kings are public knowledge. They are a recipe for employing spivs and firing Nobel Prize winners: the 1991 Nobel Laureate in Physiology or Medicine would have failed Imperial’s criteria in 6 of the 10 years during which he was doing the work that led to the prize.
- Universities must learn that if you want innovation and creativity you have also to tolerate a lot of failure.
- The ranking of universities by ranking businesses or by the REF encourages bad behaviour, because it encourages vice-chancellors to improve their ranking by whatever means they can. This is one reason for bullying behaviour. The rankings are totally arbitrary and a huge waste of money. I’m not saying that universities should be unaccountable to taxpayers. But all you have to do is produce a list of publications to show that very few academics are not trying. It’s absurd to try to summarise a whole university in a single number. It’s simply statistical illiteracy.
- Don’t waste money on training courses in research ethics. Everyone already knows what’s honest and what’s dodgy (though a bit more statistics training might help with that). Most people want to do the honest thing, but few have the nerve to stick to their principles if the alternative is to lose your job and your home. Senior university people must stop behaving in that way.
- University procedures for protecting the young are totally inadequate. A young student who reports bad behaviour by his seniors is still more likely to end up being fired than congratulated (see, for example, a particularly bad case at the University of Sheffield). All big organisations close ranks to defend themselves when criticised. Even in extreme cases, as when an employee commits suicide after being bullied, universities issue internal reports that blame nobody.
- Universities must stop papering over the cracks when misbehaviour is discovered. It seems to be beyond the wit of PR people to realise that it’s often best (and always cheapest) to put your hands up and say “sorry, we got that wrong”.
- There is an urgent need to get rid of the sort of statistical illiteracy that allows P = 0.06 to be treated as failure and P = 0.04 as success. This is almost universal in biomedical papers and, given the hazards posed by the false discovery rate, could well be a major contributor to false claims. Journal editors need to offer much better statistical advice than they do at the moment.
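The arithmetic behind the false discovery rate is simple enough to sketch. A minimal calculation (with illustrative numbers, not taken from any particular study: suppose 10% of the hypotheses you test are real effects, and your tests have 80% power) shows how a P < 0.05 criterion can produce a false discovery rate far above 5%:

```python
# Illustrative false discovery rate calculation.
# Assumed inputs (hypothetical, for illustration only):
#   prevalence: fraction of tested hypotheses that are real effects
#   power: probability a test detects a real effect
#   alpha: significance threshold (P < alpha counts as a "discovery")
def false_discovery_rate(prevalence, power, alpha):
    true_positives = prevalence * power          # real effects correctly detected
    false_positives = (1 - prevalence) * alpha   # null effects that pass the threshold
    return false_positives / (true_positives + false_positives)

fdr = false_discovery_rate(prevalence=0.1, power=0.8, alpha=0.05)
print(f"{fdr:.0%}")  # prints 36%
```

Under these assumptions, more than a third of “significant” results would be false positives, even though every individual test used the conventional 5% threshold.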
[This is an update of a 2006 post on my old blog.]
The New York Times (17 January 2006) published a beautiful spoof that illustrates only too clearly some of the bad practices that have developed in real science (as well as in quackery). It shows that competition, when taken to excess, leads to dishonesty.
More to the point, it shows that the public is well aware of the dishonesty that has resulted from the publish-or-perish culture, which has been inflicted on science by numbskull senior administrators (many of them scientists, or at least ex-scientists). Part of the blame must attach to "bibliometricians", who have armed administrators with simple-minded tools whose usefulness is entirely unverified. Bibliometricians are truly the quacks of academia. They care little about evidence as long as they can sell the product.
The spoof also illustrates the folly of allowing the hegemony of a handful of glamour journals to hold scientists in thrall. This self-inflicted wound adds to the pressure to produce trendy novelties rather than solid long term work.
It also shows the only-too-frequent failure of peer review to detect problems.
The future lies in publication on the web, with post-publication peer review. Sites like PubPeer have shown that anonymous post-publication review can work very well indeed. This would be far cheaper, and a good deal better, than the present extortion practised on universities by publishers. All it needs is for a few more eminent people like the mathematician Tim Gowers to speak out (see Elsevier – my part in its downfall).
Recent Nobel-prizewinner Randy Schekman has helped with his recent declaration that "his lab will no longer send papers to Nature, Cell and Science as they distort scientific process"
The spoof is based on the fraudulent papers by the Korean cloner, Woo Suk Hwang, which were published in Science in 2005. As well as the original fraud, this sad episode exposed the practice of ‘guest authorship’: putting your name on a paper when you have done little or no work, and cannot vouch for the results. The last (‘senior’) author on the 2005 paper was Gerald Schatten, Director of the Pittsburgh Development Center. It turns out that Schatten had not seen any of the original data and had contributed very little to the paper, beyond lobbying Science to accept it. A University of Pittsburgh panel declared Schatten guilty of “research misbehavior”, though he was, amazingly, exonerated of “research misconduct”. He still has his job. Click here for an interesting commentary.
The New York Times carried a mock editorial to introduce the spoof.
One Last Question: Who Did the Work?
By NICHOLAS WADE
In the wake of the two fraudulent articles on embryonic stem cells published in Science by the South Korean researcher Hwang Woo Suk, Donald Kennedy, the journal’s editor, said last week that he would consider adding new requirements that authors “detail their specific contributions to the research submitted,” and sign statements that they agree with the conclusions of their article.
A statement of authors’ contributions has long been championed by Drummond Rennie, deputy editor of The Journal of the American Medical Association.
Explicit statements about the conclusions could bring to light many reservations that individual authors would not otherwise think worth mentioning. The article shown [below], from a future issue of the Journal of Imaginary Genomics, annotated in the manner required by Science‘s proposed reforms, has been released ahead of its embargo date.
The old-fashioned typography makes it obvious that the spoof is intended to mock a paper in Science.
The problem with this spoof is its only too accurate description of what can happen at the worst end of science.
Something must be done if we are to justify the money we get and retain the confidence of the public.
My suggestions are as follows:
- Nature, Science and Cell should become news magazines only. Their glamour value distorts science and encourages dishonesty.
- All print journals are outdated. We need cheap publishing on the web, with open access and post-publication peer review. The old publishers would go the same way as the handloom weavers. Their time has passed.
- Publish or perish has proved counterproductive. You’d get better science if you didn’t have any performance management at all. All that’s needed is peer review of grant applications.
- It’s better to have many small grants than fewer big ones. The ‘celebrity scientist’, running a huge group funded by many grants has not worked well. It’s led to poor mentoring and exploitation of junior scientists.
- There is a good case for limiting the number of original papers that an individual can publish per year, and/or total grant funding. Fewer but more complete papers would benefit everyone.
- Everyone should read, learn and inwardly digest Peter Lawrence’s The Mismeasurement of Science.
3 January 2014.
Yet another good example of hype was in the news. “Effect of Vitamin E and Memantine on Functional Decline in Alzheimer Disease” was published in the Journal of the American Medical Association. The study hit the newspapers on January 1st with headlines like “Vitamin E may slow Alzheimer’s Disease” (see the excellent analysis by Gary Schwitzer). The supplement industry was ecstatic. But the paper was behind a paywall. It’s unlikely that many of the tweeters (or journalists) had actually read it.
The trial was a well-designed randomised controlled trial that compared four treatments: placebo, vitamin E, memantine and Vitamin E + memantine.
Reading the paper gives a rather different impression from the press release. Look at the pre-specified primary outcome of the trial.
The primary outcome measure was
" . . the Alzheimer’s Disease Cooperative Study/Activities of Daily Living (ADCSADL) Inventory.12 The ADCS-ADL Inventory is designed to assess functional abilities to perform activities of daily living in Alzheimer patients with a broad range of dementia severity. The total score ranges from 0 to 78 with lower scores indicating worse function."
It looks as though any difference that might exist between the four treatments is trivial in size. In fact the mean difference between Vitamin E and placebo was only 3.15 (on a 78-point scale), with 95% confidence limits from 0.9 to 5.4. This gave a modest P = 0.03 (when properly corrected for multiple comparisons), a result that will impress only those people who regard P = 0.05 as a sort of magic number. The mean effect is so trivial in size that it hardly matters whether the effect is real anyway.
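As a rough back-of-envelope check (my own arithmetic, not taken from the paper), the reported 95% confidence interval can be used to recover an approximate standard error and z statistic under a normal approximation, and to put the 3.15-point difference in the context of the 78-point scale:

```python
import math

# Numbers reported in the paper: mean difference 3.15 on a 0-78 scale,
# with a 95% confidence interval from 0.9 to 5.4.
diff, lo, hi = 3.15, 0.9, 5.4
scale_range = 78

se = (hi - lo) / (2 * 1.96)                  # approximate SE recovered from the CI
z = diff / se                                 # z statistic
p_uncorrected = math.erfc(z / math.sqrt(2))   # two-sided P, normal approximation

print(f"SE = {se:.2f}, z = {z:.2f}, uncorrected P = {p_uncorrected:.3f}")
print(f"effect is {diff / scale_range:.1%} of the scale range")
```

With these numbers the uncorrected two-sided P comes out around 0.006, consistent with the paper's corrected P of 0.03 after a multiple-comparison adjustment; and the mean difference amounts to only about 4% of the full scale, which is the point that matters.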
It is not mentioned in the coverage that none of the four secondary outcomes achieved even a modest P = 0.05. There was no detectable effect of Vitamin E on:
- Mean annual rate of cognitive decline (Alzheimer Disease Assessment Scale–Cognitive Subscale)
- Mean annual rate of cognitive decline (Mini-Mental State Examination)
- Mean annual rate of increased symptoms
- Mean annual rate of increased caregiver time.
The only graph that appeared to show much effect was the Dependence Scale. This scale
“assesses 6 levels of functional dependence. Time to event is the time to loss of 1 dependence level (increase in dependence). We used an interval-censored model assuming a Weibull distribution because the time of the event was known only at the end of a discrete interval of time (every 6 months).”
It’s presented as a survival (Kaplan-Meier) plot. And it is this somewhat obscure secondary outcome that was used by the Journal of the American Medical Association for its publicity.
Note also that memantine + Vitamin E was indistinguishable from placebo. There are two ways to explain this: either Vitamin E has no effect, or memantine is an antagonist of Vitamin E. There are no data on the latter, and it’s certainly implausible.
The trial used a high dose of Vitamin E (2000 IU/day). No toxic effects of Vitamin E were reported, though a 2005 meta-analysis concluded that doses greater than 400 IU/d "may increase all-cause mortality and should be avoided".
In my opinion, the outcome of this trial should have been something like “Vitamin E has, at most, trivial effects on the progress of Alzheimer’s disease”.
Both the journal and the authors are guilty of disgraceful hype. This continual raising of false hopes does nothing to help patients. But it does damage the reputation of the journal and of the authors.
This paper constitutes yet another failure of altmetrics (see more examples on this blog). Not surprisingly, given the title, it was retweeted widely, but utterly uncritically. Bad science was promoted. And JAMA must take much of the blame for publishing the paper and promoting it.
Academic staff are going to be fired at Queen Mary University of London (QMUL). It’s possible that universities may have to contract a bit in hard times, so what’s wrong?
What’s wrong is that the victims are being selected in a way that I can describe only as insane. The criteria they use are guaranteed to produce a generation of second-rate spiv scientists, with a consequent progressive decline in QMUL’s reputation.
The firings, it seems, are nothing to do with hard financial times, but are a result of QMUL’s aim to raise its ranking in university league tables.
In the UK university league table, a university’s position is directly related to its government research funding. So they need to do well in the 2014 ‘Research Excellence Framework’ (REF). To achieve that they plan to recruit new staff with high research profiles, take on more PhD students and post-docs, obtain more research funding from grants, and get rid of staff who are not doing ‘good’ enough research.
So far, that’s exactly what every other university is trying to do. This sort of distortion is one of the harmful side-effects of the REF. But what’s particularly stupid about QMUL’s behaviour is the way they are going about it. You can assess your own chances of survival at QMUL’s School of Biological and Chemical Sciences from the following table, which is taken from an article by Jeremy Garwood (Lab Times Online. July 4, 2012). The numbers refer to the four year period from 2008 to 2011.
[Table from the Lab Times article: for each category of staff, minimum thresholds are set for ‘Research Output – quality’, ‘Research Output – quantity’ and ‘Research Income (£)’, with the income thresholds ranging from at least £100,000 up to at least £200,000 depending on grade.]
In addition to the three criteria, ‘Research Output ‐ quality’, ‘Research Output – quantity’, and ‘Research Income’, there is a minimum threshold of 1 PhD completion for staff at each academic level. All this data is “evidenced by objective metrics; publications cited in Web of Science, plus official QMUL metrics on grant income and PhD completion.”
To survive, staff must meet the minimum threshold in three out of the four categories, except as follows:
Demonstration of activity at an exceptional level in either ‘research outputs’ or ‘research income’, termed an ‘enhanced threshold’, is “sufficient” to justify selection regardless of levels of activity in the other two categories. And what are these enhanced thresholds?
The university notes that the above criteria “are useful as entry standards into the new school, but they fall short of the levels of activity that will be expected from staff in the future. These metrics should not, therefore, be regarded as targets for future performance.”
This means that those who survived the redundancy criteria will simply have to do better. But what is to reassure them that it won’t be their turn next time should they fail to match the numbers?
To help them, Queen Mary is proposing to introduce ‘D3’ performance management (www.unions.qmul.ac.uk/ucu/docs/d3-part-one.doc). Based on more ‘administrative physics’, D3 is shorthand for ‘Direction × Delivery × Development.’ Apparently “all three are essential to a successful team or organisation. The multiplication indicates that where one is absent/zero, then the sum is zero!”
D3 is based on principles of accountability: “A sign of a mature organisation is where its members acknowledge that they face choices, they make commitments and are ready to be held to account for discharging these commitments, accepting the consequences rather than seeking to pass responsibility.” Inspired?
I presume the D3 document must have been written by an HR person. It has all the incoherent use of buzzwords so typical of HR. And it says "sum" when it means "product" (oh dear, innumeracy is rife).
The criteria are utterly brainless. The use of impact factors for assessing people has been discredited at least since Seglen (1997) showed that the number of citations a paper gets is not perceptibly correlated with the impact factor of the journal in which it’s published. The reason is that the distribution of the number of citations for papers in a particular journal is enormously skewed. This means that high-impact journals get most of their citations from a few articles.
The distribution for Nature is shown in Fig. 1. Far from being Gaussian, it is even more skewed than a geometric distribution: the mean number of citations is 114, but 69% of papers have fewer than the mean, and 24% have fewer than 30 citations. One paper has 2,364 citations but 35 have 10 or fewer. ISI data for citations in 2001 of the 858 papers published in Nature in 1999 show that the 80 most-cited papers (16% of all papers) account for half of all the citations (from Colquhoun, 2003).
The Institute of Scientific Information (ISI) is guilty of the unsound statistical practice of characterising a distribution by its mean only, with no indication of its shape or even its spread. Yet QMUL’s School of Biological and Chemical Sciences expects everyone to be above average in the new regime. Anomalously, the thresholds for psychologists are lower, because it is said that it’s more difficult for them to get grants. This undermines even the twisted logic applied at the outset.
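To see why characterising such a distribution by its mean alone is so misleading, here is a minimal simulation. The lognormal shape and its parameters are my own arbitrary choices, picked only to roughly match the Nature figures quoted above; this is not the real ISI data:

```python
import math
import random

random.seed(1)

# Draw citation counts from a heavily right-skewed lognormal distribution
# whose mean is set to ~114 citations, the Nature 1999 figure quoted above.
sigma = 1.5                          # arbitrary skewness parameter
mu = math.log(114) - sigma**2 / 2    # chosen so the lognormal mean is ~114
papers = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

mean = sum(papers) / len(papers)
below_mean = sum(c < mean for c in papers) / len(papers)
print(f"mean = {mean:.0f} citations; {below_mean:.0%} of papers fall below the mean")
```

In this simulated journal, as in the real one, roughly three-quarters of papers sit below the mean, which is dragged upward by a handful of blockbusters. That is exactly why summarising a journal (or a person) by a mean-based metric is unsound.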
All this stuff about skewed distributions is, no doubt, a bit too technical for HR people to understand. Which, of course, is precisely why they should have nothing to do with assessing people.
At a time when so many PhDs fail to get academic jobs, we should be limiting the numbers. But QMUL requires everyone to have a PhD student, not for the benefit of the student, but to increase its standing in league tables. That is deeply unethical.
The demand to have two papers in journals with impact factor greater than seven is nonsense. In physiology, for example, there are only four journals with an impact factor greater than seven, and three of them are review journals that don’t publish original research. The two best journals for electrophysiology are the Journal of Physiology (impact factor 4.98 in 2010) and the Journal of General Physiology (IF 4.71). These are the journals that publish the papers that get you into the Royal Society, or even Nobel Prizes. But for QMUL, they don’t count.
I have been lucky to know three Nobel Prize winners well: Andrew Huxley, Bernard Katz and Bert Sakmann. I doubt that any of them would pass the criteria laid down for a professor by QMUL. They would have been fired.
The case of Sakmann is analysed in How to Get Good Science [pdf version]. In the 10 years from 1976 to 1985, when Sakmann rose to fame, he published an average of 2.6 papers per year (range 0 to 6). In two of those 10 years he had no publications at all. In the four-year period (1976–1979) that started with the paper that brought him to fame (Neher & Sakmann, 1976) he published 9 papers, just enough for the Reader grade; but in the four years from 1979 to 1982 he had 6 papers, in 2 of which he was neither first nor last author. His job would have been in danger if he’d worked at QMUL. In 1991 Sakmann, with Erwin Neher, got the Nobel Prize for Physiology or Medicine.
The most offensive thing of the lot is the way you can buy yourself out if you publish 26 papers in the 4 year period. Sakmann came nowhere near this. And my own total, for the entire time from my first paper (1963) until I was elected to the Royal Society (May 1985) was 27 papers (and 7 book chapters). I would have been fired.
Peter Higgs had no papers at all from the time he moved to Edinburgh in 1960 until 1964, when his two papers on what’s now called the Higgs boson were published in Physics Letters. That journal now has an impact factor of less than 7, so Queen Mary would not have counted them as “high quality” papers, and he would not have been returnable for the REF. He too would have been fired.
The encouragement to publish large numbers of papers is daft. I have seen people rejected from the Royal Society for publishing too much. If you are publishing a paper every six weeks, you certainly aren’t writing them, and possibly not even reading them. Most likely you are appending your name to somebody else’s work with little or no checking of the data. Such numbers can be reached only by unethical behaviour, as described by Peter Lawrence in The Mismeasurement of Science. Like so much managerialism, the rules provide an active encouragement to dishonesty.
In the face of such a boneheaded approach to assessing your worth, it’s the duty of any responsible academic to point out the harm that’s being done to the College. Richard Horton, in the Lancet, did so in Bullying at Barts. There followed quickly letters from Stuart McDonald and Nick Wright, who used the Nuremberg defence, pointing out that the Dean (Tom Macdonald) was just obeying orders from above. That has never been an acceptable defence. If Macdonald agreed with the procedure, he should be fired for incompetence. If he did not agree with it, he should have resigned.
It’s a pity, because Tom Macdonald was one of the people with whom I corresponded in support of Barts’ students who, very reasonably, objected to having course work marked by homeopaths (see St Bartholomew’s teaches antiscience, but students revolt, and, later, Bad medicine. Barts sinks further into the endarkenment). In that case he was not unreasonable, and, a mere two years later I heard that he’d taken action.
To cap it all, two academics did their job by applying a critical eye to what’s going on at Queen Mary. They wrote to the Lancet under the title Queen Mary: nobody expects the Spanish Inquisition
"For example, one of the “metrics” for research output at professorial level is to have published at least two papers in journals with impact factors of 7 or more. This is ludicrous, of course—a triumph of vanity as sensible as selecting athletes on the basis of their brand of track suit. But let us follow this “metric” for a moment. How does the Head of School fair? Zero, actually. He fails. Just consult Web of Science. Take care though, the result is classified information. HR’s “data” are marked Private and Confidential. Some things must be believed. To question them is heresy."
Astoundingly, the people who wrote this piece are now under investigation for “gross misconduct”. This is behaviour worthy of the University of Poppleton, as pointed out by the inimitable Laurie Taylor, in Times Higher Education (June 7)
The rustle of censorship
It appears that last week’s edition of our sister paper, The Poppleton Evening News, carried a letter from Dr Gene Ohm of our Biology Department criticising this university’s metrics-based redundancy programme.
We now learn that, following the precedent set by Queen Mary, University of London, Dr Ohm could be found guilty of “gross misconduct” and face “disciplinary proceedings leading to dismissal” for having the effrontery to raise such issues in a public place.
Louise Bimpson, the corporate director of our ever-expanding human resources team, admitted that this response might appear “severe” but pointed out that Poppleton was eager to follow the disciplinary practices set by such soon-to-be members of the prestigious Russell Group as Queen Mary. Thus it was only to be expected that we would seek to emulate its espousal of draconian censorship. She hoped this clarified the situation.
David Bignell, emeritus professor of zoology at Queen Mary, hit the nail on the head.
"These managers worry me. Too many are modest achievers, retired from their own studies, intoxicated with jargon, delusional about corporate status and forever banging the metrics gong. Crucially, they don’t lead by example."
What the managers at Queen Mary have failed to notice is that the best academics can choose where to go.
People are being told to pack their bags and move out with one day’s notice. Access to journals is stopped, email addresses are removed, and you may need to be accompanied to your (ex-)office. Good scientists are being treated like criminals.
What scientist in their right mind would want to work at QMUL, now that their dimwitted assessment methods, and their bullying tactics, are public knowledge?
The responsibility must lie with the principal, Simon Gaskell. And we know what the punishment is for bringing your university into disrepute.
Send an email. You may want to join the many people who have already written to QMUL’s principal, Simon Gaskell (email@example.com), and/or to Sir Nicholas Montagu, Chairman of Council, firstname.lastname@example.org.
Sunday 1 July 2012. Since this blog was posted after lunch on Friday 29th June, it has had around 9000 visits from 72 countries. Here is one of 17 maps showing the origins of 200 of the hits in the last two days
The tweets about QMUL are collected in a Storify timeline.
Thanks for that – I LOVED IT. It’s fantastic that the truth of HR (I truly hate that phrase) has been so ruthlessly exposed. Should be part of the School Handbook. Any VC who stripped out all the BS would immediately retain and attract good people and see their productivity soar.
That’s advice that Queen Mary should heed.
Part of the reason for that popularity was Ben Goldacre’s tweet, to his 201,000 followers
3 July 2012. I have come by a copy of this email, which was sent to Queen Mary by a senior professor from the USA (word travels fast on the web). It shows just how easy it is to destroy the reputation of an institution.
I was appalled to read the criteria devised by your University to evaluate its faculty. They are so flawed it is hard to know where to begin.
Your criteria are antithetical to good scientific research. The journals are littered with weak publications, which are generated mainly by scientists who feel the pressure to publish, no matter whether the results are interesting, valid, or meaningful. The literature is flooded by sheer volume of these publications.
Your attempt to require “quality” research is provided by the requirement for publications in “high Impact Factor” journals. IF has been discredited among scientists for many reasons: it is inaccurate in not actually reflecting the merit of the specific paper, it is biased toward fields with lots of scientists, etc. The demand for publications in absurdly high IF journals encourages, and practically enforces scientific fraud. I have personally experienced those reviews from Nature demanding one or two more “final” experiments that will clinch the publication. The authors KNOW how these experiments MUST turn out. If they want their Nature paper (and their very academic survival if they are at a brutal, anti-scientific university like QMUL), they must get the “right” answer. The temptation to fudge the data to get this answer is extreme. Some scientists may even be able to convince themselves that each contrary piece of data that they discard to ensure the “correct” answer is being discarded for a valid reason. But the result is that scientific misconduct occurs. I did not see in your criteria for “success” at QMUL whether you discount retracted papers from the tally of high IF publications, or perhaps the retraction itself counts as yet another high IF publication!
Your requirement for each faculty to have one or more postdocs or students promotes the abusive exploitation of these individuals for their cheap labor, and ignores the fact that they are being “trained” for jobs that do not exist.
The “standards” you set are fantastically unrealistic. For example, funding is not graded, but a sharp step function – we have 1 or 2 or 0 grants and even if the average is above your limits, no one could sustain this continuously. Once you have fired every one of your faculty, which will almost certainly happen within 1-2 rounds of pogroms, where will you find legitimate scientists who are willing to join such a ludicrous University?
4 July 2012.
Professor John F. Allen is Professor of Biochemistry at Queen Mary, University of London, distinguished in the fields of photosynthesis, chloroplasts, mitochondria, genome function and evolution, and redox signalling. He, with a younger colleague, wrote a letter to the Lancet, Queen Mary: nobody expects the Spanish Inquisition. It is an admirable letter, the sort of thing any self-respecting academic should write. But not according to HR. On 14 May, Allen got a letter from HR, which starts thus.
14th May 2012
Dear Professor Allen
I am writing to inform you that the College had decided to commence a factfinding investigation into the below allegation: That in writing and/or signing your name to a letter entitled "Queen Mary: nobody expects the Spanish Inquisition," (enclosed) which was published in the Lancet online on 4th May 2012, you sought to bring the Head of School of Biological and Chemical Sciences and the Dean for Research in the School of Medicine and Dentistry into disrepute.
. . . .
Download the entire letter. It is utterly disgraceful bullying. If anyone is bringing Queen Mary into disrepute, it is Sam Holborn and the principal, Simon Gaskell.
Here’s another letter, from the many that have been sent. This is from a researcher in the Netherlands.
Dear Sir Nicholas,
I am addressing this to you in the hope that you were not directly involved in creating this extremely stupid set of measures that have been thought up, not to improve the conduct of science at QMUL, but to cheat QMUL’s way up the league tables over the heads of the existing academic staff.
Others have written more succinctly about the crass stupidity of your Human Resources department than I could, and their apparent ignorance of how science actually works. As your principal must bear full responsibility for the introduction of these measures, I am not sending him a copy of this mail. I am pretty sure that his “principal” mail address will no longer be operative.
We have had a recent scandal in the Netherlands where a social psychology professor, who even won a national “Man of the Year” award, as well as a very large amount of research money, was recently exposed as having faked all the data that went into a total number of articles running into three figures. This is not the sort of thing one wants to happen to one’s own university. He would have done well according to your REF… before he was found out.
Human Resources departments have gained too much power, and are completely incompetent when it comes to judging academic standards. Let them get on with the old, dull, gobbledygook-free tasks that personnel departments should be carrying out.
5 July 2012.
Here’s another letter. It’s from a member of academic staff at QMUL, someone who is not himself threatened with being fired. It certainly shows that I’m not making a fuss about nothing. Rather, I’m the only person old enough to say what needs to be said without fear of losing my job and my house.
Dear Prof. Colquhoun,
I am an academic staff member in SBCS, QMUL. I am writing from my personal email account because the risks of using my work account to send this email are too great.
I would like to thank you for highlighting our problems and how we have been treated by our employer (Queen Mary University of London), in your blog. I would please urge you to continue to tweet and blog about our plight, and staff in other universities experiencing similarly horrific working conditions.
I am not threatened with redundancy by QMUL, and in fact my research is quite successful. Nevertheless, the last nine months have been the most stressful of all my years of academic life. The best of my colleagues in SBCS, QMUL are leaving already and I hope to leave, if I can find another job in London.
Staff do indeed feel very unfairly treated, intimidated and bullied. I never thought a job at a university could come to this.
Thank you again for your support. It really does matter to the many of us who cannot really speak out openly at present.
In a later letter, the same person pointed out
"There are many of us who would like to speak more openly, but we simply cannot."
"I have mortgage . . . . Losing my job would probably mean losing my home too at this point."
"The plight of our female staff has not even been mentioned. We already had very few female staff. And with restructuring, female staff are more likely to be forced into teaching-only contracts or indeed fired."
"total madness in the current climate – who would want to join us unless desperate for a job!"
"“fuss about nothing” – absolutely not. It is potentially a perfect storm leading to teaching and research disaster for a university! Already the reputation of our university has been greatly damaged. And senior staff keep blaming and targeting the “messengers”."
6 July 2012.
Through the miracle of WiFi, this is coming from Newton, MA. The Lancet today has another editorial on the Queen Mary scandal.
"As hopeful scientists prepare their applications to QMUL, they should be aware that, behind the glossy advertising, a sometimes harsh, at times repressive, and disturbingly unforgiving culture awaits them."
That sums it up nicely.
24 July 2012. I’m reminded by the Nature writer Richard van Noorden (@Richvn) that Nature itself has written at least twice about the iniquity of judging people by impact factors. In 2005, Not-so-deep impact said:
"Only 50 out of the roughly 1,800 citable items published in those two years received more than 100 citations in 2004. The great majority of our papers received fewer than 20 citations."
"None of this would really matter very much, were it not for the unhealthy reliance on impact factors by administrators and researchers’ employers worldwide to assess the scientific quality of nations and institutions, and often even to judge individuals."
And, more recently, in “Assessing assessment” (2010).
29 July 2012. Jonathan L Rees, of the University of Edinburgh, ends his blog:
"I wonder what career advice I should offer to a young doctor circa 2012. Apart from not taking a job at Queen Mary of course. "
How to select candidates
I have, at various times, been asked how I would select candidates for a job, if not by counting papers and impact factors. This is a slightly modified version of a comment that I left on a blog, and it describes roughly what I’d advocate.
After a pilot study, the entire Research Excellence Framework (which attempts to assess the quality of research in every UK university) made the following statement.
“No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs”
It seems that the REF is paying attention to the science, not to the bibliometricians.
It has been the practice at UCL to ask people to nominate their best papers (2–4 papers, depending on age). We then read the papers and asked candidates hard questions about them (not least about the methods section). It’s a method that I learned a long time ago from Stephen Heinemann, a senior scientist at the Salk Institute. It’s often been surprising to learn how little some candidates know about the contents of papers that they themselves have selected as their best. One aim of this is to find out how much the candidate understands the principles of what they are doing, as opposed to following a recipe.
Of course we also seek the opinions of people who know the work, and preferably know the person. Written references have suffered so much from ‘grade inflation’ that they are often worthless, but a talk on the telephone with someone who knows both the work and the candidate can be useful. That, however, is now banned by HR, who seem to feel that any knowledge of the candidate’s ability would lead to bias.
It is not true that the use of metrics is universal, and thank heavens for that. There are alternatives, and we use them.
Incidentally, the reason that I have described the Queen Mary procedures as insane, brainless and dimwitted is that their attempt to increase their ratings is likely to be frustrated. No person in their right mind would want to work for a place that treats its employees like that if they had any other option. And it is very odd that their attempt to improve their REF rating uses criteria that have been explicitly ruled out by the REF. You can’t get more brainless than that.
This discussion has been interesting to me, if only because it shows how little bibliometricians understand about how to get good science.