
David Colquhoun

Jump to follow-up

The "supplement" industry is a scam that dwarfs all other forms of alternative medicine. Sales are worth over $100 billion a year, a staggering sum. But the claims they make are largely untrue: plain fraudulent. Although the industry’s advertisements like to claim "naturalness". in fact most of the synthetic vitamins are manufactured by big pharma companies. The pharmaceutical industry has not been slow to cash in on an industry in which unverified claims can be made with impunity.

When I saw Hotshot advertised, "a proprietary formulation of organic ingredients" that is alleged to cure or prevent muscle cramps, I would have assumed that it was just another scam. Then I saw that the people behind it were very highly-regarded scientists, Rod MacKinnon and Bruce Bean, both of whom I have met.

The Hotshot website gives this background.

"For Dr. Rod MacKinnon, a Nobel Prize-winning neuroscientist/endurance athlete, the invention of HOTSHOT was personal.

After surviving life threatening muscle cramps while deep sea kayaking off the coast of Cape Cod, he discovered that existing cramp remedies – that target the muscle – didn’t work. Calling upon his Nobel Prize-winning expertise on ion channels, Rod reasoned that preventing and treating cramps began with focusing on the nerve, not the muscle.

Five years of scientific research later, Rod has perfected HOTSHOT, the kick-ass, proprietary formulation of organic ingredients, powerful enough to stop muscle cramps where they start. At the nerve.

Today, Rod’s genius solution has created a new category in sports nutrition: Neuro Muscular Performance (NMP). It’s how an athlete’s nerves and muscles work together in an optimal way. HOTSHOT boosts your NMP to stop muscle cramps. So you can push harder, train longer and finish stronger."  

For a start, it’s pretty obvious that MacKinnon has not spent the last five years developing a cure for cramp. His publications don’t even mention the topic. Neither do Bruce Bean’s.

I’d like to thank Bruce Bean for answering some questions I put to him. He said it’s "designed to be as strong as possible in activating TRPV1 and TRPA1 channels". After some hunting, I found that it contains:

Filtered Water, Organic Cane Sugar, Organic Gum Arabic, Organic Lime Juice Concentrate, Pectin, Sea Salt, Natural Flavor, Organic Stevia Extract, Organic Cinnamon, Organic Ginger, Organic Capsaicin

The first ingredient is sugar: "the 1.7oz shot contains enough sugar to make a can of Coke blush with 5.9 grams per ounce vs. 3.3 per ounce of Coke".[ref].

The TRP (transient receptor potential) receptors form a family of 28 related ion channels. Their physiology is far from being well understood, but they are thought to be important for mediating taste and pain. The TRPV1 channel is also known as the receptor for capsaicin (found in chilli peppers). TRPA1 responds to the active principle in wasabi.

I’m quite happy to believe that most cramp is caused by unsynchronised activity of motor nerves, causing muscle fibres to contract in an uncoordinated way (though it isn’t really known that this is the usual mechanism, or what triggers it in the first place). The problem is that there is no good reason at all to think that stimulating TRP receptors in the gastro-intestinal tract will stop, within a minute or so, the activity of motor nerves in the spinal cord.

But, as always, there is no point in discussing mechanisms until we are sure that there is a phenomenon to be explained. What is the actual evidence that Hotshot either prevents or cures cramps, as claimed? The Hotshot web site has a page about Our Science. Its title is The Truth about Muscle Cramps. That’s not a good start, because it’s well known that nobody understands cramp.

So follow the link to See our Scientific Studies. It has three references, two of which are to unpublished work. The third is not about Hotshot, but about pickle juice. This was also the only reference sent to me by Bruce Bean. Its title is ‘Reflex Inhibition of Electrically Induced Muscle Cramps in Hypohydrated Humans’, Miller et al., 2010 [Download pdf]. Since it’s the only published work, it’s worth looking at in detail.

Miller et al. is not about exercise-induced cramp, but about a cramp-like condition that can be induced by electrical stimulation of a muscle in the sole of the foot (flexor hallucis brevis). The intention of the paper was to investigate anecdotal reports that pickle juice can prevent or stop cramps. It was a small study (only 10 subjects). After the subjects had been dehydrated, cramp was induced electrically, and two seconds after it started they drank either pickle juice or distilled water. They weren’t asked about pain: the extent of cramp was judged from electromyograph records. At least a week later, the test was repeated with the other drink (the order in which the drinks were given was randomised). So it was a crossover design.

There was no detectable difference between water and pickle juice in the intensity of the cramp. But the duration of the cramp was said to be shorter. The mean duration after water was 133.7 ± 15.9 s and the mean duration after pickle juice was 84.6 ± 18.5 s. A t test gives P = 0.075. However, each subject had both treatments: the mean reduction in duration was 49.1 ± 14.6 s, and a paired t test gives P = 0.008. This is close to the 3-standard-deviation difference which I recommended as a minimal criterion, so what could possibly go wrong?
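For anyone who wants to check the arithmetic, it can be reproduced from the quoted summary statistics alone. Here’s a minimal sketch in Python (using scipy), on the assumption that the quoted ± values are standard errors of the mean for the n = 10 subjects:

```python
# A minimal check of the t tests from the quoted summary statistics,
# assuming the "±" values are standard errors of the mean (n = 10).
from math import sqrt
from scipy import stats

n = 10                             # subjects (crossover design)
sem_water, sem_juice = 15.9, 18.5  # SEMs of the two mean durations (s)
mean_diff, sem_diff = 49.1, 14.6   # mean reduction in duration and its SEM (s)

# Unpaired (two-sample) t test, ignoring the pairing
t_unpaired = mean_diff / sqrt(sem_water**2 + sem_juice**2)
p_unpaired = 2 * stats.t.sf(t_unpaired, df=2 * n - 2)
print(f"unpaired: t = {t_unpaired:.2f}, P = {p_unpaired:.3f}")  # ~0.06, in the
# region of the quoted 0.075 (the exact value depends on what the errors are)

# Paired t test: each subject acts as his own control
t_paired = mean_diff / sem_diff
p_paired = 2 * stats.t.sf(t_paired, df=n - 1)
print(f"paired:   t = {t_paired:.2f}, P = {p_paired:.3f}")      # P ≈ 0.008
```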

The result certainly suggests that pickle juice might reduce the duration of cramps, but it’s far from conclusive, for the following reasons. First, it must have been very obvious indeed to the subjects whether they were drinking water or pickle juice. Secondly, paired t tests are not the right way to analyse crossover experiments, as explained here. Unfortunately the 10 differences are not given, so there is no way to judge the consistency of the responses. Thirdly, two outcomes were measured (intensity and duration), and no correction was made for multiple comparisons. Finally, P = 0.008 is convincing evidence only if you assume that there was a roughly 50:50 chance of the pickle-juice folk-lore being right before the experiment was started. For most folk remedies, that would be a pretty implausible assumption. The vast majority of folk remedies turn out to be useless when tested properly.

Nevertheless, the results are sufficiently suggestive that it might be worth testing Hotshot properly. One might have expected that to have been done before marketing started. It wasn’t.

Bruce Bean tells me that they tried it on friends, who said that it worked. Perhaps that’s not so surprising: there can be no condition more susceptible than muscle cramps to self-deception, because of regression to the mean.

They found a business partner, Flex Pharma, and MacKinnon set up a company. Let’s see how they are doing.

Flex Pharma

The hyperbole in the advertisements for Hotshot is entirely legal in the USA. The infamous 1994 “Dietary Supplement Health and Education Act (DSHEA)” allows almost any claim to be made for herbs etc. as long as they are described as a "dietary supplement". All they have to do is add in the small print:

"These statements have not been evaluated by the Food and Drug Administration. This product is not intended to diagnose, treat, cure or prevent any disease".

Of course medical claims are made: it’s sold to prevent and treat muscle cramp (and I can’t even find the weasel words on the web site).

As well as Hotshot, Flex Pharma is also testing a drug, FLX-787, a TRP receptor agonist of undisclosed structure. It is hoping to get FDA approval for treatment of nocturnal leg cramps (NLCs) and for treatment of spasticity in multiple sclerosis (MS) and amyotrophic lateral sclerosis (ALS) patients. It would be great if it works, but we still don’t know whether it does.

The financial press doesn’t seem to be very optimistic. When Flex Pharma was launched on the stock market at the beginning of 2015, its initial public offering raised $86.4 million, at $16 per share. The biotech boom of the previous few years was still strong. In 2016, the outlook seems less rosy. The investment advice site Seeking Alpha had a scathing evaluation in June 2016. Its title was "Flex Pharma: What A Load Of Cramp". It has some remarkably astute assessments of the pharmacology, as well as of the financial risks. The summary reads thus:

  • We estimate FLKS will burn at least $40 million of its $84 million in cash this year on clinical trials for FLX-787 and marketing spend for its new cramp supplement called “HOTSHOT.”
  • Based on its high cash burn, we expect a large, dilutive equity raise is likely over the next 12 months.
  • We believe the company’s recent study on nocturnal leg cramps (NLCs) may be flawed. We also highlight risks to its lead drug candidate, FLX-787, that we believe investors are currently overlooking.
  • We highlight several competitive available alternatives to FLKS’s cramp products that we believe investors have not factored into current valuation.
  • Only 2.82% of drugs from companies co-founded by CEO Westphal have achieved FDA approval.

The last bullet point refers to Flex Pharma’s CEO, Christoph Westphal MD PhD (described by Fierce Biotech as a "serial biotech entrepreneur"). Only two out of his 71 requests for FDA approval were successful.

On October 13th 2016 it was reported that early trials of FLX-787 had been disappointing. The shares plunged.


On October 17th 2016, Seeking Alpha posted another evaluation: “Flex Pharma Has Another Cramp“. So did StreetInsider.com. Neither was optimistic. The former made the point (see above) that crossover trials are not what should be done. In fact the FDA have required that regular parallel RCTs be done before FLX-787 can be approved.

Summary

Drug discovery is hard and it’s expensive. The record for small molecule discovery has not been good in the last few decades. Many new introductions have, at best, marginal efficacy, and at worst may do more harm than good. For conditions in which understanding of causes is poor or non-existent, it’s impossible to design new drugs rationally. There are only too many such conditions: from low back pain to almost anything that involves the brain, knowledge of causes is fragmentary to non-existent. This leads guidance bodies to clutch at straws. Disappointing as this is, it’s not for want of trying. And it’s not surprising. Serious medical research hasn’t been going for long and the systems are very complicated.

But this is no excuse for pretending that things work on the basis of the flimsiest of evidence. Bruce Bean advised me to try Hotshot on friends, and says that it doesn’t work for everybody. This is precisely what one is told by homeopaths, and by just about every other sort of quack. Time and time again, that sort of evidence has proved to be misleading.

I have the greatest respect for the science that’s published by both Bruce Bean and Rod MacKinnon. I guess that they aren’t familiar with the sort of evidence that’s required to show that a new treatment works. That isn’t solved by describing a treatment as a "dietary supplement".

I’ll confess that I’m a bit disappointed by their involvement with Flex Pharma, a company that makes totally unjustified claims. Or should one just say caveat emptor?

Follow-up

Before posting this, I sent it to Bruce Bean to be checked. Here was his response, which I’m posting in full (hoping not to lose a friend).

"Want to be UK representative for Hotshot? Sample on the way!"

"I do not see anything wrong with the facts. I have a different opinion – that it is perfectly appropriate to have different standards of proof of efficacy for consumer products made from general-recognized-as-safe ingredients and for an FDA-approved drug. I’d be happy for the opportunity to post something like the following your blog entry (and suffer any consequent further abuse) if there is an opportunity".  

  " I think it would be unfair to lump Hotshot with “dietary supplements” targeted to exploit the hopes of people with serious diseases who are desperate for magic cures. Hotshot is designed and marketed to athletes who experience exercise-induced cramping that can inhibit their training or performance – hardly a population of desperate people susceptible of exploitation. It costs only a few dollars for someone to try it. Lots of people use it regularly and find it helpful. I see nothing wrong with this and am glad that something that I personally found helpful is available for others to try. "

     " Independently of Hotshot, Flex Pharma is hoping to develop treatments for cramping associated with diseases like ALS, MS, and idiopathic nocturnal leg cramps. These treatments are being tested in rigorous clinical trials that will be reviewed by the FDA. As with any drug development it is very expensive to do the clinical trials and there is no guarantee of success. I give credit to the investors who are underwriting the effort. The trials are openly publicly reported. I would note that Flex Pharma voluntarily reported results of a recent trial for night leg cramps that led to a nearly 50% drop in the stock price. I give the company credit for that openness and for spending a lot of money and a lot of effort to attempt to develop a treatment to help people – if it can pass the appropriately high hurdle of FDA approval."

     " On Friday, I sent along 8 bottles of Hotshot by FedEx, correctly labeled for customs as a commercial sample. Of course, I’d be delighted if you would agree to act as UK representative for the product but absent that, it should at least convince you that the TRP stimulators are present at greater than homeopathic doses. If you can find people who get exercise-induced cramping that can’t be stretched out, please share with them."

6 January 2017

It seems that more than one Nobel prizewinner is willing to sell their name to dodgy businesses. The MIT Tech Review tweeted a link to Imagine Albert Einstein getting paid to put his picture on a tin of anti-wrinkle cream. No fewer than seven Nobel prizewinners have lent their names to a “supplement” pill that’s claimed to prolong your life. Needless to say, there isn’t the slightest reason to think it works. What possesses these people beats me. Here are their names.

Aaron Ciechanover (Cancer Biology, Technion – Israel Institute of Technology).

Eric Kandel (Neuroscience, Columbia University).

Jack Szostak (Origins of Life & Telomeres, Harvard University).

Martin Karplus (Complex Chemical Systems, Harvard University).

Sir Richard Roberts (Biochemistry, New England Biolabs).

Thomas Südhof (Neuroscience, Stanford University).

Paul Modrich (Biochemistry, Duke University School of Medicine).

Then there’s the Amway problem. Watch this space.

‘We know little about the effect of diet on health. That’s why so much is written about it’. That is the title of a post in which I advocate the view, put by John Ioannidis, that remarkably little is known about the health effects of individual nutrients. That ignorance has given rise to a vast industry selling advice that has little evidence to support it.

The 2016 Conference of the so-called "College of Medicine" had the title "Food, the Forgotten Medicine". This post gives some background information about some of the speakers at this event. I’m sorry it appears to be too ad hominem, but the only way to judge the meeting is via the track record of the speakers.


Quite a lot has been written here about the "College of Medicine". It is the direct successor of the Prince of Wales’ late, unlamented, Foundation for Integrated Health. But unlike the latter, its name disguises its promotion of quackery. Originally it was going to be called the “College of Integrated Health”, but that wasn’t sufficiently deceptive, so the name was dropped.

For the history of the organisation, see

The new “College of Medicine” arising from the ashes of the Prince’s Foundation for Integrated Health

Don’t be deceived. The new “College of Medicine” is a fraud and delusion

The College of Medicine is in the pocket of Crapita Capita. Is Graeme Catto selling out?

The conference programme (download pdf) is a masterpiece of bait and switch. It is a mixture of very respectable people, and outright quacks. The former are invited to give legitimacy to the latter. The names may not be familiar to those who don’t follow the antics of the magic medicine community, so here is a bit of information about some of them.

The introduction to the meeting was by Michael Dixon and Catherine Zollman, both veterans of the Prince of Wales Foundation, and both devoted enthusiasts for magic medicine. Zollman even believes in the battiest of all forms of magic medicine, homeopathy (download pdf), for which she totally misrepresents the evidence. Zollman works now at the Penny Brohn centre in Bristol. She’s also linked to the "Portland Centre for integrative medicine" which is run by Elizabeth Thompson, another advocate of homeopathy. It came into being after NHS Bristol shut down the Bristol Homeopathic Hospital, on the very good grounds that it doesn’t work.

Now, like most magic medicine, it is privatised. The Penny Brohn shop will sell you a wide range of expensive and useless "supplements". For example, Biocare Antioxidant capsules at £37 for 90. Biocare make several unjustified claims for their benefits. Among other unnecessary ingredients, they contain a very small amount of green tea. That’s a favourite of "health food addicts", and it was the subject of a recent paper that contains one of the daftest statistical solecisms I’ve ever encountered:

"To protect against type II errors, no corrections were applied for multiple comparisons".

If you don’t understand that, try this paper. The results are almost certainly false positives, despite the fact that the study appeared in Lancet Neurology. It’s yet another example of broken peer review.

It’s been known for decades now that “antioxidant” is no more than a marketing term. There is no evidence of benefit, and large doses can be harmful. This obviously doesn’t worry the College of Medicine.

Margaret Rayman was the next speaker. She’s a real nutritionist. Mixing the real with the crackpots is a standard bait and switch tactic.

Eleni Tsiompanou came next. She runs yet another private "wellness" clinic, which makes all the usual exaggerated claims. She seems to have an obsession with Hippocrates (hint: medicine has moved on since then). Dr Eleni’s Joy Biscuits may or may not taste good, but their health-giving properties are make-believe.

Andrew Weil, from the University of Arizona, gave the keynote address. He’s described as "one of the world’s leading authorities on Nutrition and Health". That description alone is sufficient to show the fantasy land in which the College of Medicine exists. He’s a typical supplement salesman, presumably very rich. There is no excuse for not knowing about him. It was 1988 when Arnold Relman (who was editor of the New England Journal of Medicine) wrote A Trip to Stonesville: Some Notes on Andrew Weil, M.D.

“Like so many of the other gurus of alternative medicine, Weil is not bothered by logical contradictions in his argument, or encumbered by a need to search for objective evidence.”

This blog has mentioned his more recent activities, many times.

Alex Richardson, of Oxford Food and Behaviour Research (a charity, not part of the university), is an enthusiast for omega-3, a favourite of the supplement industry. She has published several papers that show little evidence of effectiveness. That looks entirely honest. On the other hand, the charity’s News section contains many links to the notorious supplement industry lobby site, Nutraingredients, one of the least reliable sources of information on the web (I get their newsletter, a constant source of hilarity and raised eyebrows). I find this worrying for someone who claims to be evidence-based. I’m told that her charity is funded largely by the supplement industry (though I can’t find any mention of that on the web site).

Stephen Devries was a new name to me. You can infer what he’s like from the fact that he has been endorsed by Andrew Weil, and that his address is the "Institute for Integrative Cardiology" ("integrative" is the latest euphemism for quackery). Never trust any talk with a title that contains "The truth about". His was called "The scientific truth about fats and sugars". In a video, he claims that diet has been shown to reduce heart disease by 70%, which gives you a good idea of his ability to assess evidence. But the claim doubtless helps to sell his books.

Prof Tim Spector, of King’s College London, was next. As far as I know he’s a perfectly respectable scientist, albeit one with books to sell. But his talk is now online, and he sounded a bit like a born-again microbiome enthusiast. He seemed to be too impressed by the PREDIMED study, despite its statistical unsoundness, which was pointed out by Ioannidis. Little evidence was presented, though at least he was more sensible than the audience about the uselessness of multivitamin tablets.

Simon Mills talked on “Herbs and spices. Using Mother Nature’s pharmacy to maintain health and cure illness”. He’s a herbalist who has featured here many times. I can recommend especially his video about Hot and Cold herbs as a superb example of fantasy science.

Annie Anderson is Professor of Public Health Nutrition and Founder of the Scottish Cancer Prevention Network. She’s a respectable nutritionist and public health person, albeit with their customary disregard of problems of causality.

Patrick Holden is chair of the Sustainable Food Trust. He promotes "organic farming". Much though I dislike the cruelty of factory farms, the "organic" industry is largely a way of making food more expensive with no health benefits.

The Michael Pittilo 2016 Student Essay Prize was awarded after lunch. Pittilo has featured frequently on this blog as a result of his execrable promotion of quackery -see, in particular, A very bad report: gamma minus for the vice-chancellor.

Nutritional advice for patients with cancer. This discussion involved three people: Professor Robert Thomas, Consultant Oncologist, Addenbrookes and Bedford Hospitals; Dr Clare Shaw, Consultant Dietitian, Royal Marsden Hospital; and Dr Catherine Zollman, GP and Clinical Lead, Penny Brohn UK.

Robert Thomas came to my attention when I noticed that he, a regular cancer consultant, had spoken at a meeting of the quack charity “YestoLife”. Then I saw that he was scheduled to speak at another quack conference. After I’d written to him to point out the track records of some of the people at the meeting, he withdrew from one of them. See The exploitation of cancer patients is wicked. Carrot juice for lunch, then die destitute. The influence seems to have been temporary, though. He continues to lend respectability to many dodgy meetings. He edits the Cancernet web site. This site lends credence to bizarre treatments like homeopathy and crystal healing. It used to sell hair mineral analysis, a well-known phony diagnostic method whose main purpose is to sell you expensive “supplements”. They still sell the “Cancer Risk Nutritional Profile” for £295.00, despite the fact that it provides no proven benefits.

Robert Thomas designed a food "supplement", Pomi-T: capsules that contain Pomegranate, Green tea, Broccoli and Curcumin. Oddly, he seems still to subscribe to the antioxidant myth. Even the supplement industry admits that that’s a lost cause, but that doesn’t stop its use in marketing. The one randomised trial of these pills for prostate cancer was inconclusive. Prostate Cancer UK says "We would not encourage any man with prostate cancer to start taking Pomi-T food supplements on the basis of this research". Nevertheless it’s promoted on Cancernet.co.uk and widely sold. The Pomi-T site boasts about the (inconclusive) trial, but says "Pomi-T® is not a medicinal product".

There was a cookery demonstration by Dale Pinnock, "the medicinal chef". The programme does not tell us whether he made his signature dish, "the Famous Flu Fighting Soup". Needless to say, there isn’t the slightest reason to believe that his soup has the slightest effect on flu.

In summary, the whole meeting was devoted to exaggerating vastly the effect of particular foods. It also acted as advertising for people with something to sell. Much of it was outright quackery, with a leavening of more respectable people, a standard part of the bait-and-switch methods used by all quacks in their attempts to make themselves sound respectable. I find it impossible to tell how much the participants actually believe what they say, and how much it’s a simple commercial drive.

The thing that really worries me is why someone like Phil Hammond supports this sort of thing by chairing their meetings (as he did for the "College of Medicine’s" direct predecessor, the Prince’s Foundation for Integrated Health). His defence of the NHS has made him something of a hero to me. He assured me that he’d asked people to stick to evidence. In that he clearly failed. I guess they must pay well.

Follow-up

This is my version of a post which I was asked to write for the Independent. It’s been published, though so many changes were made by the editor that I’m posting the original here (below).

Superstition is rife in all sports. Mostly it does no harm, and it might even have a placebo effect that’s sufficient to make a difference of 0.01%. That might just get you a medal. But what does matter is that superstition has given rise to an army of charlatans who are only too willing to sell their magic medicine to athletes, most of whom are not nearly as rich as Phelps.

So much has been said about cupping during the last week that it’s hard to say much that’s original. Yesterday I did six radio interviews and two for TV, and today Associated Press TV came to film a piece about it. Everyone else must have been on holiday. The only one I’ve checked was the piece on the BBC News channel. That one didn’t seem to go too badly, so it’s here.

BBC news coverage

It starts with the usual lengthy, but uninformative, pictures of someone being cupped. The cupper in this case was actually a chiropractor, Rizwhan Suleman. Chiropractic is, of course, a totally different form of alternative medicine, and its value has been totally discredited in the wake of the Simon Singh case. It’s not unusual for people to sell different therapies with conflicting beliefs. Truth is irrelevant. Once you’ve believed one impossible thing, it seems that the next ones become quite easy.

The presenter, Victoria Derbyshire, gave me a fair chance to debunk it afterwards.

Nevertheless, the programme suffered from the usual pretence that there is a controversy about the medical value of cupping. There isn’t. But despite Steve Jones’ excellent report to the BBC Trust, the media insist on giving equal time to flat-earth advocates. The report, (Review of impartiality and accuracy of the BBC’s coverage of science) was no doubt commissioned with good intentions, but it’s been largely ignored.

Still worse, the BBC News Channel, when it repeated the item (its cycle time is quite short) showed only Rizwhan Suleman and cut out my comments altogether. This is not false balance. It’s no balance whatsoever. A formal complaint has been sent. It is not the job of the BBC to provide free advertising to quacks.

After this, a friend drew my attention to a much worse programme on the subject.

The Jeremy Vine show on BBC Radio 2, at 12.00 on August 10th, 2016. This was presented by Vanessa Feltz. It was beyond appalling. There was absolutely zero attempt at balance, false or otherwise. The guest was described as being an "expert" on cupping. He was Yusef Noden, of the London Hijama Clinic, who "trained and qualified with the Hijama & Prophetic Medicine Institute". No doubt he’s a nice bloke, but he really could use a first year course in physiology. His words were pure make-believe. His repeated statements about "withdrawing toxins" are well known to be absolutely untrue. It was embarrassing to listen to. If you really want to hear it, here is an audio recording.

The Jeremy Vine show

This programme is one of the worst cases I’ve heard of the BBC mis-educating the public by providing free advertising for quite outrageous quackery. Another complaint will be submitted. The only form of opposition was a few callers who pointed out the nonsense, mixed with callers who endorsed it. That is not, by any stretch of the imagination, fair and balanced.

It’s interesting that, although cupping is often associated with Traditional Chinese Medicine, neither of the proponents in these two shows was Chinese: both were Muslim. This should not be surprising, since neither cupping nor acupuncture is exclusively Chinese. Similar myths have arisen in many places. My first encounter with this particular branch of magic medicine was when I was asked to make a podcast for “Things Unseen”, in which I debated with a Muslim hijama practitioner and an Indian Ayurvedic practitioner. It’s even harder to talk sense to practitioners of magic medicine who believe that god is on their side, as well as believing that selling nonsense is a good way to make a living.

An excellent history of the complex emergence of similar myths in different parts of the world has been published by Ben Kavoussi, under the title "Acupuncture is astrology with needles".

Now the original version of my blog for the Independent.


Cupping: Michael Phelps and Gwyneth Paltrow may be believers, but the truth behind it is what really sucks

The sight of Olympic swimmer, Michael Phelps, with bruises on his body caused by cupping resulted in something of a media feeding-frenzy this week. He’s a great athlete so cupping must be responsible for his performance, right?  Just as cupping must be responsible for the complexion of an earlier enthusiast, Gwyneth Paltrow.

The main thing in common between Phelps and Paltrow is that they both have a great deal of money, and neither has much interest in how you distinguish truth from myth.  They can afford to indulge any whim, however silly.

And cupping is pretty silly. It’s a pre-scientific medical practice that started in a time when there was no understanding of physiology, much like bloodletting. Indeed one version does involve a bit of bloodletting.  Perhaps bloodletting is the best argument against the belief that it’s ancient wisdom, so it must work. It was a standard part of medical treatment for hundreds of years, and killed countless people.

It is desperately implausible that putting suction cups on your skin would benefit anything, so it’s not surprising that there is no worthwhile empirical evidence that it does.  The Chinese version of cupping is related to acupuncture and, unlike cupping, acupuncture has been very thoroughly tested. Over 3000 trials have failed to show any benefit that’s big enough to benefit patients. Acupuncture is no more than a theatrical placebo.  And even its placebo effects are too small to be useful.

At least it’s likely that cupping usually does no lasting damage. We don’t know for sure, because in the world of alternative medicine there is no system for recording bad effects (and there is a vested interest in not reporting them). In extreme cases, it can leave holes in your skin that pose a serious danger of infection, but most people probably end up with just broken capillaries and bruises. Why would anyone want that?
The answer to that question seems to be a mixture of wishful thinking about the benefits and vastly exaggerated claims made by the people who sell the product.

It’s typical that the sales people can’t even agree on what the benefits are alleged to be.  If selling to athletes, the claim may be that it relieves pain, or that it aids recovery, or that it increases performance.  Exactly the same cupping methods are sold to celebs with the claim that their beauty will be improved because cupping will “boost your immune system”.  This claim is universal in the world of make-believe medicine, when the salespeople can think of nothing else. There is no surer sign of quackery.  It means nothing whatsoever.  No procedure is known to boost your immune system.  And even if anything did, it would be more likely to cause inflammation and blood clots than to help you run faster or improve your complexion.

It’s certainly most unlikely that sucking up bits of skin into evacuated jars would have any noticeable effect on blood flow in underlying muscles, and so increase your performance.  The salespeople would undoubtedly benefit from a first year physiology course.

Needless to say, they haven’t tried actually measuring blood flow, or performance. To do that might reduce sales. As Kate Carter said recently, “Eating jam out of those jars would probably have a more significant physical impact”.

The problem with all sports medicine is that tiny effects could make a difference. When three hour endurance events end with a second or so separating the winner from the rest, that is an effect of less than 0.01%.   Such tiny effects will never be detectable experimentally.  That leaves the door open to every charlatan to sell miracle treatments that might just work.  If, like steroids, they do work, there is a good chance that they’ll harm your health in the long run.

You might be better off eating the jam.


Here is a very small selection of the many excellent accounts of cupping on the web.

There have been many good blogs. The mainstream media have, on the whole, been dire. Here are three that I like.

In July 2016, Orac posted in ScienceBlogs. "What’s the harm? Cupping edition". He used his expertise as a surgeon to explain the appalling wounds that can be produced by excessive cupping.

Photo from news.com.au

Timothy Caulfield wrote "Olympic debunk!". He’s Chair in Health Law and Policy at the University of Alberta, and the author of Is Gwyneth Paltrow Wrong About Everything?

“The Olympics are a wonderful celebration of athletic performance. But they have also become an international festival of sports pseudoscience. It will take an Olympic–sized effort to fight this bunk and bring a win to the side of evidence-based practice.”

Jennifer Raff wrote Pseudoscience is common among elite athletes outside of the Olympics too…and it makes me furious. She works on the genomes of modern and ancient people at the University of Kansas, and, as though that were not a full-time job for most people, she writes blogs, books and she’s also "training (and occasionally competing) in Muay Thai, boxing, BJJ, and MMA".

"I’m completely unsurprised to find that pseudoscience is common among the elite athletes competing in the Olympics. I’ve seen similar things rampant in the combat sports world as well."

What she writes makes perfect sense. Just don’t bother with the comments section which is littered with Trump-like post-factual comments from anonymous conspiracy theorists.

Follow-up

Of all types of alternative medicine, acupuncture is the one that has received the most approval from regular medicine. The benefit of that is that it’s been tested more thoroughly than most others. The result is now clear. It doesn’t work. See the evidence in Acupuncture is a theatrical placebo.

This blog has documented many cases of misreported tests of acupuncture, often from people who have a financial interest in selling it. Perhaps the most egregious spin came from the University of Exeter. It was published in a normal journal, and endorsed by the journal’s editor, despite showing clearly that acupuncture didn’t even have much placebo effect.

Acupuncture got a boost in 2009 from, of all unlikely sources, the National Institute for Health and Care Excellence (NICE). The judgements of NICE on the benefit/cost ratio of treatments are usually very good. But the guidance group that they assembled to judge treatments for low back pain was atypically incompetent when it came to assessment of evidence. They recommended acupuncture as one option. At the time I posted “NICE falls for Bait and Switch by acupuncturists and chiropractors: it has let down the public and itself”. That was soon followed by two more posts:

“NICE fiasco, part 2. Rawlins should withdraw guidance and start again”,

and

“The NICE fiasco, Part 3. Too many vested interests, not enough honesty”.

At the time, NICE was being run by Michael Rawlins, an old friend. No doubt he was unaware of the bad guidance until it was too late and he felt obliged to defend it.

Although the 2008 guidance referred only to low back pain, it gave an opening for acupuncturists to penetrate the NHS. Like all quacks, they are experts at bait and switch. The penetration of quackery was exacerbated by the privatisation of physiotherapy services to organisations like Connect Physical Health which have little regard for evidence, but a good eye for sales. If you think that’s an exaggeration, read "Connect Physical Health sells quackery to NHS".

When David Haslam took over the reins at NICE, I was optimistic that the question would be revisited (it turned out that he was aware of this blog). I was not disappointed. This time the guidance group had much more critical members.

The new draft guidance on low back pain was released on 24 March 2016. The final guidance will not appear until September 2016, but last time the final version didn’t differ much from the draft.

Despite modern imaging methods, it still isn’t possible to pinpoint the precise cause of low back pain (LBP) so diagnoses are lumped together as non-specific low back pain (NSLBP).

The summary guidance is explicit.

“1.2.8 Do not offer acupuncture for managing non-specific low back pain with or without sciatica.”
 

The evidence is summarised in section 13.6 of the main report (page 493). There is a long list of other proposed treatments that are not recommended.

Because low back pain is so common, and so difficult to treat, many treatments have been proposed. Many of them, including acupuncture, have proved to be clutching at straws. It’s to the great credit of the new guidance group that they have resisted that temptation.

Among the other "do not offer" treatments are

  • imaging (except in specialist setting)
  • belts or corsets
  • foot orthotics
  • acupuncture
  • ultrasound
  • TENS or PENS
  • opioids (for acute or chronic LBP)
  • antidepressants (SSRI and others)
  • anticonvulsants
  • spinal injections
  • spinal fusion for NSLBP (except as part of a randomised controlled trial)
  • disc replacement

At first sight, the new guidance looks like an excellent clear-out of the myths that surround the treatment of low back pain.

The positive recommendations that are made are all for things that have modest effects (at best). For example “Consider a group exercise programme”, and “Consider manipulation, mobilisation”. The use of the word “consider”, rather than “offer”, seems to be NICE-speak: an implicit suggestion that the treatment doesn’t work very well. My only criticism of the report is that it doesn’t say sufficiently bluntly that non-specific low back pain is largely an unsolved problem. Most of what’s seen is probably a result of that most deceptive phenomenon, regression to the mean.

One pain specialist put it to me thus. “Think of the billions spent on back pain research over the years in order to reach the conclusion that nothing much works – shameful really.” Well perhaps not shameful: it isn’t for want of trying. It’s just a very difficult problem. But pretending that there are solutions doesn’t help anyone.

Follow-up

This post arose from a recent meeting at the Royal Society. It was organised by Julie Maxton to discuss the application of statistical methods to legal problems. I found myself sitting next to an Appeal Court Judge who wanted more explanation of the ideas. Here it is.

Some preliminaries

The papers that I wrote recently were about the problems associated with the interpretation of screening tests and tests of significance. They don’t allude to legal problems explicitly, though the problems are the same in principle.  They are all open access. The first appeared in 2014:
http://rsos.royalsocietypublishing.org/content/1/3/140216

Since the first version of this post, March 2016, I’ve written two more papers and some popular pieces on the same topic. There’s a list of them at http://www.onemol.org.uk/?page_id=456.
I also made a video for YouTube of a recent talk.

In these papers I was interested in the false positive risk (also known as the false discovery rate) in tests of significance. It turned out to be alarmingly large. That has serious consequences for the credibility of the scientific literature. In legal terms, the false positive risk means the proportion of cases in which, on the basis of the evidence, a suspect is found guilty when in fact they are innocent. That has even more serious consequences.

Although most of what I want to say can be said without much algebra, it would perhaps be worth getting two things clear before we start.

The rules of probability.

(1) To get any understanding, it’s essential to understand the rules of probability and, in particular, the idea of conditional probabilities. One source would be my old book, Lectures on Biostatistics (now free). The account on pages 19 to 24 gives a pretty simple (I hope) description of what’s needed. Briefly, a vertical line is read as “given”, so Prob(evidence | not guilty) means the probability that the evidence would be observed given that the suspect was not guilty.

(2) Another potential confusion in this area is the relationship between odds and probability, which can be illustrated by an example. If the probability of being right-handed is 0.9, then the probability of not being right-handed is 0.1. That means that 9 people out of 10 are right-handed, and one person in 10 is not. In other words, for every person who is not right-handed there are 9 who are right-handed. Thus the odds that a randomly-selected person is right-handed are 9 to 1. In symbols this can be written

\[ \mathrm{probability=\frac{odds}{1 + odds}} \]

In the example, the odds on being right-handed are 9 to 1, so the probability of being right-handed is 9 / (1+9) = 0.9.

Conversely,

\[ \mathrm{odds =\frac{probability}{1 - probability}} \]

In the example, the probability of being right-handed is 0.9, so the odds of being right-handed are 0.9 / (1 – 0.9) = 0.9 / 0.1 = 9 (to 1).
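The conversions are trivial to compute. A minimal sketch, using the right-handedness example:

```python
# Converting between probability and odds, as defined above.
def odds_from_prob(p):
    """Odds on an event, given its probability."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Probability of an event, given its odds."""
    return odds / (1 + odds)

print(odds_from_prob(0.9))  # ≈ 9, i.e. odds of 9 to 1 on being right-handed
print(prob_from_odds(9))    # 0.9
```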

With these preliminaries out of the way, we can proceed to the problem.

The legal problem

The first problem lies in the fact that the answer depends on Bayes’ theorem. Although that was published in 1763, statisticians are still arguing about how it should be used to this day.  In fact whenever it’s mentioned, statisticians tend to revert to internecine warfare, and forget about the user.

Bayes’ theorem can be stated in words as follows

\[ \mathrm{\text{posterior odds ratio} = \text{prior odds ratio} \times \text{likelihood ratio}} \]

“Posterior odds ratio” means the odds that the person is guilty, relative to the odds that they are innocent, in the light of the evidence, and that’s clearly what one wants to know.  The “prior odds” are the odds that the person was guilty before any evidence was produced, and that is the really contentious bit.

Sometimes the need to specify the prior odds has been circumvented by using the likelihood ratio alone, but, as shown below, that isn’t a good solution.

The analogy with the use of screening tests to detect disease is illuminating.

Screening tests

A particularly straightforward application of Bayes’ theorem is in screening people to see whether or not they have a disease.  It turns out, in many cases, that screening gives a lot more wrong results (false positives) than right ones.  That’s especially true when the condition is rare (the prior odds that an individual suffers from the condition are small).  The process of screening for disease has a lot in common with the screening of suspects for guilt. It matters because false positives in court are disastrous.

The screening problem is dealt with in sections 1 and 2 of my paper, or on this blog (and here). A bit of animation helps the slides, so you may prefer the YouTube version.
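For those who would rather see the arithmetic than the slides, here is a minimal sketch. The numbers are illustrative assumptions of my own (a prevalence of 1%, a test with 80% sensitivity and 95% specificity), not figures from the paper:

```python
# Screening arithmetic: what fraction of positive tests are false positives?
# The numbers below are illustrative assumptions, not taken from the paper.
prevalence  = 0.01   # prior probability that a person has the condition
sensitivity = 0.80   # Prob(test positive | condition present)
specificity = 0.95   # Prob(test negative | condition absent)

true_pos  = prevalence * sensitivity               # per person screened
false_pos = (1 - prevalence) * (1 - specificity)
print(false_pos / (true_pos + false_pos))          # ≈ 0.86: when the condition
# is this rare, about 86% of positive tests are false positives
```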

The rest of my paper applies similar ideas to tests of significance. In that case the prior probability is the probability that there is in fact a real effect, or, in the legal case, the probability that the suspect is guilty before any evidence has been presented. This is the slippery bit of the problem, both conceptually and because it’s hard to put a number on it.

But the examples below show that to ignore it, and to use the likelihood ratio alone, could result in many miscarriages of justice.

In the discussion of tests of significance, I took the view that it is not legitimate (in the absence of good data to the contrary) to assume any prior probability greater than 0.5. To do so would presume you know the answer before any evidence was presented. In the legal case a prior probability of 0.5 would mean assuming that there was a 50:50 chance that the suspect was guilty before any evidence was presented. A 50:50 probability of guilt before the evidence is known corresponds to a prior odds ratio of 1 (to 1). If that were true, the likelihood ratio would be a good way to represent the evidence, because the posterior odds ratio would be equal to the likelihood ratio.

It could be argued that 50:50 represents some sort of equipoise, but in the example below it is clearly too high, and if it is less than 50:50, use of the likelihood ratio runs a real risk of convicting an innocent person.

The following example is modified slightly from section 3 of a book chapter by Mortera and Dawid (2008). Philip Dawid is an eminent statistician who has written a lot about probability and the law, and he’s a member of the legal group of the Royal Statistical Society.

My version of the example removes most of the algebra, and uses different numbers.

Example: The island problem

The “island problem” (Eggleston 1983, Appendix 3) is an imaginary example that provides a good illustration of the uses and misuses of statistical logic in forensic identification.

A murder has been committed on an island, cut off from the outside world, on which 1001 (= N + 1) inhabitants remain. The forensic evidence at the scene consists of a measurement, x, on a “crime trace” characteristic, which can be assumed to come from the criminal. It might, for example, be a bit of the DNA sequence from the crime scene.

Say, for the sake of example, that the probability of a random member of the population having characteristic x is P = 0.004 (i.e. 0.4%), so the probability that a random member of the population does not have the characteristic is 1 – P = 0.996. The mainland police arrive and arrest a random islander, Jack. It is found that Jack matches the crime trace. There is no other relevant evidence.

How should this match evidence be used to assess the claim that Jack is the murderer? We shall consider three arguments that have been used to address this question. The first is wrong. The second and third are right. (For illustration, we have taken N = 1000, P = 0.004.)

(1) Prosecutor’s fallacy

Prosecuting counsel, arguing according to his favourite fallacy, asserts that the probability that Jack is guilty is 1 – P , or 0.996, and that this proves guilt “beyond a reasonable doubt”.

The probability that Jack would show characteristic x if he were not guilty is 0.4%, i.e. Prob(Jack has x | not guilty) = 0.004. The prosecutor treats this as the probability that Jack is not guilty, given the evidence, and so concludes that the probability of guilt is 1 – 0.004 = 0.996.

But 0.004 is Prob(evidence | not guilty), which is not what we want. What we need is the probability that Jack is guilty, given the evidence, Prob(Jack is guilty | Jack has characteristic x).

To confuse one with the other is the prosecutor’s fallacy, or the error of the transposed conditional.

Dawid gives an example that makes the distinction clear.

“As an analogy to help clarify and escape this common and seductive confusion, consider the difference between “the probability of having spots, if you have measles” - which is close to 1 - and “the probability of having measles, if you have spots” - which, in the light of the many alternative possible explanations for spots, is much smaller.”

(2) Defence counter-argument

Counsel for the defence points out that, while the guilty party must have characteristic x, he isn’t the only person on the island to have this characteristic. Among the remaining N = 1000 innocent islanders, 0.4% have characteristic x, so the number who have it will be NP = 1000 × 0.004 = 4. Hence the total number of islanders who have this characteristic must be 1 + NP = 5. The match evidence means that Jack must be one of these 5 people, but does not otherwise distinguish him from the other 4. Since just one of the 5 is guilty, the probability that it is Jack is thus 1/5, or 0.2 - very far from being “beyond all reasonable doubt”.

(3) Bayesian argument

The probability of having characteristic x (the evidence) would be Prob(evidence | guilty) = 1 if Jack were guilty, but if Jack were not guilty it would be 0.4%, i.e. Prob(evidence | not guilty) = P. Hence the likelihood ratio in favour of guilt, on the basis of the evidence, is

\[ LR=\frac{\text{Prob(evidence } | \text{ guilty})}{\text{Prob(evidence }|\text{ not guilty})} = \frac{1}{P}=250 \]

In words, the evidence would be 250 times more probable if Jack were guilty than if he were innocent. While this seems strong evidence in favour of guilt, it still does not tell us what we want to know, namely the probability that Jack is guilty in the light of the evidence: Prob(guilty | evidence), or, equivalently, the odds ratio - the odds of guilt relative to the odds of innocence, given the evidence.

To get that we must multiply the likelihood ratio by the prior odds on guilt, i.e. the odds on guilt before any evidence is presented. It’s often hard to get a numerical value for this. But in our artificial example, it is possible. We can argue that, in the absence of any other evidence, Jack is no more nor less likely to be the culprit than any other islander, so that the prior probability of guilt is 1/(N + 1), corresponding to prior odds on guilt of 1/N.

We can now apply Bayes’ theorem to obtain the posterior odds on guilt:

\[ \text {posterior odds} = \text{prior odds} \times LR = \left ( \frac{1}{N}\right ) \times \left ( \frac{1}{P} \right )= 0.25 \]

Thus the odds of guilt in the light of the evidence are 4 to 1 against. The corresponding posterior probability of guilt is

\[ Prob( \text{guilty } | \text{ evidence})= \frac{1}{1+NP}= \frac{1}{1+4}=0.2 \]

This is quite small - certainly no basis for a conviction.
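The whole calculation fits in a few lines of code. A minimal sketch, using the numbers from the example:

```python
# The island problem: posterior probability of guilt via Bayes' theorem.
N = 1000   # innocent islanders besides the suspect
P = 0.004  # Prob(a random person has characteristic x)

likelihood_ratio = 1 / P  # Prob(evidence|guilty) / Prob(evidence|not guilty) = 250
prior_odds = 1 / N        # Jack no more likely to be guilty than anyone else
posterior_odds = prior_odds * likelihood_ratio   # = 0.25, i.e. 4 to 1 against
print(posterior_odds / (1 + posterior_odds))     # 0.2, the same as 1 / (1 + N*P)
```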

This result is exactly the same as that given by the defence counter-argument (see above). That argument was simpler than the Bayesian argument: it didn’t explicitly use Bayes’ theorem, though the theorem was implicit in it. The advantage of the former is that it looks simpler. The advantage of the explicitly Bayesian argument is that it makes the assumptions clearer.

In summary: the prosecutor’s fallacy suggested, quite wrongly, that the probability that Jack was guilty was 0.996. The likelihood ratio was 250, which also seems to suggest guilt, but it doesn’t give us the probability that we need. In stark contrast, the defence counsel’s argument, and, equivalently, the Bayesian argument, suggested that the probability of Jack’s guilt was 0.2, or odds of 4 to 1 against guilt. The potential for wrong conviction is obvious.

Conclusions.

Although this argument uses an artificial example that is simpler than most real cases, it illustrates some important principles.

(1) The likelihood ratio is not a good way to evaluate evidence, unless there is good reason to believe that there is a 50:50 chance that the suspect is guilty before any evidence is presented.

(2) In order to calculate what we need, Prob(guilty | evidence), you need to give a numerical value for how common possession of characteristic x (the evidence) is in the whole population of possible suspects (a reasonable value might be estimated in the case of DNA evidence). We also need to know the size of that population. In the case of the island example this was 1000, but in general it would be hard to determine, and any answer might well be contested by an advocate who understood the problem.

These arguments lead to four conclusions.

(1) If a lawyer uses the prosecutor’s fallacy, (s)he should be told that it’s nonsense.

(2) If a lawyer advocates conviction on the basis of likelihood ratio alone, s(he) should be asked to justify the implicit assumption that there was a 50:50 chance that the suspect was guilty before any evidence was presented.

(3) If a lawyer uses the defence counter-argument, or, equivalently, the version of the Bayesian argument given here, (s)he should be asked to justify the numerical value given to the prevalence of x in the population (P) and the numerical value of the size of this population (N). A range of values of P and N could be used, to provide a range of possible values of the final result, the probability that the suspect is guilty in the light of the evidence.

(4) The example that was used is the simplest possible case.  For more complex cases it would be advisable to ask a professional statistician. Some reliable people can be found at the Royal Statistical Society’s section on Statistics and the Law.

If you do ask a professional statistician, and they present you with a lot of mathematics, you should still ask these questions about precisely what assumptions were made, and ask for an estimate of the range of uncertainty in the value of Prob(guilty | evidence) which they produce.

Postscript: real cases

Another paper by Philip Dawid, Statistics and the Law, is interesting because it discusses some recent real cases: for example the wrongful conviction of Sally Clark because of the wrong calculation of the statistics for Sudden Infant Death Syndrome.

On Monday 21 March, 2016, Dr Waney Squier was struck off the medical register by the General Medical Council because they claimed that she misrepresented the evidence in cases of Shaken Baby Syndrome (SBS).

This verdict was questioned by many lawyers, including Michael Mansfield QC and Clive Stafford Smith, in a letter, “General Medical Council behaving like a modern inquisition”.

The latter has already written “This shaken baby syndrome case is a dark day for science – and for justice”.

The evidence for SBS is based on the existence of a triad of signs (retinal bleeding, subdural bleeding and encephalopathy). It seems likely that these signs will be present if a baby has been shaken, i.e. Prob(triad | shaken) is high. But this is irrelevant to the question of guilt. For that we need Prob(shaken | triad). As far as I know, the data needed to calculate what matters are just not available.

It seems that the GMC may have fallen for the prosecutor’s fallacy. Or perhaps the establishment won’t tolerate arguments. One is reminded, once again, of the definition of clinical experience: “Making the same mistakes with increasing confidence over an impressive number of years.” (from A Sceptic’s Medical Dictionary by Michael O’Donnell, BMJ Publishing, 1997).

Appendix (for nerds). Two forms of Bayes’ theorem

The form of Bayes’ theorem given at the start is expressed in terms of odds ratios. The same rule can be written in terms of probabilities. (This was the form used in the appendix of my paper.) For those interested in the details, it may help to define explicitly these two forms.

In terms of probabilities, the probability of guilt in the light of the evidence (what we want) is

\[ \text{Prob(guilty } | \text{ evidence}) = \text{Prob(evidence } | \text{ guilty}) \frac{\text{Prob(guilty)}}{\text{Prob(evidence)}} \]

In terms of odds ratios, the odds ratio on guilt, given the evidence (which is what we want) is

\[ \frac{ \text{Prob(guilty } | \text{ evidence})} {\text{Prob(not guilty } | \text{ evidence})} =
\left ( \frac{ \text{Prob(guilty)}} {\text{Prob(not guilty)}} \right )
\left ( \frac{ \text{Prob(evidence } | \text{ guilty})} {\text{Prob(evidence } | \text{ not guilty})} \right ) \]

or, in words,

\[ \text{posterior odds of guilt } =\text{prior odds of guilt} \times \text{likelihood ratio} \]

This is the precise form of the equation that was given in words at the beginning.

A derivation of the equivalence of these two forms is sketched in a document which you can download.
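The equivalence is also easy to check numerically. Here is a quick sanity check in R, with arbitrary made-up numbers.

p_guilty <- 0.5                    # prior probability of guilt
p_ev_g   <- 0.99                   # Prob(evidence | guilty)
p_ev_ng  <- 0.01                   # Prob(evidence | not guilty)
p_ev <- p_ev_g * p_guilty + p_ev_ng * (1 - p_guilty)   # Prob(evidence), by total probability
p_ev_g * p_guilty / p_ev                               # probability form: 0.99
(p_guilty / (1 - p_guilty)) * (p_ev_g / p_ev_ng)       # odds form: 99

The two answers agree: a posterior probability of 0.99 corresponds to posterior odds of 0.99/0.01 = 99.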

Follow-up

23 March 2016

It’s worth pointing out the following connection between the legal argument (above) and tests of significance.

(1) The likelihood ratio works only when there is a 50:50 chance that the suspect is guilty before any evidence is presented (so the prior probability of guilt is 0.5, or, equivalently, the prior odds ratio is 1).

(2) The false positive rate in significance testing is close to the P value only when the prior probability of a real effect is 0.5, as shown in section 6 of the P value paper.

However there is another twist in the significance testing argument. The statement above is right if we take as a positive result any P < 0.05. If we want to interpret a value of P = 0.047 in a single test, then, as explained in section 10 of the P value paper, we should restrict attention to only those tests that give P close to 0.047. When that is done the false positive rate is 26% even when the prior is 0.5 (and much bigger than 30% if the prior is smaller; see the extra Figure). That justifies the assertion that if you claim to have discovered something because you have observed P = 0.047 in a single test then there is a chance of at least 30% that you’ll be wrong. Is there, I wonder, any legal equivalent of this argument?
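That arithmetic can be checked by simulation. Here is a minimal sketch in R; the assumptions (a two-sample t test, 16 observations per group, a standardised effect size of 1, so power is about 0.78) are mine, chosen to be close to, though not identical with, those in the paper.

set.seed(4747)
nsim <- 100000                       # number of simulated experiments
n <- 16                              # observations per group
real <- runif(nsim) < 0.5            # prior probability of a real effect = 0.5
pval <- vapply(real, function(r) {
  x <- rnorm(n)                          # control group
  y <- rnorm(n, mean = if (r) 1 else 0)  # treated group: effect size 1 if real
  t.test(x, y)$p.value
}, numeric(1))
mean(!real[pval < 0.05])             # about 0.06: close to 0.05, as stated above
near <- pval > 0.045 & pval < 0.05   # now condition on P close to 0.047
mean(!real[near])                    # roughly 0.26: the false positive rate soars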

Jump to follow-up

“Statistical regression to the mean predicts that patients selected for abnormalcy will, on the average, tend to improve. We argue that most improvements attributed to the placebo effect are actually instances of statistical regression.”

“Thus, we urge caution in interpreting patient improvements as causal effects of our actions and should avoid the conceit of assuming that our personal presence has strong healing powers.”

McDonald et al., (1983)

In 1955, Henry Beecher published "The Powerful Placebo". I was in my second undergraduate year when it appeared, and for many decades after that I took it literally. Beecher looked at 15 studies and found that, on average, 35% of patients got "satisfactory relief" when given a placebo. This number got embedded in pharmacological folk-lore. He also mentioned that the relief provided by placebo was greatest in patients who were most ill.

Consider the common experiment in which a new treatment is compared with a placebo, in a double-blind randomised controlled trial (RCT). It’s common to call the responses measured in the placebo group the placebo response. But that is very misleading, and here’s why.

The responses seen in the group of patients that are treated with placebo arise from two quite different processes. One is the genuine psychosomatic placebo effect. This effect gives genuine (though small) benefit to the patient. The other contribution comes from the get-better-anyway effect. This is a statistical artefact and it provides no benefit whatsoever to patients. There is now increasing evidence that the latter effect is much bigger than the former.

How can you distinguish between real placebo effects and the get-better-anyway effect?

The only way to measure the size of genuine placebo effects is to compare in an RCT the effect of a dummy treatment with the effect of no treatment at all. Most trials don’t have a no-treatment arm, but enough do that estimates can be made. For example, a Cochrane review by Hróbjartsson & Gøtzsche (2010) looked at a wide variety of clinical conditions. Their conclusion was:

“We did not find that placebo interventions have important clinical effects in general. However, in certain settings placebo interventions can influence patient-reported outcomes, especially pain and nausea, though it is difficult to distinguish patient-reported effects of placebo from biased reporting.”

In some cases, the placebo effect is barely there at all. In a non-blind comparison of acupuncture and no acupuncture, the responses were essentially indistinguishable (despite what the authors and the journal said). See "Acupuncturists show that acupuncture doesn’t work, but conclude the opposite"

So the placebo effect, though a real phenomenon, seems to be quite small. In most cases it is so small that it would be barely perceptible to most patients. Most of the reason why so many people think that medicines work when they don’t is not the placebo response: it’s a statistical artefact.

Regression to the mean is a potent source of deception

The get-better-anyway effect has a technical name, regression to the mean. It has been understood since Francis Galton described it in 1886 (see Senn, 2011 for the history). It is a statistical phenomenon, and it can be treated mathematically (see references, below). But when you think about it, it’s simply common sense.
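The common sense can be checked with a few lines of simulation. In this sketch (the numbers are invented) pain scores simply fluctuate at random about a constant mean, nobody gets any treatment, yet the patients who are selected because they are at their worst appear to improve substantially.

set.seed(1)
n <- 100000
baseline <- rnorm(n, mean = 5, sd = 2)   # pain scores fluctuate about 5
followup <- rnorm(n, mean = 5, sd = 2)   # same distribution: no treatment given
worst <- baseline > 8                    # those who 'go for treatment' when bad
mean(baseline[worst]) - mean(followup[worst])   # apparent improvement: about 3.9 points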

You tend to go for treatment when your condition is bad, and when you are at your worst, then a bit later you’re likely to be better. The great biologist Peter Medawar commented thus.

"If a person is (a) poorly, (b) receives treatment intended to make him better, and (c) gets better, then no power of reasoning known to medical science can convince him that it may not have been the treatment that restored his health"
(Medawar, P.B. (1969:19). The Art of the Soluble: Creativity and originality in science. Penguin Books: Harmondsworth).

This is illustrated beautifully by measurements made by McGorry et al., (2001). Patients with low back pain recorded their pain (on a 10 point scale) every day for 5 months (they were allowed to take analgesics ad lib).

The results for four patients are shown in their Figure 2. On average they stay fairly constant over five months, but they fluctuate enormously, with different patterns for each patient. Painful episodes that last for 2 to 9 days are interspersed with periods of lower pain or none at all. It is very obvious that if these patients had gone for treatment at the peak of their pain, then a while later they would feel better, even if they were not actually treated. And if they had been treated, the treatment would have been declared a success, despite the fact that the patient derived no benefit whatsoever from it. This entirely artefactual benefit would be the biggest for the patients that fluctuate the most (e.g. those in panels a and d of the Figure).

fig2
Figure 2 from McGorry et al, 2000. Examples of daily pain scores over a 6-month period for four participants. Note: Dashes of different lengths at the top of a figure designate an episode and its duration.

The effect is illustrated well by an analysis of 118 trials of treatments for non-specific low back pain (NSLBP), by Artus et al., (2010). The time course of pain (rated on a 100 point visual analogue pain scale) is shown in their Figure 2. There is a modest improvement in pain over a few weeks, but this happens regardless of what treatment is given, including no treatment whatsoever.

artus2

FIG. 2 Overall responses (VAS for pain) up to 52-week follow-up in each treatment arm of included trials. Each line represents a response line within each trial arm. Red: index treatment arm; Blue: active treatment arm; Green: usual care/waiting list/placebo arms. ____: pharmacological treatment; – – – -: non-pharmacological treatment; . . .. . .: mixed/other. 

The authors comment

"symptoms seem to improve in a similar pattern in clinical trials following a wide variety of active as well as inactive treatments.", and "The common pattern of responses could, for a large part, be explained by the natural history of NSLBP".

In other words, none of the treatments work.

This paper was brought to my attention through the blog run by the excellent physiotherapist, Neil O’Connell. He comments

"If this finding is supported by future studies it might suggest that we can’t even claim victory through the non-specific effects of our interventions such as care, attention and placebo. People enrolled in trials for back pain may improve whatever you do. This is probably explained by the fact that patients enrol in a trial when their pain is at its worst which raises the murky spectre of regression to the mean and the beautiful phenomenon of natural recovery."

O’Connell has discussed the matter in a recent paper, O’Connell (2015), from the point of view of manipulative therapies. That’s an area where there has been resistance to doing proper RCTs, with many people saying that it’s better to look at “real world” outcomes. This usually means that you look at how a patient changes after treatment. The hazards of this procedure are obvious from Artus et al., Fig 2, above. It maximises the risk of being deceived by regression to the mean. As O’Connell commented

"Within-patient change in outcome might tell us how much an individual’s condition improved, but it does not tell us how much of this improvement was due to treatment."

In order to eliminate this effect it’s essential to do a proper RCT with control and treatment groups tested in parallel. When that’s done, the control group shows the same regression to the mean as the treatment group, and any additional response in the latter can confidently be attributed to the treatment. Anything short of that is whistling in the wind.

Needless to say, the suboptimal methods are most popular in areas where real effectiveness is small or non-existent. This, sad to say, includes low back pain. It also includes just about every treatment that comes under the heading of alternative medicine. Although these problems have been understood for over a century, it remains true that

"It is difficult to get a man to understand something, when his salary depends upon his not understanding it."
Upton Sinclair (1935)

Responders and non-responders?

One excuse that’s commonly used when a treatment shows only a small effect in proper RCTs is to assert that the treatment actually has a good effect, but only in a subgroup of patients ("responders") while others don’t respond at all ("non-responders"). For example, this argument is often used in studies of anti-depressants and of manipulative therapies. And it’s universal in alternative medicine.

There’s a striking similarity between the narrative used by homeopaths and those who are struggling to treat depression. The pill may not work for many weeks. If the first sort of pill doesn’t work try another sort. You may get worse before you get better. One is reminded, inexorably, of Voltaire’s aphorism "The art of medicine consists in amusing the patient while nature cures the disease".

There is only a handful of cases in which a clear distinction can be made between responders and non-responders. Most often what’s observed is a smear of different responses to the same treatment – and the greater the variability, the greater is the chance of being deceived by regression to the mean.

For example, Thase et al., (2011) looked at responses to escitalopram, an SSRI antidepressant. They attempted to divide patients into responders and non-responders. An example (Fig 1a in their paper) is shown.

Thase fig 1a

The evidence for such a bimodal distribution is certainly very far from obvious. The observations are just smeared out. Nonetheless, the authors conclude

"Our findings indicate that what appears to be a modest effect in the grouped data – on the boundary of clinical significance, as suggested above – is actually a very large effect for a subset of patients who benefited more from escitalopram than from placebo treatment. "

I guess that interpretation could be right, but it seems more likely to be a marketing tool. Before you read the paper, check the authors’ conflicts of interest.

The bottom line is that analyses that divide patients into responders and non-responders are reliable only if that can be done before the trial starts. Retrospective analyses are unreliable and unconvincing.

Some more reading

Senn, 2011 provides an excellent introduction (and some interesting history). The subtitle is

"Here Stephen Senn examines one of Galton’s most important statistical legacies – one that is at once so trivial that it is blindingly obvious, and so deep that many scientists spend their whole career being fooled by it."

The examples in this paper are extended in Senn (2009), “Three things that every medical writer should know about statistics”. The three things are regression to the mean, the error of the transposed conditional and individual response.

You can read slightly more technical accounts of regression to the mean in McDonald & Mazzuca (1983) "How much of the placebo effect is statistical regression" (two quotations from this paper opened this post), and in Stephen Senn (2015) "Mastering variation: variance components and personalised medicine". In 1988 Senn published some corrections to the maths in McDonald (1983).

The trials that were used by Hróbjartsson & Gøtzsche (2010) to investigate the comparison between placebo and no treatment were looked at again by Howick et al., (2013), who found that in many of them the difference between treatment and placebo was also small. Most of the treatments did not work very well.

Regression to the mean is not just a medical deceiver: it’s everywhere

Although this post has concentrated on deception in medicine, it’s worth noting that the phenomenon of regression to the mean can cause wrong inferences in almost any area where you look at change from baseline. A classical example concerns the effectiveness of speed cameras. They tend to be installed after a spate of accidents, and if the accident rate is particularly high in one year it is likely to be lower the next year, regardless of whether a camera has been installed or not. To find the true reduction in accidents caused by installation of speed cameras, you would need to choose several similar sites and allocate them at random to have a camera or no camera. As in clinical trials, looking at the change from baseline can be very deceptive.

Statistical postscript

Lastly, remember that if you avoid all of these hazards of interpretation, and your test of significance gives P = 0.047, that does not mean you have discovered something. There is still a risk of at least 30% that your ‘positive’ result is a false positive. This is explained in Colquhoun (2014), "An investigation of the false discovery rate and the misinterpretation of p-values". I’ve suggested that one way to solve this problem is to use different words to describe P values: something like this.

P > 0.05 very weak evidence
P = 0.05 weak evidence: worth another look
P = 0.01 moderate evidence for a real effect
P = 0.001 strong evidence for a real effect

But notice that if your hypothesis is implausible, even these criteria are too weak. For example, if the treatment and placebo are identical (as would be the case if the treatment were a homeopathic pill) then it follows that 100% of positive tests are false positives.

Follow-up

12 December 2015

It’s worth mentioning that the question of responders versus non-responders is closely related to the classical topic of bioassays that use quantal responses. In that field it was assumed that each participant had an individual effective dose (IED). That’s reasonable for the old-fashioned LD50 toxicity test: every animal will die after a sufficiently big dose. It’s less obviously right for ED50 (effective dose in 50% of individuals). The distribution of IEDs is critical, but it has very rarely been determined. The cumulative form of this distribution is what determines the shape of the dose-response curve for the fraction of responders as a function of dose. Linearisation of this curve, by means of the probit transformation, used to be a staple of biological assay. This topic is discussed in Chapter 10 of Lectures on Biostatistics. And you can read some of the history on my blog about Some pharmacological history: an exam from 1959.
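For anyone who wants to see what such an analysis looks like in modern R, a generalised linear model with a probit link does what used to be done graphically with probit paper. The data here are invented, purely for illustration.

dose <- c(1, 2, 4, 8, 16)            # hypothetical doses
n    <- rep(20, 5)                   # individuals per dose group
r    <- c(1, 4, 10, 16, 19)          # number responding at each dose
fit  <- glm(cbind(r, n - r) ~ log(dose), family = binomial(link = "probit"))
exp(-coef(fit)[[1]] / coef(fit)[[2]])   # ED50: the dose at which half are expected to respond

The ED50 is the dose at which the fitted linear predictor crosses zero, i.e. the dose at which the cumulative distribution of IEDs reaches one half.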

Every day one sees politicians on TV assuring us that nuclear deterrence works because no nuclear weapon has been exploded in anger since 1945. They clearly have no understanding of statistics.

With a few plausible assumptions, we can easily calculate that the time until the next bomb explodes could be as little as 20 years.

Be scared, very scared.

The first assumption is that bombs go off at random intervals. Since we have had only one so far (counting Hiroshima and Nagasaki as a single event), this can’t be verified. But given the large number of small influences that control when a bomb explodes (whether in war or by accident), it is the natural assumption to make. The assumption is given some credence by the observation that the intervals between wars are random [download pdf].

If the intervals between bombs are random, that implies that the distribution of the length of the intervals is exponential in shape. The nature of this distribution has already been explained in an earlier post about the random lengths of time for which a patient stays in an intensive care unit. If you haven’t come across an exponential distribution before, please look at that post before moving on.

All that we know is that 70 years have elapsed since the last bomb, so the interval until the next one must be greater than 70 years. The probability that a random interval is longer than 70 years can be found from the cumulative form of the exponential distribution.

If we denote the true mean interval between bombs as $\mu$ then the probability that an interval is longer than 70 years is

\[ \text{Prob}\left( \text{interval > 70}\right)=\exp{\left(\frac{-70}{\mu}\right)} \]

We can get a lower 95% confidence limit (call it $\mu_\mathrm{lo}$) for the mean interval between bombs by the argument used in Lectures on Biostatistics, section 7.8 (page 108). If we imagine that $\mu_\mathrm{lo}$ were the true mean, we want it to be such that there is a 2.5% chance that we observe an interval that is greater than 70 years. That is, we want to solve

\[ \exp{\left(\frac{-70}{\mu_\mathrm{lo}}\right)} = 0.025\]

That’s easily solved by taking natural logs of both sides, giving

\[ \mu_\mathrm{lo} = \frac{-70}{\ln{\left(0.025\right)}}= 19.0\text{ years}\]

A similar argument leads to an upper confidence limit, $\mu_\mathrm{hi}$, for the mean interval between bombs, by solving

\[ \exp{\left(\frac{-70}{\mu_\mathrm{hi}}\right)} = 0.975\]
so
\[ \mu_\mathrm{hi} = \frac{-70}{\ln{\left(0.975\right)}}= 2765\text{ years}\]

If the worst case were true, and the mean interval between bombs were 19 years, then the distribution of the time to the next bomb would have an exponential probability density function, $f(t)$,

\[ f(t) = \frac{1}{19} \exp{\left(\frac{-t}{19}\right)} \]

There would be a 50% chance that the waiting time until the next bomb would be less than the median of this distribution, −19 ln(0.5) = 13.2 years.

expdist19

In summary, the observation that there has been no explosion for 70 years implies that the mean time until the next explosion lies (with 95% confidence) between 19 years and 2765 years. If it were 19 years, there would be a 50% chance that the waiting time to the next bomb could be less than 13.2 years. Thus there is no reason at all to think that nuclear deterrence works well enough to protect the world from incineration.
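For anyone who wants to check the arithmetic, here it is in a few lines of R. The numbers come straight from the equations above.

t_obs <- 70                          # years since the last explosion
-t_obs / log(0.025)                  # lower 95% limit for mu: 19.0 years
-t_obs / log(0.975)                  # upper 95% limit for mu: 2765 years
-19 * log(0.5)                       # median wait if mu were 19: 13.2 years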

Another approach

My statistical colleague, the ace probabilist Alan Hawkes, suggested a slightly different approach to the problem, via likelihood. The likelihood of a particular value of the interval between bombs is defined as the probability of making the observation(s), given a particular value of $\mu$. In this case, there is one observation, that the interval between bombs is more than 70 years. The likelihood, $L\left(\mu\right)$, of any specified value of $\mu$ is thus

\[L\left(\mu\right)=\text{Prob}\left( \text{interval > 70 | }\mu\right) = \exp{\left(\frac{-70}{\mu}\right)} \]

A plot of this function (graph on right) shows that it increases continuously with $\mu$, so the maximum likelihood estimate of $\mu$ is infinity. An infinite wait until the next bomb is perfect deterrence.

bl240

But again we need confidence limits for this. Since the upper limit is infinite, the appropriate thing to calculate is a one-sided lower 95% confidence limit. This is found by solving

\[ \exp{\left(\frac{-70}{\mu_\mathrm{lo}}\right)} = 0.05\]

which gives

\[ \mu_\mathrm{lo} = \frac{-70}{\ln{\left(0.05\right)}}= 23.4\text{ years}\]
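Again, the calculation, and a version of the likelihood plot, take only a few lines of R.

mu <- seq(5, 2000, length.out = 1000)
L  <- exp(-70 / mu)                  # the likelihood rises towards 1 as mu increases
plot(mu, L, type = "l", xlab = expression(mu), ylab = "Likelihood")
-70 / log(0.05)                      # one-sided lower 95% limit: 23.4 years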

Summary

The first approach gives 95% confidence limits for the average time until we get incinerated as 19 years to 2765 years. The second approach gives the lower limit as 23.4 years. There is no important difference between the two methods of calculation. This shows that the bland assurances of politicians that “nuclear deterrence works” are not justified.

It is not the purpose of this post to predict when the next bomb will explode, but rather to point out that the available information tells us very little about that question. This seems important to me because it contradicts directly the frequent assurances that deterrence works.

The only consolation is that, since I’m now 79, it’s unlikely that I’ll live long enough to see the conflagration.

Anyone younger than me would be advised to get off their backsides and do something about it, before you are destroyed by innumerate politicians.

Postscript

While talking about politicians and war it seems relevant to reproduce Peter Kennard’s powerful image of the Iraq war.

kennard

and with that, to quote the comment made by Tony Blair’s aide, Lance Price

blair-price

It’s a bit like my feeling about priests doing the twelve stations of the cross. Politicians and priests masturbating at the expense of kids getting slaughtered (at a safe distance, of course).

Follow-up

Chalkdust is a magazine published by students of maths from UCL Mathematics department. Judging by its first issue, it’s an excellent vehicle for popularisation of maths. I have a piece in the second issue

You can view the whole second issue on line, or download a pdf of the whole issue. Or a pdf of my bit only: On the Perils of P values.

The piece started out as another exposition of the interpretation of P values, but the whole of the first part turned into an explanation of the principles of randomisation tests. It beats me why anybody still does a Student’s t test. The idea of randomisation tests is very old. They are as powerful as t tests when the assumptions of the latter are fulfilled but a lot better when the assumptions are wrong (in the jargon, they are uniformly-most-powerful tests).

Not only that, but you need no mathematics to do a randomisation test, whereas you need a good deal of mathematics to follow Student’s 1908 paper. And the randomisation test makes transparently clear that random allocation of treatments is a basic and essential assumption that’s necessary for the validity of any test of statistical significance.

I made a short video that explains the principles behind the randomisation tests, to go with the printed article (a bit of animation always helps).

When I first came across the principles of randomisation tests, I was entranced by the simplicity of the idea. Chapters 6 – 9 of my old textbook were written to popularise them. You can find much more detail there.

In fact it’s only towards the end that I reiterate the idea that P values don’t answer the question that experimenters want to ask, namely: if I claim I have made a discovery because P is small, what’s the chance that I’ll be wrong?

If you want the full story on that, read my paper. The story it tells is not very original, but it still isn’t known to most experimenters (because most statisticians still don’t teach it on elementary courses). The paper must have struck a chord because it’s had over 80,000 full text views and more than 10,000 pdf downloads. It reached an altmetric score of 975 (since when it has been mysteriously declining). That’s gratifying, but it is also a condemnation of the use of metrics. The paper is not original and it’s quite simple, yet it’s had far more "impact" than anything to do with my real work.

If you want simpler versions than the full paper, try this blog (part 1 and part 2), or the Youtube video about misinterpretation of P values.

The R code for doing 2-sample randomisation tests

You can download a pdf file that describes the two R scripts. There are two different R programs.

One re-samples randomly a specified number of times (the default is 100,000 times, but you can do any number). Download two_sample_rantest.R

The other uses every possible sample. In the case of two samples of 10 observations, it gives the distribution for all 184,756 ways of selecting 10 observations from 20. Download 2-sample-rantest-exact.R
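For readers who just want the gist, a bare-bones version of the resampling approach looks something like this. The data below are made up for illustration; the downloadable scripts are more complete.

set.seed(1)
A <- c(12.1, 9.8, 11.5, 10.2, 13.0)  # hypothetical observations, group A
B <- c(8.7, 9.1, 10.0, 7.9, 8.4)     # hypothetical observations, group B
obs <- mean(A) - mean(B)             # the observed difference between means
pooled <- c(A, B)
diffs <- replicate(100000, {
  i <- sample(10, 5)                 # random re-allocation into two groups of 5
  mean(pooled[i]) - mean(pooled[-i])
})
mean(abs(diffs) >= abs(obs))         # two-tailed randomisation P value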

The launch party

Today the people who organise Chalkdust magazine held a party in the mathematics department at UCL. The editorial director is a graduate student in maths, Rafael Prieto Curiel. He was, at one time, in the Mexican police force (he said he’d suffered more crime in London than in Mexico City). He, and the rest of the team, are deeply impressive. They’ve done a terrific job. Support them.

cdparty1
The party cakes

cd2
Rafael Prieto doing the introduction

pic 3
Rafael Prieto and me

cd4
I got the T shirt

Decoding the T shirt

The top line is "i" because that’s the usual symbol for the square root of −1.

The second line is one of many equations that describe a heart shape. It can be plotted by calculating a matrix of values of the left-hand side for a range of values of x and y, and then plotting the contour for the values of x and y at which the left-hand side is equal to 1. Download R script for this. (Method suggested by Rafael Prieto Curiel.)
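The equation on the shirt isn’t reproduced here, but one well-known heart curve of the same form, \( x^2 + \left( y - \sqrt[3]{x^2} \right)^2 = 1 \), illustrates the method (this particular equation is my choice, not necessarily the one on the shirt).

x <- seq(-1.5, 1.5, length.out = 400)
y <- seq(-1.2, 2.2, length.out = 400)
z <- outer(x, y, function(x, y) x^2 + (y - (x^2)^(1/3))^2)   # the left-hand side
plot(NA, xlim = range(x), ylim = range(y), asp = 1, xlab = "x", ylab = "y")
contour(x, y, z, levels = 1, drawlabels = FALSE, add = TRUE, col = "red")   # contour where it equals 1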

H4

Follow-up

5 November 2015

The Mann-Whitney test

I was stimulated to write this follow-up because yesterday I was asked by a friend to comment on the fact that five different tests all gave identical P values, P = 0.0079. The paper in question was in Science magazine (see Fig. 1), so it wouldn’t surprise me if the statistics were done badly, but in this case there is an innocent explanation.

The Chalkdust article, and the video, are about randomisation tests done using the original observed numbers, so look at them before reading on. There is a more detailed explanation in Chapter 9 of Lectures on Biostatistics. Before it became feasible to do this sort of test, there was a simpler, and less efficient, version in which the observations were ranked in ascending order, and the observed values were replaced by their ranks. This was known as the Mann-Whitney test. It had the virtue that, because all the ‘observations’ were now integers, the number of possible results of resampling was limited, so it was possible to construct tables that allowed one to get a rough P value. Of course, replacing observations by their ranks throws away some information, and now that we have computers there is no need to use a Mann-Whitney test ever. But that’s what was used in this paper.

In the paper (Fig 1) comparisons are made between two groups (assumed to be independent) with 5 observations in each group. The 10 observations are just the ranks, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

To do the randomisation test we select 5 of these numbers at random for sample A, and the other 5 are sample B. (Of course this supposes that the treatments were applied randomly in the real experiment, which is unlikely to be true.) In fact there are only 10!/(5!.5!) = 252 possible ways to select a sample of 5 from 10, so it’s easy to list all of them. In the case where there is no overlap between the groups, one group will contain the smallest observations (ranks 1, 2, 3, 4, 5), and the other group will contain the highest observations (ranks 6, 7, 8, 9, 10).

In this case, the sum of the ‘observations’ in group A is 15, and the sum for group B is 40. These add to the sum of the first 10 integers, 10(10 + 1)/2 = 55. The mean (which corresponds to a difference between means of zero) is 55/2 = 27.5.

There are two ways of getting an allocation as extreme as this (first group low, as above, or second group low, the other tail of the distribution). The two tailed P value is therefore 2/252 = 0.0079. This will be the result whenever the two groups don’t overlap, regardless of the numerical values of the observations. It’s the smallest P value the test can produce with 5 observations in each group.
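This is easy to verify in R; the following sketch enumerates every allocation (the full printout from the exact script is shown below).

sums <- colSums(combn(10, 5))        # sum of ranks in sample A for every allocation
length(sums)                         # 252 possible allocations
(sum(sums <= 15) + sum(sums >= 40)) / 252   # two-tailed P = 2/252 = 0.0079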

The whole randomisation distribution looks like this

2grpsof5

In this case, the abscissa is the sum of the ranks in sample A, rather than the difference between means for the two groups (the latter is easily calculated from the former). The red line shows the observed value, 15. There is only one way to get a total of 15 for group A: it must contain the lowest 5 ranks (group A = 1, 2, 3, 4, 5). There is also only one way to get a total of 16 (group A = 1, 2, 3, 4, 6), and there are two ways of getting a total of 17 (group A = 1, 2, 3, 4, 7, or 1, 2, 3, 5, 6). But there are 20 different ways of getting a sum of 27 or 28 (which straddle the mean, 27.5). The printout (.txt file) from the R program that was used to generate the distribution is as follows.

Randomisation test: exact calculation all possible samples

INPUTS: exact calculation: all possible samples
Total number of combinations = 252
number obs per sample = 5
sample A 1 2 3 4 5
sample B 6 7 8 9 10

OUTPUTS
sum for sample A= 15
sum for sample B = 40
mean for sample A= 3
mean for sample B = 8
Observed difference between sums (A-B) -25
Observed difference between means (A-B) -5
SD for sample A) = 1.581139
SD for sample B) = 1.581139
mean and SD for randomisation dist = 27.5 4.796662
quantiles for ran dist (0.025, 0.975) 18.275 36.725
Area equal to orless than observed diff 0.003968254
Area equal to or greater than minus observed diff 0.003968254
Two-tailed P value 0.007936508

Result of t test
P value (2 tail) 0.001052826
confidence interval 2.693996 7.306004


Some problems. Figure 1 alone shows 16 two-sample comparisons, but no correction for multiple comparisons seems to have been made. A crude Bonferroni correction would require replacement of a P = 0.05 threshold with P = 0.05/16 = 0.003. None of the 5 tests that gave P = 0.0079 reaches this level (of course the whole idea of a threshold level is absurd anyway).
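Equivalently, one can adjust the P values rather than the threshold. In R:

p.adjust(rep(0.0079, 16), method = "bonferroni")   # each P becomes 0.126: nowhere near 0.05
0.05 / 16                                          # or, the adjusted threshold: 0.003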

Furthermore, even a single test that gave P = 0.0079 would be expected to have a false positive rate of around 10 percent.

Jump to follow-up

Today, 25 September, is the first anniversary of the needless death of Stefan Grimm. This post is intended as a memorial.

He should be remembered, in the hope that some good can come from his death.

grimm

On 1 December 2014, I published the last email from Stefan Grimm, under the title “Publish and perish at Imperial College London: the death of Stefan Grimm“. Since then it’s been viewed 196,000 times. The day after it was posted, the server failed under the load.

Since then, I have posted two follow-up pieces. On December 23, 2014, “Some experiences of life at Imperial College London. An external inquiry is needed after the death of Stefan Grimm“. Of course there was no external inquiry.

And on April 9, 2015, after the coroner’s report, and after Imperial’s internal inquiry, "The death of Stefan Grimm was “needless”. And Imperial has done nothing to prevent it happening again".

The tragedy featured in the introduction of the HEFCE report on the use of metrics.

“The tragic case of Stefan Grimm, whose suicide in September 2014 led Imperial College to launch a review of its use of performance metrics, is a jolting reminder that what’s at stake in these debates is more than just the design of effective management systems.”

“Metrics hold real power: they are constitutive of values, identities and livelihoods ”

I had made no attempt to contact Grimm’s family, because I had no wish to intrude on their grief. But in July 2015, I received, out of the blue, a hand-written letter from Stefan Grimm’s mother. She is now 80 and living in Munich. I was told that his father, Dieter Grimm, had died of cancer when he was only 59. I also learned that Stefan Grimm was distantly related to Wilhelm Grimm, one of the Gebrüder Grimm.

The letter was very moving indeed. It said "Most of the infos about what happened in London, we got from you, what you wrote in the internet".

I responded as sympathetically as I could, and got a reply which included several of Stefan’s drawings, and then more from his sister. The drawings were done while he was young. They show amazing talent, but by the age of 25 he was too busy with science to exploit his artistic talents.

With his mother’s permission, I reproduce ten of his drawings here, as a memorial to a man whose needless death was attributable to the very worst of the UK university system. He was killed by mindless and cruel "performance management", imposed by Imperial College London. The initial reaction of Imperial gave little hint of an improvement. I hope that their review of the metrics used to assess people will be a bit more sensible.

His real memorial lies in his published work, which continues to be cited regularly after his death.

His drawings are a reminder that there is more to human beings than getting grants. And that there is more to human beings than science.

Click the picture for an album of ten of his drawings. In the album there are also pictures of two books that were written for children by Stefan’s father, Dieter Grimm.

sg1

Dated Christmas eve, 1979 (age 16)


Follow-up

Well well. It seems that Imperial are having an "HR Showcase: Supporting our people" on 15 October. And the introduction is being given by none other than Professor Martin Wilkins, the very person whose letter to Grimm must bear some responsibility for his death. I’ll be interested to hear whether he shows any contrition. I doubt whether any employees will dare to ask pointed questions at this meeting, but let’s hope they do.

This is a very quick synopsis of the 500 pages of a report on the use of metrics in the assessment of research. It’s by far the most thorough bit of work I’ve seen on the topic. It was written by a group, chaired by James Wilsdon, to investigate the possible role of metrics in the assessment of research.

The report starts with a bang. The foreword says

"Too often, poorly designed evaluation criteria are “dominating minds, distorting behaviour and determining careers.”1 At their worst, metrics can contribute to what Rowan Williams, the former Archbishop of Canterbury, calls a “new barbarity” in our universities."

"The tragic case of Stefan Grimm, whose suicide in September 2014 led Imperial College to launch a review of its use of performance metrics, is a jolting reminder that what’s at stake in these debates is more than just the design of effective management systems." 

"Metrics hold real power: they are constitutive of values, identities and livelihoods "

And the conclusions (page 12 and Chapter 9.5) are clear that metrics alone can measure neither the quality of research, nor its impact.

"no set of numbers,however broad, is likely to be able to capture the multifaceted and nuanced judgements on the quality of research outputs that the REF process currently provides"

"Similarly, for the impact component of the REF, it is not currently feasible to use quantitative indicators in place of narrative impact case studies, or the impact template"

These conclusions are justified in great detail in 179 pages of the main report, 200 pages of the literature review, and 87 pages of the Correlation analysis of REF2014 scores and metrics.

The correlation analysis shows clearly that, contrary to some earlier reports, all of the many metrics that are considered predict the outcome of the 2014 REF far too poorly to be used as a substitute for reading the papers.

There is the inevitable bit of talk about the "judicious" use of metrics to support peer review (with no guidance about what judicious use means in real life), but this doesn’t detract much from an excellent and thorough job.

Needless to say, I like these conclusions since they are quite similar to those recommended in my submission to the report committee, over a year ago.

Of course peer review is itself fallible. Every year about 8 million researchers publish 2.5 million articles in 28,000 peer-reviewed English language journals (STM report 2015 and graphic, here). It’s pretty obvious that there are not nearly enough people to review carefully such vast outputs. That’s why I’ve said that any paper, however bad, can now be printed in a journal that claims to be peer-reviewed. Nonetheless, nobody has come up with a better system, so we are stuck with it.

It’s certainly possible to judge that some papers are bad. It’s possible, if you have enough expertise, to guess whether or not the conclusions are justified. But no method exists that can judge what the importance of a paper will be in 10 or 20 years’ time. I’d like to have seen a frank admission of that.

If the purpose of research assessment is to single out papers that will be considered important in the future, that job is essentially impossible. From that point of view, the cost of research assessment could be reduced to zero by trusting people to appoint the best people they can find, and just give the same amount of money to each of them. I’m willing to bet that the outcome would be little different. Departments have every incentive to pick good people, and scientists’ vanity is quite sufficient motive for them to do their best.

Such a radical proposal wasn’t even considered in the report, which is a pity. Perhaps they were just being realistic about what’s possible in the present climate of managerialism.

Other recommendations include

"HEIs should consider signing up to the San Francisco Declaration on Research Assessment (DORA)"

4. "Journal-level metrics, such as the Journal Impact Factor (JIF), should not be used."

It’s astonishing that it should still be necessary to deplore the JIF almost 20 years after it was totally discredited. Yet it still mesmerizes many scientists. I guess that shows just how stupid scientists can be outside their own specialist fields.

DORA has over 570 organisational and 12,300 individual signatories, BUT only three universities in the UK have signed (Sussex, UCL and Manchester). That’s a shocking indictment of the way (all the other) universities are run.

One of the signatories of DORA is the Royal Society.

"The RS makes limited use of research metrics in its work. In its publishing activities, ever since it signed DORA, the RS has removed the JIF from its journal home pages and marketing materials, and no longer uses them as part of its publishing strategy. As authors still frequently ask about JIFs, however, the RS does provide them, but only as one of a number of metrics".

That’s a start. I’ve advocated making it a condition of getting any grant or fellowship that the university should have signed up to DORA and Athena Swan (with checks to make sure they are actually obeyed).

And that leads on naturally to one of the most novel and appealing recommendations in the report.

"A blog will be set up at http://www.ResponsibleMetrics.org
The site will celebrate responsible practices, but also name and shame bad practices when they occur"

"every year we will award a “Bad Metric” prize to the most
egregious example of an inappropriate use of quantitative indicators in research management."

This should be really interesting. Perhaps I should open a book on which university will be the first to win the "Bad Metric" prize.

The report covers just about every aspect of research assessment: perverse incentives, whether to include author self-citations, normalisation of citation impact indicators across fields and what to do about the order of authors on multi-author papers.

It’s concluded that there are no satisfactory ways of doing any of these things. Those conclusions are sometimes couched in diplomatic language which may, uh, reduce their impact, but they are clear enough.

The perverse incentives that are imposed by university rankings are considered too. They are commercial products and if universities simply ignored them, they’d vanish. One important problem with rankings is that they never come with any assessment of their errors. It’s been known how to do this at least since Goldstein & Spiegelhalter (1996, League Tables and Their Limitations: Statistical Issues in Comparisons of Institutional Performance). Commercial producers of rankings don’t do it, because to do so would reduce the totally spurious impression of precision in the numbers they sell. Vice-chancellors might bully staff less if they knew that the changes they produce are mere random errors.

Metrics, and still more altmetrics, are far too crude to measure the quality of science. To hope to do that without reading the paper is pie in the sky (even reading it, it’s often impossible to tell).

The only bit of the report that I’m not entirely happy about is the recommendation to spend more money investigating the metrics that the report has just debunked. It seems to me that there will never be a way of measuring the quality of work without reading it. To spend money on a futile search for new metrics would take money away from science itself. I’m not convinced that it would be money well-spent.

Follow-up

Jump to follow-up

There can be no doubt that the situation for women has improved hugely since I started at UCL, 50 years ago. At that time women were not allowed in the senior common room. It’s improved even more since the 1930s (read about the attitude of the great statistician, Ronald Fisher, to Florence Nightingale David).

Recently Williams & Ceci published data that suggest that young women no longer face barriers in job selection in the USA (though it will take 20 years before that feeds through to professor level). But no sooner was one feeling optimistic than along came Tim Hunt, who caused a media storm by advocating male-only labs. I’ll say a bit about that case below.

First some very preliminary concrete proposals.

The job of emancipation is not yet completed. I’ve recently become a member of the Royal Society diversity committee, chaired by Uta Frith. That’s made me think more seriously about the evidence concerning the progress of women and of black and minority ethnic (BME) people in science, and what can be done about it. Here are some preliminary thoughts. They are my opinions, not those of the committee.

I suspect that much of the problem for women and BME people results from over-competitiveness and perverse incentives that are imposed on researchers. That’s got progressively worse, and it affects men too. In fact it corrupts the entire scientific process.

One of the best writers on these topics is Peter Lawrence. He’s an eminent biologist who worked at the famous Lab for Molecular Biology in Cambridge, until he ‘retired’.

Here are three things by him that everyone should read.

PL

The politics of publication (Nature, 2003) [pdf]

The mismeasurement of science (Current Biology, 2007) [pdf]

The heart of research is sick (Lab Times, 2011) [pdf]

From Lawrence (2003)

"Listen. All over the world scientists are fretting. It is night in London and Deborah Dormouse is unable to sleep. She can’t decide whether, after four weeks of anxious waiting, it would be counterproductive to call a Nature editor about her manuscript. In the sunlight in Sydney, Wayne Wombat is furious that his student’s article was rejected by Science and is taking revenge on similar work he is reviewing for Cell. In San Diego, Melissa Mariposa reads that her article submitted to Current Biology will be reconsidered, but only if it is cut in half. Against her better judgement, she steels herself to throw out some key data and oversimplify the conclusions— her postdoc needs this journal on his CV or he will lose a point in the Spanish league, and that job in Madrid will go instead to Mar Maradona."

and

"It is we older, well-established scientists who have to act to change things. We should make these points on committees for grants and jobs, and should not be so desperate to push our papers into the leading journals. We cannot expect younger scientists to endanger their future by making sacrifices for the common good, at least not before we do."

From Lawrence (2007)

“The struggle to survive in modern science, the open and public nature of that competition, and the advantages bestowed on those who are prepared to show off and to exploit others have acted against modest and gentle people of all kinds — yet there is no evidence, presumption or likelihood that less pushy people are less creative.  As less aggressive people are predominantly women [14,15] it should be no surprise that, in spite of an increased proportion of women entering biomedical research as students, there has been little, if any, increase in the representation of women at the top [16]. Gentle people of both sexes vote with their feet and leave a profession that they, correctly, perceive to discriminate against them [17]. Not only do we lose many original researchers, I think science would flourish more in an understanding and empathetic workplace.”

From Lawrence (2011).

"There’s a reward system for building up a large group, if you can, and it doesn’t really matter how many of your group fail, as long as one or two succeed. You can build your career on their success".

Part of this pressure comes from university rankings. They are statistically illiterate and serve no useful purpose, apart from making money for their publishers and providing vice-chancellors with an excuse to bully staff in the interests of institutional willy-waving.

And part of the pressure arises from the money that comes with the REF.  A recent survey gave rise to the comment

"Early career researchers overwhelmingly feel that the research excellence framework has created “a huge amount of pressure and anxiety, which impacts particularly on those at the bottom rung of the career ladder"

In fact the last REF was conducted quite sensibly (e.g. use of silly metrics was banned).  The problem was that universities didn’t believe that the rules would be followed.

For example, academics in the Department of Medicine at Imperial College London were told (in 2007) they are expected to

“publish three papers per annum, at least one in a prestigious journal with an impact factor of at least five”. 

And last year a 51-year-old academic with a good publication record was told that unless he raised £200,000 in grants in the next year, he’d be fired.  There can be little doubt that this “performance management” contributed to his decision to commit suicide.  And Imperial did nothing to remedy the policy after an internal investigation.

Several other universities have policies that are equally brutal. For example, Warwick, Queen Mary College London and King’s College London.

Crude financial targets for grant income should be condemned as defrauding the taxpayer (you are compelled to make your work as expensive as possible). As usual, women and BME people suffer disproportionately from such bullying.

What can be done about this in practice?

I feel that some firm recommendations will be useful. 

One thing that could be done is to make sure that all universities sign, and adhere to, the San Francisco Declaration on Research Assessment (DORA), and adhere to the Athena Swan charter

The Royal Society has already signed DORA, but, shockingly, only three universities in the UK have done so (Sussex, UCL and Manchester).

Another well-meaning initiative is The Concordat to Support the Career Development of Researchers. It’s written very much from the HR point of view and I’d argue that that’s part of the problem, not part of the solution.
For example it says

“3. Research managers should be required to participate in active performance management, including career development guidance”

That statement is meaningless without any definition of how performance management should be done. It’s quite clear that “performance management”, in the form of crude targets, was a large contributor to Stefan Grimm’s suicide.

The Concordat places great emphasis on training programmes, but ignores the fact that it’s doubtful whether diversity training works, and it may even have bad effects.

The Concordat is essentially meaningless in its present form.

My proposals

I propose that all fellowships and grants should be awarded only to universities who have signed DORA and Athena Swan.

I have little faith that signing DORA, or the Concordat, will have much effect on the shop floor, but they do set a standard, and eventually, as with changes in the law, improvements in behaviour are effected.

But, as a check, it should be announced at the start that fellows and employees paid by grants will be asked directly whether or not these agreements have been honoured in practice.

Crude financial targets are imposed at one in six universities. Those who do that should be excluded from getting fellowships or grants, on the grounds that the process gives bad value to the funders (and taxpayer) and that it endangers objectivity.

Some thoughts in the Hunt affair

It’s now 46 years since Brian Woledge and I managed to get UCL’s senior common room, the Housman room, opened to women. That was 1969, and since then I don’t think that I’ve heard any public statement that was so openly sexist as Tim Hunt’s now notorious speech in Korea.

Listen to Hunt, Connie St Louis and Jenny Rohn on the Today programme (10 June, 2015).

On the Today Programme, Hunt himself said "What I said was quite accurately reported" and "I just wanted to be honest", so there’s no doubt that those are his views. He confirmed that the account that was first tweeted by Connie St Louis was accurate.

Inevitably, there was a backlash from libertarians and conservatives. That was fuelled by a piece in today’s Observer, in which Hunt seems to regard himself as being victimised. My comment on the Observer piece sums up my views.

I was pretty shaken when I heard what Tim Hunt had said, all the more because I have recently become a member of the Royal Society’s diversity committee. When he talked about the incident on the Today programme on 10 June, it certainly didn’t sound like a joke to me. It seems that he carried on for more than 5 minutes in the same vein.

Everyone appreciates Hunt’s scientific work, but the views that he expressed about women are from the dark ages. It seemed to me, and to Dorothy Bishop, and to many others, that with views like that, Hunt should not play any part in selection or policy matters. The Royal Society moved with admirable speed to do just that.

The views that were expressed are so totally incompatible with UCL’s values that it was right that UCL too acted quickly. His job at UCL was an honorary one: he is retired and he was not deprived of his lab and his living, as some people suggested.

Although the initial reaction, from men as well as from women, was predictably angry, it very soon turned to humour, with the flood of #distractinglysexy tweets.

It would be a mistake to think that these actions were the work of PR people. They were thought to be just by everyone, female or male, who wants to improve diversity in science.

The episode is sad and disappointing. But the right things were done quickly.

Now Hunt can be left in peace to enjoy his retirement.

Look at it this way. If you were a young woman, applying for a fellowship in competition with men, what would you think if Tim Hunt were on the selection panel?

After all this fuss, we need to laugh.

Here is a clip from the BBC News Quiz, in which the actor Rebecca Front gives her take on the affair.

Follow-up

Some great videos soon followed Hunt’s comments. Try these.
Nobel Scientist Tim Hunt Sparks a #Distractinglysexy Campaign
(via Jennifer Raff)

This video has some clips from an earlier one, from Suzi Gage “Science it’s a girl thing”.

15 June 2015

An update on what happened from UCL. From my knowledge of what happened, this is not PR spin. It’s true.

16 June 2015

There is an interview with Tim Hunt in Lab Times that’s rather revealing. This interview was published in April 2014, more than a year before the Korean speech. Right up to the penultimate paragraph we agree on just about everything, from the virtue of small groups to the iniquity of impact factors. But then right at the end we read this.

In your opinion, why are women still under-represented in senior positions in academia and funding bodies?

Hunt:  I’m not sure there is really a problem, actually. People just look at the statistics. I dare, myself, think there is any discrimination, either for or against men or women. I think people are really good at selecting good scientists but I must admit the inequalities in the outcomes, especially at the higher end, are quite staggering. And I have no idea what the reasons are. One should start asking why women being under-represented in senior positions is such a big problem. Is this actually a bad thing? It is not immediately obvious for me… is this bad for women? Or bad for science? Or bad for society? I don’t know, it clearly upsets people a lot.

This suggests to me that the outburst on 8th June reflected opinions that Hunt has had for a while.

There has been quite a lot of discussion of Hunt’s track record. These tweets suggest it may not be blameless.

19 June 2015

Yesterday I was asked by the letters editor of the Times, Andrew Riley, to write a letter in response to a half-witted, anonymous, Times leading article. I dropped everything, and sent it. It was neither acknowledged nor published. Here it is [download pdf].

One of the few good outcomes of the sad affair of Tim Hunt is that it has brought to light the backwoodsmen who are eager to defend his actions, and to condemn UCL.  The anonymous Times leader of 16 June was as good an example as any.
Here are seven relevant considerations.

  1. Honorary jobs have no employment contract, so holders of them are not employees in the normal sense of the term. Rather, they are eminent people who agree to act as ambassadors for the university.
  2. Hunt’s remarks were not a joke – they were his genuine views. He has stated them before and he confirmed them on the Today programme.
  3. He’s entitled to hold these views, but he’s quite sensible enough to see that UCL would be criticised harshly if he were to remain in his ambassadorial role, so he relinquished it before UCL was able to talk to him.
  4. All you have to do to see the problems is to imagine yourself as a young woman, applying for a grant or fellowship, in competition with men, knowing that Hunt was one of your judges. Would your leader have been so eager to defend a young Muslim who advocated men-only labs? Or someone who advocated Jew-free labs? The principle is the same.
  5. Advocacy of all-male labs is not only plain silly, it’s also illegal under the Equality Act (2010).
  6. UCL’s decision to accept Hunt’s offer to relinquish his role was not the result of a twitter lynch mob. The comments there rapidly became good humoured. If there is a witch hunt, it is by your leader writer and the Daily Mail, eager to defend the indefensible and to condemn UCL and the Royal Society.
  7. It has been suggested to me that it would have been better if Hunt had been brought before a disciplinary committee, so that due process would have been observed. I can imagine nothing that would have been more cruel to a distinguished colleague than to put him through such a miserable ordeal.

Some quotations from this letter were used by Tom Whipple in an article about Richard Dawkins’ surprising (to me) emergence as an unreconstructed backwoodsman.

18 June 2015

Adam Rutherford’s excellent Radio 4 programme, Inside Science, had an episode “Women Scientists on Sexism in Science". The last speaker was Uta Frith (who is chair of the Royal Society’s diversity committee). Her contribution started at about 23 min.

Listen to Uta Frith’s contribution.

" . . this over-competitiveness, and this incredible rush to publish fast, and publish in quantity rather than in quality, has been extremely detrimental for science, and it has been disproportionately bad, I think, for under-represented groups who don’t quite fit in to this over-competitive climate. So I am proposing something I like to call slow science . . . why is this necessary, to do this extreme measurement-driven, quantitative judgement of output, rather than looking at the actual quality"

That, I need hardly say, is music to my ears. Why not, for example, restrict the number of papers that can be submitted with fellowship applications to four (just as the REF did)?

21 June 2015

I’ve received a handful of letters, some worded in a quite extreme way, telling me I’m wrong. It’s no surprise that 100% of them are from men. Most are from more-or-less elderly men. A few are from senior men who run large groups. I have no way to tell whether their motive is a genuine wish to have freedom of speech at any price. Or whether their motives are less worthy: perhaps some of them are against anything that prevents postdocs working for 16 hours a day, for the glory of the boss. I just don’t know.

I’ve had far more letters saying that UCL did the right thing when it accepted Tim Hunt’s offer to resign from his non-job at UCL. These letters are predominantly from young people, men as well as women. Almost all of them ask not to be identified in public. They are, unsurprisingly, scared to argue with the eight Nobel prizewinners who have deplored UCL’s action (without bothering to ascertain the facts). The fact that they are scared to speak out is hardly surprising. It’s part of the problem.

What you can do, if you don’t want to put your head above the public parapet, is simply to email the top people at UCL, in private, to express your support. All these email addresses are open to the public in UCL’s admirably open email directory.

Michael Arthur (provost): michael.arthur@ucl.ac.uk

David Price (vice-provost research): d.price@ucl.ac.uk

Geraint Rees (Dean of the Faculty of Life Sciences): g.rees@ucl.ac.uk

All these people have an excellent record on women in science, as illustrated by the response to the Daily Mail’s appalling behaviour towards UCL astrophysicist, Hiranya Peiris.

26 June 2015

The sad matter of Tim Hunt is over, at last. The provost of UCL, Michael Arthur, has now made a statement himself. Provost’s View: Women in Science is an excellent reiteration of UCL’s principles.

By way of celebration, here is the picture of the quad, taken on 23 March, 2003. It was the start of the second great march to try to stop the war in Iraq. I use it to introduce talks, as a reminder that there are more serious consequences of believing things that aren’t true than a handful of people taking sugar pills.

[Photo: the UCL quad, 23 March 2003]

11 October 2015

In which I agree with Mary Collins

Long after this unpleasant row died down, it was brought back to life yesterday when I heard that Colin Blakemore had resigned as honorary president of the Association of British Science Writers (ABSW), on the grounds that that organisation had not been sufficiently hard on Connie St Louis, whose tweet initiated the whole affair. I’m not a member of the ABSW and I have never met St Louis, but I know Blakemore well and like him. Nevertheless it seems to me to be quite disproportionate for a famous elderly white man to take such dramatic headline-grabbing action because a young black woman had exaggerated bits of her CV. Of course she shouldn’t have done that, but if everyone were punished so severely for "burnishing" their CV there would be a large number of people in trouble.

Blakemore’s own statement also suggested that her reporting was inaccurate (though it appears that he didn’t submit a complaint to ABSW). As I have said above, I don’t think that this is true to any important extent. The gist of what was said was verified by others, and, most importantly, Hunt himself said "What I said was quite accurately reported" and "I just wanted to be honest". As far as I know, he hasn’t said anything since that has contradicted that view, which he gave straight after the event. The only change that I know of is that the words that were quoted turned out to have been followed by "Now, seriously", which can be interpreted as meaning that the sexist comments were intended as a joke. If it were not for earlier comments along the same lines, that might have been an excuse.

Yesterday, on twitter, I was asked by Mary Collins, Hunt’s wife, whether I thought he was a misogynist. I said no, and I don’t believe that he is. It’s true that I had used that word in a single tweet, long since deleted, and that was wrong. I suspect that I felt at the time that it sounded like a less harsh word than sexist, but it was the wrong word and I apologised for using it.

So do I believe that Tim Hunt is sexist? No I don’t. But his remarks both in Korea and earlier were undoubtedly sexist. Nevertheless, I don’t believe that, as a person, he suffers from ingrained sexism. He’s too nice for that. My interpretation is that (a) he’s so obsessive about his work that he has little time to think about political matters, and (b) he’s naive about the public image that he presents, and about how people will react to it. That’s a combination that I’ve seen before among some very eminent scientists.

In fact I find myself in almost complete agreement with Mary Collins, Hunt’s wife, when she said (I quote the Observer)

“And he is certainly not an old dinosaur. He just says silly things now and again.” Collins clutches her head as Hunt talks. “It was an unbelievably stupid thing to say,” she says. “You can see why it could be taken as offensive if you didn’t know Tim. But really it was just part of his upbringing. He went to a single-sex school in the 1960s.”

Nevertheless, I think it’s unreasonable to think that comments such as those made in Korea (and earlier) would not have consequences, "naive" or not, "joke" or not, "upbringing" or not.

It’s really not hard to see why there were consequences. All you have to do is to imagine yourself as a woman, applying for a grant or fellowship, and realising that you’d be judged by Hunt. And if you think that the reaction was too harsh, imagine the same words being spoken with "blacks", or "Jews" substituted for "women". Of course I’m not suggesting for a moment that he’d have done this, but if anybody did, I doubt whether many people would have thought it was a good joke.

9 November 2015

An impressively detailed account of the Hunt affair has appeared. The gist can be inferred from the title: "Saving Tim Hunt: The campaign to exonerate Tim Hunt for his sexist remarks in Seoul is built on myths, misinformation, and spin". It was written by Dan Waddell (@danwaddell) and Paula Higgins (@justamusicprof). It is long and impressively researched. It’s revealing to see the bits that Louise Mensch omitted from her quotations. I can’t disagree with its conclusion.

"In the end, the parable of Tim Hunt is indeed a simple one. He said something casually sexist, stupid and inappropriate which offended many of his audience. He then confirmed he said what he was reported to have said and apologised twice. The matter should have stopped there. Instead a concerted effort to save his name — which was not disgraced, nor his reputation as a scientist jeopardized — has rewritten history. Science is about truth. As this article has shown, we have seen very little of it from Hunt’s apologists — merely evasions, half-truths, distortions, errors and outright falsehoods.

"

8 April 2017

This late addition is to draw attention to a paper, written by Edwin Boring in 1951, about the problems for the advancement of women in psychology. It’s remarkable reading and many of the roots of the problems have hardly changed today. (I chanced on the paper while looking for a paper that Boring wrote about P values in 1919.)

Here is a quotation from the conclusions.

“Here then is the Woman Problem as I see it. For the ICWP or anyone else to think that the problem can be advanced toward solution by proving that professional women undergo more frustration and disappointment than professional men, and by calling then on the conscience of the profession to right a wrong, is to fail to see the problem clearly in all its psychosocial complexities. The problem turns on the mechanisms for prestige, and that prestige, which leads to honor and greatness and often to the large salaries, is not with any regularity proportional to professional merit or the social value of professional achievement. Nor is there any presumption that the possessor of prestige knows how to lead the good life. You may have to choose. Success is never whole, and, if you have it for this, you may have to give it up for that.”

Jump to follow-up

This post was written for the Spectator Health section, at short notice after the release of the spider letters. The following version is almost the same as appeared there, with a few updates. Some of the later sections are self-plagiarised from earlier posts.


[Picture: Getty]

The age of enlightenment was a beautiful thing. People cast aside dogma and authority. They started to think for themselves. Natural science flourished. Understanding of the natural world increased. The hegemony of religion slowly declined. Eventually real universities were created and real democracy developed. The modern world was born.

People like Francis Bacon, Voltaire and Isaac Newton changed the world for the better. Well, that’s what most people think. But not Charles, Prince of Wales and Duke of Cornwall.

In 2010 he said

"I was accused once of being the enemy of the Enlightenment,” he told a conference at St James’s Palace. “I felt proud of that.” “I thought, ‘Hang on a moment’. The Enlightenment started over 200 years ago. It might be time to think again and review it and question whether it is really effective in today’s conditions."

It seems that the Prince preferred things as they were before 1650. That’s a remarkable point of view for someone who, if he succeeds, will become the patron of that product of the age of enlightenment, the Royal Society, a venture that got its Royal Charter from King Charles II in 1662.

I suppose that the Prince cannot be blamed for his poor education. He may have been at Trinity College Cambridge, but his 2.2 degree is the current euphemism for a fail (it seems that he even failed to learn the dates of the enlightenment).

His behaviour has brought to the fore the question of the role of the monarchy.

A constitutional monarch is purely ceremonial and plays no part in politics. Well actually in the UK it isn’t quite as simple as that. The first problem is that we have no constitution. Things haven’t changed much since the 19th century when Walter Bagehot said “the Sovereign has, under a constitutional monarchy… three rights—the right to be consulted, the right to encourage, the right to warn.”.

These are real powers in a country which is meant to be run by elected representatives. But nobody knows how these powers are used: it is all done in secret. Well, almost all. The Prince of Wales has been unusually public in expressing his views. His views bear directly on government policy in many areas: medicine, architecture, agriculture and the environment. These are mostly areas that involve at least an elementary knowledge of science. But that is something that he lacks. Worse still, he seems to have no consciousness of his ignorance.

The Royal family should clearly have no influence whatsoever on government policies in a democracy. And they should be seen to have no influence. The Queen is often praised for her neutrality, but the fact is that nobody has the slightest idea what happens at the weekly meetings between the Prime Minister and the Queen. I doubt that she advises the prime minister to create a National Health Service, or to tax the rich. We shall never know that. We should do.

Almost the only light that has been thrown on the secret activities of Charles was the release, on 13 May, of 27 letters that the Prince wrote to government ministers in the Blair government between 2004 and 2005. It has taken 10 years of effort by the Guardian to get hold of the letters. It was like getting blood from a stone. When the Information Commissioner ruled that the letters should be made public, the decision was vetoed by the Conservative attorney general, Dominic Grieve. He said, of the "particularly frank" letters,

" Disclosure of the correspondence could damage The Prince of Wales’ ability to perform his duties when he becomes King."

That, of course, is precisely why the documents should be revealed.

If Charles’ ability to perform his duty as King is damaged, should his subjects be kept unaware of that fact? Of course not.

In this case, the law prevailed over the attorney general. After passing through the hands of 16 different judges, the Supreme Court eventually ruled, in March, that the government’s attempts to block release were unlawful. The government spent over £400,000 in trying, and failing, to conceal what we should know. The Freedom of Information Act (2000) is the best thing that Tony Blair did, though he, and Jack Straw, thought it was the worst. I expect they are afraid of what it might reveal about their own records. Transparency is not favoured by governments of any hue.

What do the letters say?

You can read all the letters on the Guardian web site. They give the impression of being written by a rather cranky old man with bees in his bonnet and too much time on his hands. The problem is that not all cranky old men can write directly to the prime minister, and get an answer.

Not all the letters are wrong-headed. But all attempt to change government policy. They represent a direct interference in the political process by the heir to the throne. That is unacceptable in a democracy. It disqualifies him from becoming king.

Some letters verged on the bizarre.

21 October 2004
To Elliot Morley (Minister for the Environment)

I particularly hope that the illegal fishing of the Patagonian Toothfish will be high on your list of priorities because until the trade is stopped, there is little hope for the poor old albatross.

No doubt illegal fishing is a problem, but not many people would write directly to a minister about the Patagonian Toothfish.

Others I agree with. But they are still attempts to influence the policies of the elected government. This one was about the fact that supermarkets pay so little to dairy farmers for milk that sometimes it’s cheaper than bottled water.

To Tony Blair 8 September 2004

". . . unless United Kingdom co-operatives can grow sufficiently the processors and retailers will continue to have the farmers in an arm lock and we will continue to shoot ourselves in the foot! You did kindly say that you would look at this . . . ".

Yours ever,

Charles

He wrote to the minister of education to try to influence education policy.

22 February 2005
Ruth Kelly

"I understand from your predecessor, Charles Clarke, that he has spoken to you about my most recent letter of 24th November, and specifically about the impact of my Education Summer School for teachers of English and History. This Programme, which involves up to ninety state school teachers each year, has been held over the past three years in Dartington, Devon, at Dunston, in Norfolk and at Buxton, in Derbyshire. I believe that they have added fresh inspiration to the national debate about the importance of English Literature and History in schools."

"Despite having made substantial progress, as you may be aware I remain convinced that the correct approaches to teaching and learning need to be challenged."

It’s interesting that the meeting was in Dartington. That’s near Totnes ("twinned with Narnia") and it’s a centre for the bizarre educational cult promoted by the mystic and racist, Rudolf Steiner.

Then we get a reference to one of Charles’ most bizarre beliefs, alternative medicine.

24 February 2005
Tony Blair

Dear Prime Minister, 

We briefly mentioned the European Union Directive on Herbal Medicines, which is having such a deleterious effect on complementary medicine sector in this country by effectively outlawing the use of certain herbal extracts. I think we both agreed this was using a sledgehammer to crack a nut. You rightly asked me what could be done about it and I am asking the Chief Executive of my Foundation for Integrated Health to provide a more detailed briefing which I hope to be able to send shortly so that your advisers can look at it. Meanwhile, I have given Martin Hurst a note suggesting someone he could talk to who runs the Herbal Practitioner’s Association.

Yours ever, Charles

In this he opposes the EU Directive on Herbal Medicines. All this directive did was to insist that there was some anecdotal evidence for the safety of things that are sold to you. It asked for no evidence at all that they work, and it allowed very misleading labels. It provided the weakest form of protection from the deluded and charlatans. It was put into effect in the UK by the Medicines and Healthcare Products Regulatory Authority (MHRA). They even allowed products that were registered under this scheme to display an impressive-looking “kite-mark”. Most people would interpret this as a government endorsement of herbal medicines.

This got a sympathetic response from Tony Blair, someone who, along with his wife, was notoriously sympathetic to magic medicine.

30 March 2005
Response from Tony Blair

Dear Prince Charles

Thanks too for your contacts on herbal medicines who have been sensible and constructive. They feel that the directive itself is sound and the UK regulators excellent, but are absolutely correct in saying that the implementation as it is currently planned is crazy. We can do quite a lot here: we will delay implementation for all existing products to 2011; we will take more of the implementation upon ourselves; and I think we can sort out the problems in the technical committee – where my European experts have some very good ideas. We will be consulting with your contacts and others on the best way to do this we simply cannot have burdensome regulation here.

Yours ever, Tony

Note "absolutely correct in saying that the implementation as it is currently planned is crazy. We can do quite a lot here: we will delay implementation for all existing products to 2011".

Government support for acupuncture and herbal medicine was made explicit in a letter from Health Secretary, John Reid (February 2005). He assures the prince that government is taking action to "enhance the status of the herbal medicine and acupuncture professions".

[Letter from John Reid to the Prince of Wales]

Nothing could reveal more clearly the clueless attitude of the then government to quackery. In fact, after 15 years of wrangling, the promised recognition of herbalism by statutory regulation never happened. One is reminded of the time that an equally-clueless minister, Lord (Philip) Hunt, referred to ‘psychic surgery’ as a “profession”.

We got a preview of the Prince’s letters a month before the release when Max Hastings wrote in the Spectator

I have beside me a copy of a letter allegedly written by him some years ago to a cultural institution, asserting the conviction that ‘there is a DIVINE Source which is ultimate TRUTH… that this Truth can be expressed by means of numbers… and that, if followed correctly, these principles can be expressed with infinite variety to produce Beauty’.

You can’t get much barmier than that.

Are the letters harmless?

That has been the reaction on the BBC. I can’t agree. In one sense they are so trivial that it’s amazing that the government thought it was a good use of £400,000 to conceal them. But they are all the evidence that we’ll get of the Prince’s very direct attempts to influence the political process.

The Prince of Wales is more than just a crank. He has done real harm. Here are some examples.

When the generally admirable NHS Choices re-wrote their advice on homeopathy (the medicines that contain no medicine) the new advice took two years to appear. It was held up in the Department of Health while consultations were made with the Prince’s Foundation for Integrated Health. That’s Charles’ lobby organisation for crackpot medicine. (The word "integrated" is the euphemism for alternative medicine that’s in favour with its advocates.) If it were not for the fact that I used the Freedom of Information Act to find out what was going on, the public would have been given bad advice as a direct result of the Prince’s political interference.

The Prince’s Foundation for Integrated Health (FIH) folded in 2010 as a result of a financial scandal, but it was quickly reincarnated as the "College of Medicine". It was originally going to be named the College of Integrated Medicine, but it was soon decided that this sounded too much like quackery, so it was given the deceptive name, College of Medicine. It appears to be financed by well-known outsourcing company Capita. It’s closely connected with Dr Michael Dixon, who was medical advisor to the FIH, and who tried to derail the advice given by NHS Choices.

Perhaps the worst example of interference by the Prince of Wales was his attempt to get an academic fired. Prof Edzard Ernst is the UK’s foremost expert on alternative medicine. He has examined with meticulous care the evidence for many sorts of alternative medicine. Unfortunately for its advocates, it turned out that there is very little evidence that any of it works. This attention to evidence annoyed the Prince, and a letter was sent from Clarence House to Ernst’s boss, the vice-chancellor of the University of Exeter, Steve Smith. Shamefully, Smith didn’t tell the prince to mind his own business, but instead subjected Ernst to disciplinary proceedings. After a year of misery, Ernst was let off with a condescending warning letter, but he was forced to retire early, in 2011, and the vice-chancellor was rewarded with a knighthood. His university has lost an honest scientist but continues to employ quacks.

Not just interfering but costing taxpayers’ money

The Prince’s influence seems to be big in the Department of Health (DH).  He was given £37,000 of taxpayers’ money to produce his Patients’ Guide (I produced a better version for nothing). And he was paid an astonishing £900,000 by DH to prepare the ground for the setting up of the hapless self-regulator, the Complementary and Natural Healthcare Council (CNHC, also known as Ofquack).

The Prince of Wales’ business, Duchy Originals, has been condemned by the Daily Mail, (of all places) for selling unhealthy foods. And when his business branched into selling quack “detox” and herbal nonsense he found himself censured by both the MHRA and the Advertising Standards Authority (ASA) for making unjustifiable medical claims for these products.

It runs in the family

The Prince of Wales is not the only member of the royal family to be obsessed with bizarre forms of medicine. The first homeopath to the British royal family, Frederick Quin, was a son of the Duchess of Devonshire (1765-1824).  Queen Mary (1865-1953), wife of King George V, headed the fundraising efforts to move and expand the London Homeopathic Hospital.  King George VI was so enthusiastic that in 1948 he conferred the royal title on the London Homeopathic Hospital.

The Queen Mother loved homeopathy too (there is no way to tell whether this contributed to her need for a colostomy in the 1960s).

The present Queen’s homeopathic physician is Peter Fisher, who is medical director of what, until recently, was called the Royal London Homeopathic Hospital (RLHH).  In 2010 that hospital was rebranded as the Royal London Hospital for Integrated Medicine (RLHIM) in another unsubtle bait and switch move. 

The RLHIM is a great embarrassment to the otherwise excellent UCLH Trust.  It has been repeatedly condemned by the Advertising Standards Authority for making false claims.  As a consequence, it has been forced to withdraw all of its patient information. 

The patron of the RLHIM is the Queen, not the Prince of Wales.  It is hard to imagine that this anachronistic institution would still exist if it were not for the influence, spoken or unspoken, of the Queen.  Needless to say we will never be told.

The royal warrant for a firm that sells "meningitis vaccine" that contains nothing

Ainsworth’s homeopathic pharmacy is endorsed by both Prince Charles and the Queen: it has two Royal Warrants, one from each of them.  They sell “homeopathic vaccines” for meningitis, measles, rubella and whooping cough. These “vaccines” contain nothing whatsoever so they are obviously a real danger to public health. 

Despite the fact that Ainsworth’s had already been censured by the ASA in 2011 for selling similar products, Ainsworth’s continued to recommend them with a “casual disregard for the law”.

The regulator (the MHRA) failed to step in to stop them until it was eventually stirred into action by a young BBC reporter, Sam Smith, who made a programme for BBC South West.  Then, at last, the somnolent regulator “told Ainsworths to stop advertising a number of products” (but apparently not to stop making them or selling them). 

They still sell Polonium metal 30C and Swine Meningitis 36C, and a booklet that recommends homeopathic “vaccination”.

Ainsworth’s sales are no doubt helped by the Royal Warrants.  The consequence is that people may die of meningitis. In 2011, the MHRA Chief Executive, Professor Kent Woods, was knighted. It was commented, justly, that

"Children will be harmed by this inaction. Children will die. And the fault must lie with Professor Sir Kent Woods, chairman of the regulator "

But the regulator has to fight the political influence of the Queen and Prince Charles. It lost.

The attorney general, while trying to justify the secrecy of Charles’ letters, said

“It is a matter of the highest importance within our constitutional framework that the Monarch is a politically neutral figure”.

Questions about health policy are undoubtedly political, and the highly partisan interventions of the Prince in the political process make his behaviour unconstitutional.

The Prince’s petulant outbursts not only endanger patients. They endanger the monarchy itself.  Whether that matters depends on how much you value the tourist business generated by the Gilbert & Sullivan flummery at which royals excel. 

The least that one can ask of the royal family is that they should not endanger the health of the nation. It would help if they refrained from using their influence on matters that are beyond their intellectual grasp. 

If I wanted to know the winner of the 2.30 at Ascot, I’d ask a royal. For any other question I’d ask someone with more education.

Follow-up

The letters have made headlines in just about every newspaper. The Guardian had extensive coverage, of course.

The Times had a front page story "Revealed: how Charles got Blair to alter health policy" [pdf]

The British Medical Journal wrote "Prince Charles delayed regulation of herbal medicines" [pdf]

For me, the most shocking item was an interview given by Jack Straw, on Radio 4’s Today Programme. He was Home Secretary from 1997 to 2001 and Foreign Secretary from 2001 to 2006 under Tony Blair. From 2007 to 2010 he was Lord Chancellor. His response to the letters sounded like that of a right-wing conservative.

Like Blair, he deplored the Freedom of Information Act that his own government passed. He defended the secrecy, and supported the Conservative attorney-general’s attempt to veto the release of the letters. Perhaps his defence of secrecy is not surprising: he has a lot to hide. There was his involvement in the mendacity that led to the Iraq war, the dodgy dossier, and his role in covering up torture (the "rendition" scandal). And he was suspended by the Labour party in February 2015 over cash-for-access allegations.

He is certainly a man with plenty of things to hide.

Listen to the interview with John Humphrys.

There is a widespread belief that science is going through a crisis of reproducibility.  A meeting was held to discuss the problem.  It was organised by the Academy of Medical Sciences, the Wellcome Trust, MRC and BBSRC, and it was chaired by Dorothy Bishop (of whose blog I’m a huge fan).  It’s good to see that the scientific establishment is beginning to take notice.  Up to now it’s been bloggers who’ve been making the running.  I hadn’t intended to write a whole post about it, but some sufficiently interesting points arose that I’ll have a go.

The first point to make is that, as far as I know, the “crisis” is limited to, or at least concentrated in, quite restricted areas of science.  In particular, it doesn’t apply to the harder end of the sciences. Nobody in physics, maths or chemistry talks about a crisis of reproducibility.  I’ve heard very little about irreproducibility in electrophysiology (unless you include EEG work).  I’ve spent most of my life working on single-molecule biophysics and I’ve never encountered serious problems with irreproducibility.  It’s a small and specialist field, so I think I would have noticed if it were there.  I’ve always posted our analysis programs on the web, and if anyone wants to spend a year re-analysing our data they are very welcome to do so (though I have been asked only once).

The areas that seem to have suffered most from irreproducibility are experimental psychology, some areas of cell biology, imaging studies (fMRI) and genome studies.  Clinical medicine and epidemiology have been bad too.  Imaging and genome studies seem to be in a slightly different category from the others: their problems are largely statistical, and arise from the huge number of comparisons that need to be done.  Epidemiology’s problems stem largely from a casual approach to causality. The rest have no such excuses.
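The scale of the multiple-comparisons problem is easy to illustrate. Here is a minimal sketch (the numbers are illustrative assumptions, not taken from any particular study): run 20,000 tests, as a genome-wide study might, with no real effects at all, and a P < 0.05 criterion still produces about a thousand "significant" results.

```python
import random

# Toy illustration of the multiple-comparisons problem (assumed numbers:
# 20,000 tests, no real effects anywhere). Under the null hypothesis,
# P-values are uniformly distributed on [0, 1], so a fraction alpha of
# them fall below the threshold by chance alone.
random.seed(1)

n_tests = 20_000   # e.g. one test per gene
alpha = 0.05       # conventional significance threshold

p_values = [random.random() for _ in range(n_tests)]
false_hits = sum(p < alpha for p in p_values)

print(f"{false_hits} 'discoveries' from pure noise")  # roughly 1,000
```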

The meeting was biased towards psychology, perhaps because that’s an area that has had many problems.  The solutions that were suggested were also biased towards that area.  It’s hard to see how some of them could be applied to electrophysiology, for example.

There was, it has to be said, a lot more good intentions than hard suggestions.  Pre-registration of experiments might help a bit in a few areas.  I’m all for open access and open data, but doubt they will solve the problem either, though I hope they’ll become the norm (they always have been for me).

All the tweets from the meeting have been collected as a Storify. The most retweeted comment was from Liz Wager

@SideviewLiz: Researchers are incentivised to publish, get grants, get promoted but NOT incentivised to be right! #reprosymp

This, I think, cuts to the heart of the problem.  Perverse incentives, if sufficiently harsh, will inevitably lead to bad behaviour.  Occasionally it will lead to fraud. It’s even led to (at least) two suicides.  If you threaten people in their forties and fifties with being fired, and losing their house, because they don’t meet some silly metric, then of course people will cut corners.  Curing that is very much more important than pre-registration, data-sharing and concordats, though the latter occupied far more of the time at the meeting.  

The primary source of the problem is that there is not enough money for the number of people who want to do research (a matter that was barely mentioned).  That leads to the unpalatable conclusion that the only way to cure the problem is to have fewer people competing for the money.  That’s part of the reason that I suggested recently a two-stage university system.  That’s unlikely to happen soon. So what else can be done in the meantime?

The responsibility for perverse incentives has to rest squarely on the shoulders of the senior academics and administrators who impose them.  It is at this level that the solutions must be found.  That was said, but not firmly enough. The problems are mostly created by the older generation. It’s our fault.

Incidentally, I was not impressed by the fact that the Academy of Medical Sciences listed attendees with initials after people’s names. There were eight FRSs, but I find it a bit embarrassing to be identified as one, as though it made any difference to the value of what I said.

It was suggested that courses in research ethics for young scientists would help.  I disagree.  In my experience, young scientists are honest and idealistic. The problems arise when their idealism is shattered by the bad example set by their elders.  I’ve had a stream of young people in my office who want advice and support because they feel they are being pressured by their elders into behaviour which worries them. More than one of them have burst into tears because they feel that they have been bullied by PIs.

One talk that I found impressive was by Ottoline Leyser, who chaired the recent report on The Culture of Scientific Research in the UK, from the Nuffield Council on Bioethics.  But I found that report to be bland and its recommendations, though well-meaning, unlikely to result in much change.  The report was based on a relatively small, self-selected sample of 970 responses to a web survey, and on 15 discussion events.  Relatively few people seem to have spent time filling in the text boxes. For example

“Of the survey respondents who provided a negative comment on the effects of competition in science, 24 out of 179 respondents (13 per cent) believe that high levels of competition between individuals discourage research collaboration and the sharing of data and methodologies.”

Such numbers are too small to reach many conclusions, especially since the respondents were self-selected rather than selected at random (poor experimental design!).  Nevertheless, the main concerns were all voiced.  I was struck by

“Almost twice as many female survey respondents as male respondents raise issues related to career progression and the short term culture within UK research when asked which features of the research environment are having the most negative effect on scientists”

But no conclusions or remedies were put forward to remedy this problem.  It was all put rather better, and much more frankly, some time ago by Peter Lawrence.  I do have the impression that bloggers (including Dorothy Bishop) get to the heart of the problems much more directly than any official reports.

The Nuffield report seemed to me to put excessive trust in paper exercises, such as the “Concordat to Support the Career Development of Researchers”.  The word “bullying” does not occur anywhere in the Nuffield document, despite the fact that it’s a problem that’s been very widely discussed and one that’s critical for the problems of reproducibility. The Concordat (unlike the Nuffield report) does mention bullying.

"All managers of research should ensure that measures exist at every institution through which discrimination, bullying or harassment can be reported and addressed without adversely affecting the careers of innocent parties. "

That sounds good, but it’s very obvious that there are many places that simply ignore it. All universities subscribe to the Concordat, but signing is as far as it goes in too many places.   It was signed by Imperial College London, the institution with perhaps the worst record for pressurising its employees, but official reports would not dream of naming names or looking at publicly available documentation concerning bullying tactics. For that, you need bloggers.

On the first day, the (soon-to-depart) Dean of Medicine at Imperial, Dermot Kelleher, was there. He seemed a genial man, but he would say nothing about the death of Stefan Grimm. I find that attitude incomprehensible. He didn’t reappear on the second day of the meeting.

The San Francisco Declaration on Research Assessment (DORA) is a stronger statement than the Concordat, but its aims are more limited.  DORA states that the impact factor is not to be used as a substitute “measure of the quality of individual research articles, or in hiring, promotion, or funding decisions”. That’s something that I wrote about in 2003, in Nature. In 2007 it was still rampant, including at Imperial College. It still is in many places.  The Nuffield Council report says that DORA has been signed by “over 12,000 individuals and 500 organisations”, but fails to mention the fact that only three UK universities have signed up to DORA (one of them, I’m happy to say, is UCL).  That’s a pretty miserable record. And, of course, it remains to be seen whether the signatories really abide by the agreement.  Most such worthy agreements are ignored on the shop floor.

The recommendations of the Nuffield Council report are all worthy, but they are bland and we’ll be lucky if they have much effect. For example

“Ensure that the track record of researchers is assessed broadly, without undue reliance on journal impact factors”

What on earth is “undue reliance”?  That’s a far weaker statement than DORA. Why?

And

“Ensure researchers, particularly early career researchers, have a thorough grounding in research ethics”

In my opinion, what we should say to early career researchers is “avoid the bad example that’s set by your elders (but not always betters)”. It’s the older generation which has produced the problems and it’s unbecoming to put the blame on the young.  It’s the late career researchers who are far more in need of a thorough grounding in research ethics than early-career researchers.

Although every talk was more or less interesting, the one I enjoyed most was the first, by Marcus Munafo.  He assessed the scale of the problem (though with a strong emphasis on psychology, plus some genetics and epidemiology), and he had good data on under-powered studies.  He also made a fleeting mention of the problem of the false discovery rate.  Since the meeting was essentially about the publication of results that aren’t true, I would have expected the statistical problem of the false discovery rate to have been given much more prominence than it was. Although Ioannidis’ now-famous paper “Why most published research findings are false” got the occasional mention, very little attention (apart from Munafo and Button) was given to the problems which he pointed out. 

I’ve recently convinced myself that, if you declare that you’ve made a discovery when you observe P = 0.047 (as is almost universal in the biomedical literature) you’ll be wrong 30–70% of the time (see the full paper, "An investigation of the false discovery rate and the misinterpretation of p-values", and simplified versions on Youtube and on this blog).  If that’s right, then surely an important way to reduce the publication of false results is for journal editors to give better advice about statistics.  This is a topic that was almost absent from the meeting.  It’s also absent from the Nuffield Council report (the word “statistics” does not occur anywhere).
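The arithmetic behind that sort of figure is simple enough to sketch. The numbers below (10% of tested hypotheses genuinely true, 80% power) are illustrative assumptions, not the exact calculation in the paper, but they show how a P < 0.05 criterion can easily give a false discovery rate well above 30%.

```python
# Minimal sketch of the false discovery rate when P < alpha is treated
# as a "discovery". Assumed inputs (illustrative, not from the paper):
#   prior - fraction of tested hypotheses that are genuinely true
#   power - probability that a real effect gives P < alpha
#   alpha - significance threshold

def false_discovery_rate(prior=0.1, power=0.8, alpha=0.05):
    """Fraction of 'significant' results that are false positives."""
    false_positives = alpha * (1 - prior)  # nulls that pass the threshold
    true_positives = power * prior         # real effects that are detected
    return false_positives / (false_positives + true_positives)

print(f"FDR = {false_discovery_rate():.0%}")  # -> FDR = 36%
```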

In summary, the meeting was very timely, and it was fun.  But I ended up thinking it had a bit too much of preaching good intentions to the converted. It failed to grasp some of the nettles firmly enough. There was no mention of what’s happening at Imperial, or Warwick, or Queen Mary, or at Kings College London. Let’s hope that when it’s written up, the conclusion will be a bit less bland than those of most official reports. 

It’s overdue that we set our house in order, because the public has noticed what’s going on. The New York Times was scathing in 2006. This week’s Economist said

"Modern scientists are doing too much trusting and not enough verifying -to the detriment of the whole of science, and of humanity.
Too many of the findings that fill the academic ether are the result of shoddy experiments or poor analysis"

"Careerism also encourages exaggeration and the cherry­picking of results."

This is what the public think of us. It’s time that vice-chancellors did something about it, rather than willy-waving about rankings.

Conclusions

After criticism of the conclusions of official reports, I guess that I have to make an attempt at recommendations myself.  Here’s a first attempt.

  1. The heart of the problem is money. Since the total amount of money is not likely to increase in the short term, the only solution is to decrease the number of applicants.  This is a real political hot-potato, but unless it’s tackled the problem will persist.  The most gentle way that I can think of doing this is to restrict research to a subset of universities. My proposal for a two-stage university system might go some way to achieving this.  It would result in better postgraduate education, and it would be more egalitarian for students. But of course universities that became “teaching only” would see it (wrongly) as demotion, and it seems that UUK is unlikely to support any change to the status quo (except, of course, for increasing fees).
  2. Smaller grants, smaller groups and fewer papers would benefit science.
  3. Ban completely the use of impact factors and discourage use of all metrics. None has been shown to measure future quality.  All increase the temptation to “game the system” (that’s the usual academic euphemism for what’s called cheating if an undergraduate does it).
  4. “Performance management” is the method of choice for bullying academics.  Don’t allow people to be fired because they don’t achieve arbitrary targets for publications or grant income. The criteria used at Queen Mary London, Imperial, Warwick and King’s are public knowledge.  They are a recipe for employing spivs and firing Nobel Prize winners: the 1991 Nobel Laureate in Physiology or Medicine would have failed Imperial’s criteria in 6 of the 10 years during which he was doing the work that led to the prize.
  5. Universities must learn that if you want innovation and creativity you have also to tolerate a lot of failure.
  6. The ranking of universities by ranking businesses or by the REF encourages bad behaviour, by encouraging vice-chancellors to improve their ranking by whatever means they can. This is one reason for bullying behaviour.  The rankings are totally arbitrary and a huge waste of money.  I’m not saying that universities should be unaccountable to taxpayers. But all you have to do is to produce a list of publications to show that very few academics are not trying. It’s absurd to try to summarise a whole university in a single number. It’s simply statistical illiteracy.
  7. Don’t waste money on training courses in research ethics. Everyone already knows what’s honest and what’s dodgy (though a bit more statistics training might help with that).  Most people want to do the honest thing, but few have the nerve to stick to their principles if the alternative is to lose your job and your home.  Senior university people must stop behaving in that way.
  8. University procedures for protecting the young are totally inadequate. A young student who reports bad behaviour of his seniors is still more likely to end up being fired than being congratulated (see, for example, a particularly bad case at the University of Sheffield).  All big organisations close ranks to defend themselves when criticised.  Even in extreme cases, as when an employee commits suicide after being bullied, universities issue internal reports which blame nobody.
  9. Universities must stop papering over the cracks when misbehaviour is discovered. It seems to be beyond the wit of PR people to realise that often it’s best (and always the cheapest) to put your hands up and say “sorry, we got that wrong”.
  10. There is an urgent need to get rid of the sort of statistical illiteracy that allows P = 0.06 to be treated as failure and P = 0.04 as success (see the sketch below). This is almost universal in biomedical papers and, given the hazards posed by the false discovery rate, could well be a major contribution to false claims. Journal editors need to offer much better statistical advice than is the case at the moment.
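To see why that dichotomy is indefensible, apply the same illustrative false-discovery-rate arithmetic as in the earlier sketch (again assuming 10% of tested effects real and 80% power): thresholds on either side of the sacred 0.05 line leave error rates of the same, alarmingly high, order.

```python
# Toy comparison of "failure" (P < 0.06) and "success" (P < 0.04)
# thresholds, using the same assumed prevalence of real effects (10%)
# and power (80%) as before. Neither side of the 0.05 line is respectable.
for alpha in (0.04, 0.05, 0.06):
    fdr = (alpha * 0.9) / (alpha * 0.9 + 0.8 * 0.1)
    print(f"threshold P < {alpha:.2f}: FDR = {fdr:.0%}")

# Output:
# threshold P < 0.04: FDR = 31%
# threshold P < 0.05: FDR = 36%
# threshold P < 0.06: FDR = 40%
```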

Follow-up

Jump to follow-up

The last email of Stefan Grimm, and its follow-up post, has been read over 195,000 times now.

After Grimm’s death, Imperial announced that it would investigate itself. The report is now available.

Performance Management: Review of policies, procedures and support available to staff

Following the tragic death of a member of the College’s staff community, Professor Stefan Grimm, the Provost invited the Senior Consul, Professor Richard Thompson, and the Director of Human Resources, Mrs Louise Lindsay, to consider the relevant College policies, procedures and the support available to all staff during performance review.

The report is even worse than I expected. It can be paraphrased as saying ‘our bullying was not done sufficiently formally – we need more forms and box-ticking’.

At the heart of the problem is Imperial’s Personal Review and Development Plan (PRDP). Here is an extract.

"Professor Grimm had been under review in the informal process for nearly two years. His line manager was using this period to help Professor Grimm obtain funding or alternative work (the review panel saw evidence of the efforts made in this regard). The subsequent formal process would have involved a minimum of two formal meetings with time to improve in-between formal meetings before consideration would have been given to the termination of Professor Grimm’s employment. Understandably there is a reluctance to move into formal hearings, particularly when the member of staff is hard working and diligent, but the formal stages would have provided more clarity to Professor Grimm on process and support through the written documentation, representation at meetings and HR involvement."

"It is recommended that the new capability procedure and ordinance include greater clarity on timescales for informal action and how this might operate in different roles."

It seems to me absurd to describe Wilkins’ letter as an attempt to "help" Professor Grimm. It was a direct threat to the livelihood of a competent 51-year-old full professor. Having flow charts for the bullying would not have helped. Neither would the provision by HR of "resilience" courses (what I’ve seen of such classes makes me feel suicidal at the thought of how far universities have sunk into pseudo-scientific HR babble).

I’ll skip straight to the conclusions, with my comments on them in italic.

1. Expand the Harassment Support Contact Programme to train volunteers, academic staff, who can be matched with individuals going through informal processes.

Looks like a charade to me. If they want to fire people without enough grants, they’ll do it.

2. Refresh and re-launch information on the employee assistance services: widespread distribution and regular update of promotional material.

Ditto

3. Ensure regular training is given to new and experienced managers in core HR procedures.

Train senior people to bully properly.

4. Create a separate guidance and support document for staff to supplement the procedure document. The document to include a clear and concise summary of the informal and formal process, a flowchart, the support available to staff and frequently asked questions

Pretend that staff are being helped by threatening to fire them.

5. Direct managers to inform HR before commencing the informal stage of performance management. All managers to have a briefing from their local HR representative on the instigation of performance management.

Make sure you’ve filled in the forms and ticked the boxes before you start bullying. HR don’t understand performance and should have no role in the process.

6. Create a separate policy for performance management in the form of a procedure, which includes clear definitions for informal and formal performance management and further guidance on the timescales and correspondence in stages. Provide clarity on the role of the PRDP appraisal in performance management.

The role of the PRDP is to increase the status of Imperial College, while pretending that it’s to benefit its victims.

7. Create template documentation for performance management correspondence and formal stages of the process. Direct managers to ensure all correspondence is reviewed by an HR representative before it is sent to a member of staff.

Bullying is OK if you’ve filled in enough forms.

In summary, these proposals merely add more bureaucracy. They won’t change anything. As one might have suspected, they are merely a smokescreen for carrying on as at present.

There is only one glimmer of hope in the whole report.

Additional recommendation

Although this was not within the remit of the current review, a number of concerns were raised with the reviewers about the application and consistency of approach in the use of performance metrics in academia and in the College. The reviewers recommend that the College undertake a wider consultation and review of the application of performance metrics within Imperial College with recommendations to be considered by the Provost’s Board in the summer term.

I’ve been telling them since 2007 that the metrics they use to judge people are plain silly [download the paper]. So have many other people. Could the message have sunk in at last? We’ll see.

What should be done about performance?

I’ve been very critical of the metrics that are used by Imperial (and some other places) to harass even quite senior people. So, it might well be asked how I think that standards should be maintained. If people are paid by the taxpayers, it isn’t unreasonable to expect them to work to the best of their abilities. The following observations come to mind.

  • Take a lesson from Bell Labs in its heyday (before performance managers got power). "First, management had to be technically competent; at Bell Labs, all managers were former researchers. Second, no researchers should have to raise funds. They should be free of that pressure. Third, research should and would be supported for years – if you want your company to last, take the long view. And finally, a project could be terminated without damning the researcher. There should be no fear of failure."
  • Take a lesson from the great Max Perutz about how to run a successful lab."Max had the knack of picking extraordinary talent. But he also had the vision of creating a working environment where talented people were left alone to pursue their ideas. This philosophy lives on in the LMB and has been adopted by other research institutes as well. Max insisted that young scientists should be given full responsibility and credit for their work. There was to be no hierarchy, and everybody from the kitchen ladies to the director were on first-name terms. The groups were and still are small, and senior scientists work at the bench."
  • Read Gus John "The results of the Guardian higher education network’s survey on bullying in higher education should give the entire sector cause to worry about the competence and style of leaders and managers in the sector"
  • The vast majority of scientists whom I know work absurdly long hours. They are doing their best without any harassment from "performance managers". Some are more successful, and/or lucky, than others. That’s how it is. Get used to it.
  • Rankings of universities are arbitrary and silly, but worse, they provide an incentive to vice-chancellors to justify their vast salaries by pushing their institution up the rankings by fair means or foul. It’s no exaggeration to suspect that things like the Times Higher Education rankings and the REF contributed to the death of Stefan Grimm.
  • Realise that HR know nothing about science: their "performance management" kills original science, and it leads to corruption. It must bear some of the blame for the crisis in the reproducibility of published work.
  • If you want innovation, you have to tolerate lots and lots of failure.

Follow-up

Stop press On April 7th, the coroner said that Grimm had asphyxiated himself on 25 September 2014. He described the death as "needless". And Imperial’s HR director, Louise Lindsay, when asked if the new procedures would have saved his life, said that it was "not clear it would have resulted in a different outcome". So we have it from the horse’s mouth. Imperial has done nothing to prevent more tragedies happening.

10 April 2015

King’s College London has just issued a draft for its "performance management" system. You can read all about it here.

"Performance management is a direct incentive to do shoddy short-cut science."

17 April 2015

Alice Gast declines to apologise

At 06.22 on Radio 4’s Today Programme, Tanya Beckett interviewed Alice Gast, President of Imperial College London. After a 4-minute commercial for Imperial, Gast was asked about the death of Stefan Grimm. Her reply didn’t even mention Grimm: “professors are under a lot of pressure . . .”. Not a word of apology or explanation was offered. I find it hard to comprehend such a heartless approach to her employees.

Listen to the interview.

1 May 2015

The Imperial students’ newspaper, Felix Online, carried a description of the internal report and the inquest: Review in response to Grimm’s death completed. Results criticised by external academics: “Imperial doesn’t get it.” It’s pretty good.

I wonder what undergraduates feel about being taught by people who write letters like the one Martin Wilkins did?

Jump to follow-up

The University of Warwick seems determined to wrest the title of worst employer from Imperial College London and Queen Mary College London. In little over a year, Warwick has had four lots of disastrous publicity, all self-inflicted.


First came the affair of Thomas Docherty.

Thomas Docherty

Professor of English and Comparative Literature, Thomas Docherty was suspended in January 2014 by Warwick because of "inappropriate sighing", "making ironic comments" and "projecting negative body language". Not only was Docherty punished, but also his students.

"As well as being banned from campus, from the library, and from email contact with his colleagues, Docherty was prohibited from supervising his graduate students and from writing references. Indiscriminate, disproportionate, and unjust measures against the professor were also deeply unfair to his students."

Ludicrously, rather than brushing the matter aside, senior management at Warwick hired corporate lawyers to argue that his behaviour was grounds for dismissal.

That cost the university at least £43,000.

The story appeared in every UK newspaper and rapidly spread abroad. It must have been the most ham-fisted bit of PR ever. But rather than firing the HR department, the University of Warwick let the matter fester for a full nine months before reinstating Docherty in September 2014.

The university managed to get the worst possible outcome. The suspension provoked world-wide derision and in the end they admitted they’d been wrong.  Jeremy Treglown, a professor emeritus of Warwick (and former editor of The Times Literary Supplement) described the episode as being like “something out of Kafka”.

And guess what, nobody was blamed and nobody resigned.

Firing people for doing cheap research

Warwick has followed the bad example set by Queen Mary College London, Kings College London and Imperial College London. If you don’t average an external grant income of at least £75,000 a year over the past four years, your job is at risk. Apart from its cruelty, the taxpayer is likely to take a dim view of academics being compelled to make research as expensive as possible. Some people need no more than a paper and pencil to do brilliant work. If you are one of them, don’t go to any of these universities.

It’s simply bad management. They shouldn’t have taken on so many people if they can’t pay the bills. Many universities took on extra staff in order to cheat on the REF. Now they have to cast some aside like worn-out old boots.

The tone of voice

Warwick University has very recently issued a document "Warwick tone of voice: Full guidelines. March 2015". It’s a sign of their ham-fisted management style that it wasn’t even hidden behind a password. They seem to be proud of it. Of course it provoked a storm of hilarity on social media. Documents like that are designed to instruct people not to give truthful opinions but to act as advertising agents for their university. The actual effect is, of course, exactly the opposite. They reduce the respect for the institution that issues such documents.

Here are some quotations (try not to laugh – you might get fired).

"What is tone of voice and why do we need a ‘Warwick’ tone of voice?
The tone of our language defines the way people respond to us. By writing in a tone that’s true to our brand, we can express what it is that makes University of Warwick unique."

"Our brand: defined by possibility

What is it that makes us unique? We’re a university with modern values and a formidable record of academic and commercial achievement — but not the only one. So what sets us apart?

The difference lies in our approach to everything we do. Warwick is a place that fundamentally rejects the notion of obstacles — a place where the starting point is always ‘anything is possible’. "

Then comes the common thread. It’s all to do with rankings.

“What if we raised our research profile to even higher levels of international excellence? Then we could be ranked as one of the world’s top fifty universities.”

The people who sell university rankings (and the REF) have much to answer for.

There’s a good post about this fiasco, from people whose job is branding. "How not to write guidelines".

Outsourcing teaching

As if all this were not enough, on 5 April 2015 we heard that "Warwick Uni to outsource hourly paid academics to subsidiary". Universities already rely heavily on people on short-term contracts. Most research is done by PhD students and post-doctoral researchers on three (or sometimes five) year contracts. They are supervised (not always very well) by people who spend most of their time writing grant applications. Science must be one of the most insecure jobs going.

Increasingly we are seeing casualisation of academics. A three year contract looks like luxury compared with being hired by the hour. It’s rapidly approaching zero-hours contracts for PhDs. In fact it’s reported that people hired by TeachHigher won’t even have a contract: "staff hired under TeachHigher will be working explicitly not on a contract, but rather, an ‘agreement’ ".

The organisation behind this is called TeachHigher. And guess who owns it? The University of Warwick. It is a subsidiary of the Warwick Employment Group, which already runs several other employment agencies, including Unitemps, which deals with cleaners, security and catering staff.

The university claims that it isn’t "outsourcing" because TeachHigher is part of the university. For now, anyway. It’s reported that "The university plans to turn the project into a commercial franchise, similar to another subsidiary used to pay cleaners and catering staff, [which] it can sell to other institutions."

The Warwick students’ newspaper "spoke to a PhD student who was fired last year from a teaching job with Unitemps after participating in strike action, who felt one of the aims of creating TeachHigher may be “to prevent collective action from taking place”."

Bringing the university into disrepute is something for which you can be fired. The vice-chancellor, Nigel Thrift, has allowed Warwick to become a laughing stock four times in a single year. Perhaps it is time that the chair of Council, George Cox, did something about it?

Universities don’t have to be run like that. UCL isn’t, for one.

Follow-up

9 April 2015 It seems that TeachHigher was proposing to pay a lecturer £5 per hour. This may not be accurate but it’s certainly caused a stir.

Laurie Taylor, ever-topical, was on the Docherty case in Times Higher Education.

Riga, Riga, roses

“I’ve nothing against Latvia per se, but I can’t in all honesty see any real parallels between a university in such a faraway and somewhat desolate place as Riga and our own delightful campus.”

That was how Jamie Targett, our Director of Corporate Affairs, responded to the news that the European Court of Human Rights had found that a professor at Riga Stradiņš University had been unfairly sacked for criticising senior management. University staff, the court ruled, must be free to criticise management without fear of dismissal or disciplinary action.

Targett “thoroughly rejected” the suggestion from our reporter Keith Ponting (30) that there might be “a parallel” between what happened at Riga and our own university’s decision to ban Professor Busby of our English Department from campus for nine months for a disciplinary offence.

This, insisted Targett, was a “wholly inappropriate parallel”. For whereas the Latvian professor had been disciplined for speaking out against “alleged nepotism, plagiarism, corruption and mismanagement” in his department, Professor Busby had been banned from campus and from contact with students and colleagues for nine months for the “far more heinous offence” of “sighing” during an appointments interview.

Targett said he “trusted that any fair-minded person, whether from Latvia or indeed the Outer Caucasus, would be able to see the essential difference in the scale of offence”.

10 April 2015

The London Review of Books has a rather similar piece, Mind Your Tone, by Glen Newey.

"It’s tough to pick winners amid the textureless blather that has lately seeped from campus PR outfits".

"In a keen field, though, it’s Warwick’s drill-sheet that takes the jammie dodger".

17 April 2015

Anyone would have thought that Laurie Taylor had read this post. His inimitable Poppletonian column this week was entirely devoted to Warwick.

Nothing to laugh about!

16 APRIL 2015 | BY LAURIE TAYLOR

Our Director of Corporate Affairs, Jamie Targett, has roundly criticised all those members of the Poppleton academic staff who have responded to the new University of Warwick “Tone of Voice” guidelines with what he described as “wholly inappropriate sniggering”.

Targett said that he saw “nothing at all funny” in Warwick’s new insistence that its staff should always apply the “What if” linguistic principle in all their communications.

He particularly praised the manner in which the application of the What if principle helped to make communications optimistic, leaving “the reader to feel that you’re there to help them”. So instead of writing “This is only for”, Warwick staff under the influence of the What if principle would write “This is for everyone who”.

But there were many other advantages that could be derived from consistent application of What if. It also inclined writers to be “proactive”. So instead of writing “Your application was received”, Warwick staff imbued with the What if ethic would always write “We’ve read your application”.

Targett said that he also failed to find any humour whatsoever in the further What if insistence that academic staff should always avoid using such tentative words as “possibly”, “hopefully” or “maybe”. So, under the What if linguistic principle, staff would never write “We hope to become a top 50 world-ranked university” but always “Our aim is to become a top 50 world-ranked university”.

In what was being described as “an unexpected move”, Targett received support for his views on the What if principle from Mr Ted Odgers of our Department of Media and Cultural Studies, who thought that the principle made “particularly good sense” in the Warwick context. He went so far as to provide the following example of its application:

“What if the University of Warwick had not recently banned an academic from its campus for nothing more serious than sighing, projecting negative body language and making ironic comments when interviewing candidates for a job? And What if this ban had not been complemented with a ban on the said academic contacting his own undergraduates and tutoring his own PhD students and speaking to his former colleagues? And What if the whole case against the said academic had not then been pursued with the use of a team of high-powered barristers costing the university at least £43,000?”

If all these What ifs had been met, then, added Mr Odgers, Warwick might possibly, hopefully or maybe have managed to retain its former position as an institution that respected the principles of academic freedom.

Targett told The Poppletonian that while he appreciated Mr Odgers’ application of the What if principle, he felt that it did not “at some points” fully capture the essence of its guidelines.