
David Colquhoun
Today, in desperation, we sent a letter to our MP. Usually such letters get a boilerplate response, dictated by Conservative Central Office. We aren’t hopeful about changing the mind of our MP. Before he was elected he ran a shockingly misogynist web site, now deleted. But it’s essential to do what one can for democracy while we still have some.
Dear Mike Penning
I doubt whether this will be the only letter you’ll get along these lines.
1. The Conservative party used to be the party of business. Brexit has left many small businesses destitute and impoverished the UK. That’s not just my opinion, but that of the OBR. And it hasn’t been the will of the people except for a few weeks around the time of the referendum. The vote was bought by Russian (and far-right US) money. Putin was a strong advocate of Brexit. He got his way.
2. The Conservative party used to be the party of law and order. Now, it seems, it thinks it’s enough to say sorry, let’s move on. It has become the party of law-breaking.
3. The Rwanda scheme is intolerable. Government ministers appear regularly on TV and radio referring to people arriving illegally. It is NOT illegal to ask for asylum however you arrive in the country. Ministers must know this, so they are lying to the public. This should not happen. It breaks the ministerial code and in the recent past it would have led to resignation. That honourable tradition is ignored by the Conservative party.
4. As a backbencher I hope that you are as appalled as we are that there will be no debate in parliament about a matter as controversial as the Rwanda proposals. That is more what one might expect in a dictatorship than in a democracy. The Rwanda proposals cost a fortune, are probably illegal and they won’t work anyway.
5. Ministers repeatedly claim that the UK’s record for taking refugees is world-beating. You yourself made a similar claim in your recent letter to us. Surely you must know that this is simply not true. Even the famous Kindertransport in WW2 refused to take the parents of the children who arrived in the UK – they were left to be murdered by the Nazis. More recently, Germany accepted a million Syrians. In comparison, we took only a handful.
6. Most recently, the Ukraine war has again showed the UK in a bad light. Poland has accepted millions of people fleeing from Putin’s war. We have accepted only a handful. Priti Patel’s scheme, Homes for Ukraine, verges on being a sham. We signed up for it as soon as it opened, but nothing has happened. We are trying to find a refugee to sponsor. Apparently thousands have asked, but the government has done nothing to help. We are among the 100,000 British people who wanted to do something to help, only to find ourselves thwarted by the government.
7. Although sanctions were (very belatedly) imposed on some uber-wealthy Russians when Putin invaded Ukraine, they were given plenty of time to move their assets before the sanctions were enforced. Was the government concerned that the large contributions that Russians made to Conservative party funds might dry up?
In summary, the Conservative party now bears no resemblance to the Conservative party of even 10 years ago. It has morphed into a far-right populist party with scant regard for honesty, or even democracy itself. It is more like Orban’s Hungary than the country we were born in. We are sad and ashamed that the UK is laughed at and pitied round the world (do you read foreign newspapers? – their view of Johnson’s government is chastening).
We hope this government falls before it destroys totally the England in which we were born.
David & Margaret Colquhoun
This is a transcript of the talk that I gave to the RIOT science club on 1st October 2020. The video of the talk is on YouTube. The transcript was very kindly made by Chris F Carroll, but I have modified it a bit here to increase clarity. Links to the original talk appear throughout.
My title slide is a picture of UCL’s front quad, taken on the day that it was the starting point for the second huge march that attempted to stop the Iraq war. That’s a good example of the folly of believing things that aren’t true.
“Today I speak to you of war. A war that has pitted statistician against statistician for nearly 100 years. A mathematical conflict that has recently come to the attention of the normal people and these normal people look on in fear, in horror, but mostly in confusion because they have no idea why we’re fighting.”
Kristin Lennox (Director of Statistical Consulting, Lawrence Livermore National Laboratory)
That sums up a lot of what’s been going on. The problem is that there is near unanimity among statisticians that p values don’t tell you what you need to know but statisticians themselves haven’t been able to agree on a better way of doing things.
This talk is about the probability that if we claim to have made a discovery we’ll be wrong. This is what people very frequently want to know. And that is not the p value. You want to know the probability that you’ll make a fool of yourself by claiming that an effect is real when in fact it’s nothing but chance.
Just to be clear, what I’m talking about is how you interpret the results of a single unbiased experiment. Unbiased in the sense the experiment is randomized, and all the assumptions made in the analysis are exactly true. Of course in real life false positives can arise in any number of other ways: faults in the randomization and blinding, incorrect assumptions in the analysis, multiple comparisons, p hacking and so on, and all of these things are going to make the risk of false positives even worse. So in a sense what I’m talking about is your minimum risk of making a false positive even if everything else were perfect.
The conclusion of this talk will be:
If you observe a p value close to 0.05 and conclude that you’ve discovered something, then the chance that you’ll be wrong is not 5%, but is somewhere between 20% and 30% depending on the exact assumptions you make. If the hypothesis was an implausible one to start with, the false positive risk will be much higher.
There’s nothing new about this at all. This was written by a psychologist in 1966.
The major point of this paper is that the test of significance does not provide the information concerning phenomena characteristically attributed to it, and that a great deal of mischief has been associated with its use.
Bakan, D. (1966) Psychological Bulletin, 66 (6), 423–437
Bakan went on to say that this was already well known, but if so, it’s certainly not well known even today by many journal editors, or indeed by many users.
The p value
Let’s start by defining the p value. An awful lot of people can’t do this but even if you can recite it, it’s surprisingly difficult to interpret it.
I’ll consider it in the context of comparing two independent samples to make it a bit more concrete. So the p value is defined thus:
If there were actually no effect (for example, if the true means of the two samples were equal, so the difference was zero) then the probability of observing a value for the difference between means which is equal to or greater than that actually observed is called the p value.
Now there are at least five things that are dodgy about that, when you think about it. It sounds very plausible but it’s not.
- “If there were actually no effect …”: first of all this implies that the denominator for the probability is the number of cases in which there is no effect, and this is not known.
- “… or greater than…” : why on earth should we be interested in values that haven’t been observed? We know what the effect size that was observed was, so why should we be interested in values that are greater than that which haven’t been observed?
- It doesn’t compare the hypothesis of no effect with anything else. This is put well by Sellke et al in 2001, “knowing that the data are rare when there is no true difference [that’s what the p value tells you] is of little use unless one determines whether or not they are also rare when there is a true difference”. In order to understand things properly, you’ve got to have not only the null hypothesis but also an alternative hypothesis.
- Since the definition assumes that the null hypothesis is true, it’s obvious that it can’t tell us about the probability that the null hypothesis is true.
- The definition invites users to make the error of the transposed conditional. That sounds a bit fancy but it’s very easy to say what it is.
- The probability that you have four legs given that you’re a cow is high, but the probability that you’re a cow given that you’ve got four legs is quite low: there are many animals that have four legs that aren’t cows.
- Take a legal example. The probability of getting the evidence given that you’re guilty may be known. (It often isn’t of course — but that’s the sort of thing you can hope to get). But it’s not what you want. What you want is the probability that you’re guilty given the evidence.
- The probability you’re catholic given that you’re the pope is probably very high, but the probability you’re a pope given that you’re a catholic is very low.
So now to the nub of the matter.
- The probability of the observations given that the null hypothesis is true is the p value. But it’s not what you want. What you want is the probability that the null hypothesis is true given the observations.
The first statement is a deductive process; the second process is inductive and that’s where the problems lie. These probabilities can be hugely different and transposing the conditional simply doesn’t work.
The False Positive Risk
The false positive risk avoids these problems. Define the false positive risk as follows.
If you declare a result to be “significant” based on a p value after doing a single unbiased experiment, the False Positive Risk is the probability that your result is in fact a false positive.
That, I maintain, is what you need to know. The problem is that in order to get it, you need Bayes’ theorem and as soon as that’s mentioned, contention immediately follows.
Bayes’ theorem
Suppose we call the null-hypothesis H0, and the alternative hypothesis H1. For example, H0 can be that the true effect size is zero and H1 can be the hypothesis that there’s a real effect, not just chance. Bayes’ theorem states that the odds on H1 being true, rather than H0, after you’ve done the experiment are equal to the likelihood ratio times the odds on there being a real effect before the experiment:

posterior odds of H1 = likelihood ratio × prior odds of H1
In general we would want a Bayes’ factor here, rather than the likelihood ratio, but under my assumptions we can use the likelihood ratio, which is a much simpler thing [explanation here].
The likelihood ratio represents the evidence supplied by the experiment. It’s what converts the prior odds to the posterior odds, in the language of Bayes’ theorem. The likelihood ratio is a purely deductive quantity and therefore uncontentious. It’s the probability of the observations if there’s a real effect divided by the probability of the observations if there’s no effect.
Notice a simplification you can make: if the prior odds equal 1, then the posterior odds are simply equal to the likelihood ratio. “Prior odds of 1” means that it’s equally probable before the experiment that there was an effect or that there’s no effect. Put another way, prior odds of 1 means that the prior probability of H0 and of H1 are equal: both are 0.5. That’s probably the nearest you can get to declaring equipoise.
Comparison: Consider Screening Tests
I wrote a statistics textbook in 1971 [download it here] which by and large stood the test of time but the one thing I got completely wrong was the limitations of p values. Like many other people I came to see my errors through thinking about screening tests. These are very much in the news at the moment because of the COVID-19 pandemic. The illustration of the problems they pose which follows is now quite commonplace.
Suppose you test 10,000 people and that 1 in 100 of those people have the condition, e.g. Covid-19, and 99 in 100 don’t have it. The prevalence in the population you’re testing is 1 in 100. So you have 100 people with the condition and 9,900 who don’t. If the specificity of the test is 95%, you get 5% false positives: that’s 5% of the 9,900 unaffected people, i.e. 495 false positives.
This is very much like a null-hypothesis test of significance. But you can’t get the answer without considering the alternative hypothesis, which null-hypothesis significance tests don’t do. So now add the upper arm to the Figure above.
You’ve got 1% (so that’s 100 people) who have the condition, so if the sensitivity of the test is 80% (that’s like the power of a significance test) then you get 80 true positives. The total number of positive tests is 80 plus 495, and the proportion of positive tests that are false is 495 false positives divided by the total number of positives (575), which is 86%. A test that gives 86% false positives is pretty disastrous. It is not 5%! Most people are quite surprised by that when they first come across it.
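Here is a minimal R sketch of that arithmetic, using the illustrative numbers above (they are not real test characteristics):

# Screening-test arithmetic from the example above (illustrative numbers)
n_tested    <- 10000
prevalence  <- 0.01   # 1 in 100 have the condition
sensitivity <- 0.80   # like the power of a significance test
specificity <- 0.95   # so 5% of unaffected people test positive

true_pos  <- n_tested * prevalence * sensitivity              # 80
false_pos <- n_tested * (1 - prevalence) * (1 - specificity)  # 495
false_pos / (true_pos + false_pos)                            # 0.861, i.e. 86%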
Now look at significance tests in a similar way
Now we can do something similar for significance tests (though the parallel is not exact, as I’ll explain).
Suppose we do 1,000 tests and in 10% of them there’s a real effect, and in 90% of them there is no effect. If the significance level, so-called, is 0.05 then 5% of the 900 tests in which there is no effect come out positive, which is 45 false positives.
But that’s as far as you can go with a null-hypothesis significance test. You can’t tell what’s going on unless you consider the other arm. If the power is 80% then we get 80 true positive tests and 20 false negative tests, so the total number of positive tests is 80 plus 45 and the false positive risk is the number of false positives divided by the total number of positives which is 36 percent.
So the p value is not the false positive risk. And the type 1 error rate is not the false positive risk.
The difference between them lies not in the numerator, it lies in the denominator. In the example above, of the 900 tests in which the null-hypothesis was true, there were 45 false positives. So looking at it from the classical point of view, the false positive risk would turn out to be 45 over 900 which is 0.05 but that’s not what you want. What you want is the total number of false positives, 45, divided by the total number of positives (45+80), which is 0.36.
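The same tree-diagram arithmetic can be written out in a few lines of R. This is just a sketch with the illustrative numbers used above (1,000 tests, a prior of 0.1, power of 0.8 and a significance level of 0.05):

# Tree-diagram arithmetic for significance tests (illustrative numbers)
n_tests <- 1000
prior   <- 0.10   # proportion of tests in which there is a real effect
alpha   <- 0.05   # significance level
power   <- 0.80

false_pos <- n_tests * (1 - prior) * alpha   # 45 false positives
true_pos  <- n_tests * prior * power         # 80 true positives
false_pos / (n_tests * (1 - prior))          # 0.05: the type 1 error rate
false_pos / (false_pos + true_pos)           # 0.36: the false positive risk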
The p value is NOT the probability that your results occurred by chance. The false positive risk is.
A complication: “p-equals” vs “p-less-than”
But now we have to come to a slightly subtle complication. It’s been around since the 1930s and it was made very explicit by Dennis Lindley in the 1950s. Yet it is unknown to most people, which is very weird. The point is that there are two different ways in which we can calculate the likelihood ratio and therefore two different ways of getting the false positive risk.
A lot of writers including Ioannidis and Wacholder and many others use the “p less than” approach. That’s what that tree diagram gives you. But it is not what is appropriate for interpretation of a single experiment. It underestimates the false positive risk.
What we need is the “p equals” approach, and I’ll try and explain that now.
Suppose we do a test and we observe p = 0.047; then all we are interested in is how tests that come out with p = 0.047 behave. We aren’t interested in tests that give any other p value. That p value is now part of the data. The tree diagram approach we’ve just been through gave a false positive risk of only 6%, if you assume that the prevalence of true effects was 0.5 (prior odds of 1). 6% isn’t much different from 5% so it might seem okay.
But the tree diagram approach, although it is very simple, still asks the wrong question. It looks at all tests that give p ≤ 0.05, the “p-less-than” case. If we observe p = 0.047 then we should look only at tests that give p = 0.047, rather than at all tests which come out with p ≤ 0.05. If you’re doing it with simulations, of course, as in my 2014 paper, then you can’t expect any tests to give exactly 0.047; what you can do is look at all the tests that come out with p in a narrow band around there, say 0.045 ≤ p ≤ 0.05.
This approach gives a different answer from the tree diagram approach. If you look at only tests that give p values between 0.045 and 0.05, the false positive risk turns out to be not 6% but at least 26%.
I say at least, because that assumes a prior probability of there being a real effect of 50:50. If only 10% of the experiments had a real effect (a prior of 0.1 in the tree diagram), this rises to 76% false positives. That really is pretty disastrous. Now of course the problem is you don’t know this prior probability.
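The “p-equals” idea is easy to check by simulation. The sketch below is not the script from the 2014 paper, just my own minimal version under the same assumptions (two-sample t tests, n = 16 per group, a true effect of 1 standard deviation when there is an effect, and a prior probability of 0.5):

# Simulate many two-sample t tests, half with a real effect of 1 SD (prior = 0.5),
# then look only at those that happen to give p close to 0.05 ("p-equals").
set.seed(1)
nsim <- 100000
n    <- 16                      # observations per group
real <- rbinom(nsim, 1, 0.5)    # 1 = real effect, 0 = null is true

pvals <- sapply(seq_len(nsim), function(i) {
  x <- rnorm(n, mean = 0)
  y <- rnorm(n, mean = real[i])  # effect size of 1 SD when there is a real effect
  t.test(x, y, var.equal = TRUE)$p.value
})

# Among tests giving p in a narrow band around 0.05, what fraction
# actually came from the no-effect simulations? (the false positive risk)
band <- pvals >= 0.045 & pvals <= 0.05
mean(real[band] == 0)           # roughly 0.25 - 0.3, not 0.05

If you instead count every test with p ≤ 0.05 as positive, mean(real[pvals <= 0.05] == 0), you get the much lower “p-less-than” answer of about 6%, as in the tree diagram.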
The problem with Bayes’ theorem is that there is an infinite number of possible answers, one for every choice of prior. Not everyone agrees with my approach, but it is one of the simplest.
The likelihood-ratio approach to comparing two hypotheses
The likelihood ratio (that is to say, the relative probabilities of observing the data given two different hypotheses) is the natural way to compare two hypotheses. For example, in our case one hypothesis is the zero effect (that’s the null-hypothesis) and the other hypothesis is that there’s a real effect of the observed size. That’s the maximum likelihood estimate of the real effect size. Notice that we are not saying that the effect size is exactly zero; but rather we are asking whether a zero effect explains the observations better than a real effect.
Now this amounts to putting a “lump” of probability on there being a zero effect. If you put a prior probability of 0.5 for there being a zero effect, you’re saying the prior odds are 1. If you are willing to put a lump of probability on the null-hypothesis, then there are several methods of doing that. They all give similar results to mine within a factor of two or so.
Putting a lump of probability on there being a zero effect, for example a prior probability of 0.5 of there being zero effect, is regarded by some people as being over-sceptical (though others might regard 0.5 as high, given that most bright ideas are wrong).
E.J. Wagenmakers summed it up in a tweet:
“at least Bayesians attempt to find an approximate answer to the right question instead of struggling to interpret an exact answer to the wrong question [that’s the p value]”.
Some results
The 2014 paper used simulations, and that’s a good way to see what’s happening in particular cases. But to plot curves of the sort shown in the next three slides we need exact calculations of FPR and how to do this was shown in the 2017 paper (see Appendix for details).
Comparison of p-equals and p-less-than approaches
The slide at 26:05 is designed to show the difference between the “p-equals” and the “p-less than” cases.
On each diagram the dashed red line is the “line of equality”: that’s where the points would lie if the p value were the same as the false positive risk. You can see that in every case the blue line (the false positive risk) is greater than the p value. And for any given observed p value, the p-equals approach gives a bigger false positive risk than the p-less-than approach. For a prior probability of 0.5, the false positive risk is about 26% when you’ve observed p = 0.05.
So from now on I shall use only the “p-equals” calculation which is clearly what’s relevant to a test of significance.
The false positive risk as function of the observed p value for different sample sizes
Now another set of graphs (slide at 27:46), for the false positive risk as a function of the observed p value, but this time we’ll vary the number in each sample. These are all for comparing two independent samples.
The curves are red for n = 4 ; green for n = 8 ; blue for n = 16.
The top row is for an implausible hypothesis with a prior of 0.1, the bottom row for a plausible hypothesis with a prior of 0.5.
The left column shows arithmetic plots; the right column shows the same curves as log-log plots. The powers these lines correspond to are:
- n = 4 (red) has power 22%
- n = 8 (green) has power 46%
- n = 16 (blue) has power 78%
Now you can see these behave in a slightly curious way. For most of the range it’s what you’d expect: n = 4 gives you a higher false positive risk than n = 8, and that is still higher than n = 16 (the blue line).
The curves behave in an odd way around 0.05; they actually begin to cross, so the false positive risk for p values around 0.05 is not strongly dependent on sample size.
But the important point is that in every case they’re above the line of equality, so the false positive risk is much bigger than the p value in any circumstance.
False positive risk as a function of sample size (i.e. of power)
Now the really interesting one (slide at 29:34). When I first did the simulation study I was challenged by the fact that the false positive risk actually becomes 1 if the experiment is a very powerful one. That seemed a bit odd.
The plot here is the false positive risk FPR50, which I define as “the false positive risk for prior odds of 1, i.e. a 50:50 chance of being a real effect or not a real effect”.
Let’s just concentrate on the p = 0.05 curve (blue). Notice that, because the number per sample is changing, the power changes throughout the curve. For example on the p = 0.05 curve for n = 4 (that’s the lowest sample size plotted), power is 0.22, but if we go to the other end of the curve, n = 64 (the biggest sample size plotted), the power is 0.9999. That’s something not achieved very often in practice.
But how is it that p = 0.05 can give you a false positive risk which approaches 100%? Even with p = 0.001 the false positive risk will eventually approach 100% though it does so later and more slowly.
In fact this has been known for donkey’s years. It’s called the Jeffreys-Lindley paradox, though there’s nothing paradoxical about it. In fact it’s exactly what you’d expect. If the power is 99.99% then you expect almost every p value to be very low. Everything is detected if we have a high power like that. So it would be very rare, with that very high power, to get a p value as big as 0.05. Almost every p value will be much less than 0.05, and that’s why observing a p value as big as 0.05 would, in that case, provide strong evidence for the null-hypothesis. Even p = 0.01 would provide strong evidence for the null hypothesis when the power is very high because almost every p value would be much less than 0.01.
This is a direct consequence of using the p-equals definition which I think is what’s relevant for testing hypotheses. So the Jeffreys-Lindley phenomenon makes absolute sense.
In contrast, if you use the p-less-than approach, the false positive risk would decrease continuously with the observed p value. That’s why, if you have a big enough sample (high enough power), even the smallest effect becomes “statistically significant”, despite the fact that the odds may favour strongly the null hypothesis. [Here, ‘the odds’ means the likelihood ratio calculated by the p-equals method.]
A real life example
Now let’s consider an actual practical example. The slide shows a study of transcranial electromagnetic stimulation published in Science magazine (so a bit suspect to begin with).
The study concluded (among other things) that an improved associated memory performance was produced by transcranial electromagnetic stimulation, p = 0.043. In order to find out how big the sample sizes were I had to dig right into the supplementary material. It was only 8. Nonetheless let’s assume that they had an adequate power and see what we make of it.
In fact it wasn’t done in a proper parallel-group way: it was done as ‘before and after’ the stimulation, and sham stimulation, and it produces one lousy asterisk. In fact most of the paper was about functional magnetic resonance imaging; memory was mentioned only in a subsection of Figure 1, but this is what was tweeted out, because it sounds more dramatic than other things, and it got a vast number of retweets. Now according to my calculations p = 0.043 means there’s at least an 18% chance that it’s a false positive.
How better might we express the result of this experiment?
We should say, conventionally, that the increase in memory performance was 1.88 ± 0.85 (SEM) with confidence interval 0.055 to 3.7 (extra words recalled on a baseline of about 10). Thus p = 0.043. But then supplement this conventional statement with:
This implies a false positive risk, FPR50, (i.e. the probability that the results occurred by chance only) of at least 18%, so the result is no more than suggestive.
There are several other ways you can put the same idea. I don’t like them as much because they all suggest that it would be helpful to create a new magic threshold at FPR50 = 0.05, and that’s as undesirable as defining a magic threshold at p = 0.05. For example you could say that the increase in performance gave p = 0.043, and that in order to reduce the false positive risk to 0.05 it would be necessary to assume that the prior probability of there being a real effect was 81%. In other words, you’d have to be almost certain that there was a real effect before you did the experiment in order for that result to be convincing. Since there’s no independent evidence that that’s true, the result is no more than suggestive.
Or you could put it this way: the increase in performance gave p = 0.043. In order to reduce the false positive risk to 0.05 it would have been necessary to observe p = 0.0043, so the result is no more than suggestive.
The reason I now prefer the first of these possibilities is because the other two involve an implicit threshold of 0.05 for the false positive risk and that’s just as daft as assuming a threshold of 0.05 for the p value.
The web calculator
Scripts in R are provided with all my papers. For those who can’t master R Studio, you can do many of the calculations very easily with our web calculator [for latest links please go to http://www.onemol.org.uk/?page_id=456]. There are three options: if you want to calculate the false positive risk for a specified p value and prior, you enter the observed p value (e.g. 0.049), the prior probability that there’s a real effect (e.g. 0.5), the normalized effect size (e.g. 1 standard deviation) and the number in each sample. All the numbers cited here are based on an effect size of 1 standard deviation, but you can enter any value in the calculator. The output panel updates itself automatically.
We see that the false positive risk for the p-equals case is 0.26 and the likelihood ratio is 2.8 (I’ll come back to that in a minute).
Using the web calculator or using the R programs which are provided with the papers, this sort of table can be very quickly calculated.
The top row shows the results if we observe p = 0.05. The prior probability that you need to postulate to get a 5% false positive risk would be 87%. You’d have to be almost ninety percent sure there was a real effect before the experiment in order to get a 5% false positive risk. The likelihood ratio comes out to be about 3; what that means is that your observations will be about 3 times more likely if there was a real effect than if there was no effect. 3:1 is very low odds compared with the 19:1 odds which you might incorrectly infer from p = 0.05. The false positive risk for a prior of 0.5 (the default value), which I call the FPR50, would be 27% when you observe p = 0.05.
In fact these are just directly related to each other. Since the likelihood ratio is a purely deductive quantity, we can regard FPR50 as just being a transformation of the likelihood ratio and regard this as also a purely deductive quantity. For example, 1 / (1 + 2.8) = 0.263, the FPR50. But in order to interpret it as a posterior probability then you do have to go into Bayes’ theorem. If the prior probability of a real effect was only 0.1 then that would correspond to a 76% false positive risk when you’ve observed p = 0.05.
If we go to the other extreme, when we observe p = 0.001 (bottom row of the table) the likelihood ratio is 100 (notice: not 1000, but 100) and the false positive risk, FPR50, would be 1%. That sounds okay, but if it were an implausible hypothesis with only a 10% prior chance of being true (last column of the Table), then the false positive risk would be 8% even when you observe p = 0.001: even in that case it would still be above 5%. In fact, to get the FPR down to 0.05 you’d have to observe p = 0.00043, and that’s good food for thought.
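The relations behind the table are simple enough to write down directly. Here is a minimal sketch (the function names are mine, just for illustration; they are not the web calculator):

# False positive risk from a likelihood ratio and a prior probability:
# posterior odds of a real effect = likelihood ratio x prior odds, and the
# FPR is the posterior probability that the null hypothesis is true.
fpr_from_lr <- function(lr, prior) {
  posterior_odds <- lr * prior / (1 - prior)
  1 / (1 + posterior_odds)
}

# Prior probability you would need for the FPR to come out at a target value:
prior_needed <- function(lr, fpr = 0.05) {
  prior_odds <- (1 - fpr) / (fpr * lr)
  prior_odds / (1 + prior_odds)
}

fpr_from_lr(2.8, 0.5)   # 0.26   (FPR50 when p = 0.05)
fpr_from_lr(2.8, 0.1)   # 0.76
fpr_from_lr(100, 0.5)   # 0.0099 (about 1%, when p = 0.001)
fpr_from_lr(100, 0.1)   # 0.083  (about 8%)
prior_needed(2.8)       # 0.87: the prior needed for a 5% FPR at p = 0.05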
So what do you do to prevent making a fool of yourself?
- Never use the words significant or non-significant, and don’t use those pesky asterisks please; it makes no sense to have a magic cut-off. Just give a p value.
- Don’t use bar graphs. Show the data as a series of dots.
- Always remember, it’s a fundamental assumption of all significance tests that the treatments are randomized. When this isn’t the case, you can still calculate a test but you can’t expect an accurate result. This is well-illustrated by thinking about randomisation tests.
- So I think you should still state the p value and an estimate of the effect size with confidence intervals but be aware that this tells you nothing very direct about the false positive risk. The p value should be accompanied by an indication of the likely false positive risk. It won’t be exact but it doesn’t really need to be; it does answer the right question. You can for example specify the FPR50, the false positive risk based on a prior probability of 0.5. That’s really just a more comprehensible way of specifying the likelihood ratio. You can use other methods, but they all involve an implicit threshold of 0.05 for the false positive risk. That isn’t desirable.
So p = 0.04 doesn’t mean you discovered something, it means it might be worth another look. In fact even p = 0.005 can under some circumstances be more compatible with the null-hypothesis than with there being a real effect.
We must conclude, however reluctantly, that Ronald Fisher didn’t get it right. Matthews (1998) said,
“the plain fact is that 70 years ago Ronald Fisher gave scientists a mathematical machine for turning boloney into breakthroughs and flukes into funding”.
Robert Matthews Sunday Telegraph, 13 September 1998.
But it’s not quite fair to blame R. A. Fisher, because he himself described the 5% point as “quite a low standard of significance”.
Questions & Answers
Q: “There are lots of competing ideas about how best to deal with the issue of statistical testing. For the non-statistician it is very hard to evaluate them and decide on what is the best approach. Is there any empirical evidence about what works best in practice? For example, training people to do analysis in different ways, and then getting them to analyze data with known characteristics. If not why not? It feels like we wouldn’t rely so heavily on theory in e.g. drug development, so why do we in stats?”
A: The gist: why do we rely on theory and statistics? Well, we might as well say, why do we rely on theory in mathematics? That’s what it is! You have concrete theories and concrete postulates. Which you don’t have in drug testing, that’s just empirical.
Q: Is there any empirical evidence about what works best in practice, so for example training people to do analysis in different ways? and then getting them to analyze data with known characteristics and if not why not?
A: Why not: because you never actually know unless you’re doing simulations what the answer should be. So no, it’s not known which works best in practice. That being said, simulation is a great way to test out ideas. My 2014 paper used simulation, and it was only in the 2017 paper that the maths behind the 2014 results was worked out. I think you can rely on the fact that a lot of the alternative methods give similar answers. That’s why I felt justified in using rather simple assumptions for mine, because they’re easier to understand and the answers you get don’t differ greatly from much more complicated methods.
In my 2019 paper there’s a comparison of three different methods, all of which assume that it’s reasonable to test a point (or small interval) null-hypothesis (one that says that treatment effect is exactly zero), but given that assumption, all the alternative methods give similar answers within a factor of two or so. A factor of two is all you need: it doesn’t matter if it’s 26% or 52% or 13%, the conclusions in real life are much the same.
So I think you might as well use a simple method. There is an even simpler one than mine actually, proposed by Sellke et al. (2001) that gives a very simple calculation from the p value and that gives a false positive risk of 29 percent when you observe p = 0.05. My method gives 26%, so there’s no essential difference between them. It doesn’t matter which you use really.
Q: The last question gave an example of training people so maybe he was touching on how do we teach people how to analyze their data and interpret it accurately. Reporting effect sizes and confidence intervals alongside p values has been shown to improve interpretation in teaching contexts. I wonder whether in your own experience that you have found that this helps as well? Or can you suggest any ways to help educators, teachers, lecturers, to help the next generation of researchers properly?
A: Yes I think you should always report the observed effect size and confidence limits for it. But be aware that confidence intervals tell you exactly the same thing as p values and therefore they too are very suspect. There’s a simple one-to-one correspondence between p values and confidence limits. So if you use the criterion “the confidence limits exclude zero difference” to judge whether there’s a real effect, you’re making exactly the same mistake as if you use p ≤ 0.05 to make the judgment. So they should be given, for sure, because they’re sort of familiar, but you do need, separately, some sort of rough estimate of the false positive risk too.
Q: I’m struggling a bit with the “p equals” intuition. How do you decide the band around 0.047 to use for the simulations? Presumably the results are very sensitive to this band. If you are using an exact p value in a calculation rather than a simulation, the probability of exactly that p value to many decimal places will presumably become infinitely small. Any clarification would be appreciated.
A: Yes, that’s not too difficult to deal with: you’ve got to use a band which is wide enough to get a decent number in. But the result is not at all sensitive to that: if you make it wider, you’ll get larger numbers in both numerator and denominator, so the result will be much the same. In fact, that’s only a problem if you do it by simulation. If you do it by exact calculation it’s easier. Doing 100,000 or a million t-tests with my R script in simulation doesn’t take long. But it doesn’t depend at all critically on the width of the interval; and in any case it’s not necessary to do simulations, you can do the exact calculation.
Q: Even if an exact calculation can’t be done—it probably can—you can get a better and better approximation by doing more simulations and using narrower and narrower bands around 0.047?
A: Yes, the larger the number of simulated tests that you do, the more accurate the answer. I did check it with a million occasionally. But once you’ve done the maths you can get exact answers much faster. The slide at 53:17 shows how you do the exact calculation.
• The Student’s t value along the bottom
• Probability density at the side
• The blue line is the distribution you get under the null-hypothesis, with a mean of 0 and a standard deviation of 1 in this case.
• So the red areas are the rejection areas for a t-test.
• The green curve is the t distribution (it’s a non-central t-distribution which is what you need in this case) for the alternative hypothesis.
• The yellow area is the power of the test, which here is 78%
• The orange area is (1 – power) so it’s 22%
The p-less-than calculation considers all values in the red area or in the yellow area as being positives. The p-equals calculation uses not the areas, but the ordinates here, the probability densities. The probability (density) of getting a t value of 2.04 under the null hypothesis is y0 = 0.053. And the probability (density) under the alternative hypothesis is y1 = 0.29. It’s true that the probability of getting t = 2.04 exactly is infinitesimally small (the area of an infinitesimally narrow band around t = 2.04), but the ratio of the two infinitesimally small probabilities is perfectly well-defined. So for the p-equals approach, the likelihood ratio in favour of the alternative hypothesis would be L10 = y1 / (2 y0) (the factor of 2 arises because of the two red tails), and that gives you a likelihood ratio of 2.8. That corresponds to an FPR50 of 26%, as we explained. That’s exactly what you get from simulation. I hope that was reasonably clear. It may not have been if you aren’t familiar with looking at those sorts of things.
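In R the exact calculation just described can be sketched in a few lines, because the central and non-central t densities are built in (dt). This is my reading of the method, not the author’s own script; n = 16 per group and a true effect of 1 standard deviation are the assumptions used throughout the talk:

# Exact "p-equals" calculation for a two-sample t test
n   <- 16                      # observations per group
df  <- 2 * (n - 1)             # degrees of freedom
d   <- 1                       # true (normalized) effect size, in SD units
ncp <- d / sqrt(2 / n)         # non-centrality parameter, about 2.83

t_obs <- qt(1 - 0.05 / 2, df)  # t = 2.04, the value that gives p = 0.05

y0 <- dt(t_obs, df)            # density under the null hypothesis: 0.053
y1 <- dt(t_obs, df, ncp)       # density under the alternative:     0.29

L10 <- y1 / (2 * y0)           # likelihood ratio in favour of a real effect: 2.8
1 / (1 + L10)                  # FPR50, about 0.26

Changing n to 64 in this sketch (where the power is close to 1) gives a likelihood ratio well below 1 at p = 0.05, which is the Jeffreys-Lindley behaviour described earlier: a p value as big as 0.05 then actually favours the null hypothesis.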
Q: To calculate FPR50 -false positive risk for a 50:50 prior -I need to assume an effect size. Which one do you use in the calculator? Would it make sense to calculate FPR50 for a range of effect sizes?
A: Yes, if you use the web calculator or the R scripts then you need to specify what the normalized effect size is. You can use your observed one. If you’re trying to interpret real data, you’ve got an estimated effect size and you can use that. For example, when you’ve observed p = 0.05 that corresponds to a likelihood ratio of 2.8 when you use the true effect size (that’s known when you do simulations). All you’ve got is the observed effect size. So they’re not the same, of course. But you can easily show with simulations that if you use the observed effect size in place of the true effect size (which you don’t generally know) then that likelihood ratio goes up from about 2.8 to 3.6; it’s around 3 either way. You can plug your observed normalised effect size into the calculator and you won’t be led far astray. This is shown in section 5 of the 2017 paper (especially section 5.1).
Q: Consider hypothesis H1 versus H2: which is the interpretation to go with?
A: Well, I’m still not quite clear what the two interpretations the questioner is alluding to are, but I shouldn’t rely on the p value. The most natural way to compare two hypotheses is to calculate the likelihood ratio.
You can do a full Bayesian analysis. Some forms of Bayesian analysis can give results that are quite similar to the p values. But that can’t possibly be generally true because they are defined differently. Stephen Senn produced an example where there was essentially no problem with the p value, but that was for a one-sided test with a fairly bizarre prior distribution.
In general in Bayes, you specify a prior distribution of effect sizes, what you believe before the experiment. Now, unless you have empirical data for what that distribution is, which is very rare indeed, then I just can’t see the justification for that. It’s bad enough making up the probability that there’s a real effect compared with there being no real effect. To make up a whole distribution just seems to be a bit like fantasy.
Mine is simpler because by considering a point null-hypothesis and a point alternative hypothesis, what in general would be called Bayes’ factors become likelihood ratios. Likelihood ratios are much easier to understand than Bayes’ factors because they just give you the relative probability of observing your data under two different hypotheses. This is a special case of Bayes’ theorem. But as I mentioned, any approach to Bayes’ theorem which assumes a point null hypothesis gives pretty similar answers, so it doesn’t really matter which you use.
There was an edition of The American Statistician last year which had 44 different contributions about “the world beyond p = 0.05”. I found it a pretty disappointing edition because there was no agreement among people and a lot of people didn’t get around to making any recommendation. They said what was wrong, but didn’t say what you should do in response. The one paper that I did like was the one by Benjamin & Berger. They recommended their false positive risk estimate (as I would call it; they called it something different but that’s what it amounts to) and that’s even simpler to calculate than mine. It’s a little more pessimistic, it can give a bigger false positive risk for a given p value, but apart from that detail their recommendations are much the same as mine. It doesn’t really matter which you choose.
Q: If people want a procedure that does not too often lead them to draw wrong conclusions, is it fine if they use a p value?
A: No, that maximises your wrong conclusions, among the available methods! The whole point is, that the false positive risk is a lot bigger than the p value under almost all circumstances. Some people refer to this as the p value exaggerating the evidence; but it only does so if you incorrectly interpret the p value as being the probability that you’re wrong. It certainly is not that.
Q: Your thoughts on, there’s lots of recommendations about practical alternatives to p values. Most notably the Nature piece that was published last year—something like 400 signatories—that said that we should retire the p value. Their alternative was to just report effect sizes and confidence intervals. Now you’ve said you’re not against anything that should be standard practice, but I wonder whether this alternative is actually useful, to retire the p value?
A: I don’t think the 400 author piece in Nature recommended ditching p values at all. It recommended ditching the 0.05 threshold, and just stating a p value. That would mean abandoning the term “statistically significant” which is so shockingly misleading for the reasons I’ve been talking about. But it didn’t say that you shouldn’t give p values, and I don’t think it really recommended an alternative. I would be against not giving p values because it’s the p value which enables you to calculate the equivalent false positive risk which would be much harder work if people didn’t give the p value.
If you use the false positive risk, you’ll inevitably get a larger false negative rate. So, if you’re using it to make a decision, other things come into it than the false positive risk and the p value: namely, the cost of missing an effect which is real (a false negative), and the cost of getting a false positive. They both matter. If you can estimate the costs associated with either of them, then you can draw some sort of optimal conclusion.
Certainly the costs of getting false positives are rather low for most people. In fact, there may be a great advantage to your career in publishing a lot of false positives, unfortunately. This is the problem that the RIOT science club is dealing with, I guess.
Q: What about changing the alpha level? Tinkering with the alpha level has been popular in the light of the replication crisis, to make it an even more difficult test to pass when testing your hypothesis. Some people have said that 0.005 should be the threshold.
A: Daniel Benjamin said that and a lot of other authors. I wrote to them about that and they said that they didn’t really think it was very satisfactory but it would be better than the present practice. They regarded it as a sort of interim thing.
It’s true that you would have fewer false positives if you did that, but it’s a very crude way of treating the false positive risk problem. I would much prefer to make a direct estimate, even though it’s rough, of the false positive risk, rather than just crudely reducing the threshold to p = 0.005. I do have a long paragraph in one of the papers discussing this particular thing (towards the end of the Conclusions in the 2017 paper).
If you were willing to assume a 50:50 prior chance of there being a real effect, then p = 0.005 would correspond to FPR50 = 0.034, which sounds satisfactory (from the Table above, or the web calculator).
But if, for example, you are testing a hypothesis about teleportation or mind-reading or homeopathy, then you probably wouldn’t be willing to give a prior of 50% to that being right before the experiment. If the prior probability of there being a real effect were 0.1, rather than 0.5, the Table above shows that observation of p = 0.005 would suggest, in my example, FPR = 0.24, and a 24% risk of a false positive would still be disastrous. In this case you would have to have observed p = 0.00043 in order to reduce the false positive risk to 0.05.
So no fixed p value threshold will cope adequately with every problem.
Links
- For up-to-date links to the web calculator, and to papers, start at http://www.onemol.org.uk/?page_id=456
- Colquhoun, 2014, An investigation of the false discovery rate and the misinterpretation of p-values. https://royalsocietypublishing.org/doi/full/10.1098/rsos.140216
- Colquhoun, 2017, The reproducibility of research and the misinterpretation of p-values. https://royalsocietypublishing.org/doi/10.1098/rsos.171085
- Colquhoun, 2019, The False Positive Risk: A Proposal Concerning What to Do About p-Values. https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529622
- Benjamin & Berger, Three Recommendations for Improving the Use of p-Values. https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1543135
- Sellke, T., Bayarri, M. J., and Berger, J. O. (2001), “Calibration of p Values for Testing Precise Null Hypotheses,” The American Statistician, 55, 62–71. DOI: 10.1198/000313001300339950
I am going to set out my current views about the transgender problem. It’s something that has caused a lot of discussion on twitter, much of it unpleasantly vituperative. When I refer to ‘problem’ I’m referring to the vituperation, not, of course, the existence of transgender people. Short posts on twitter don’t allow nuance, so I thought it might be helpful to lay out my views here in the (doubtless vain) hope of being able to move on to talk about other things. This will be my last word on it, because I feel that the time spent on this single problem has become counterproductive.
- The problem is very complicated and nobody knows the answers. Why, for example has the number of people referred to the Tavistock clinic increased 25-fold since 2009? Nobody knows. There has been a great deal of disagreement within the Gender Identity Development Service (GIDS) at the Tavistock about whether and when to refer children for treatment with puberty blockers or surgery. There was a good report by Deborah Cohen about this: https://www.youtube.com/watch?v=zTRnrp9pXHY
- There’s also a good report from BBC Newsnight about people who have chosen to detransition: https://www.youtube.com/watch?v=fDi-jFVBLA8. It shows how much is not known, even by experts.
- Anyone who pretends that it’s a simple problem that can be solved with slogans just isn’t listening. The long term effects of hormone treatments are simply not known.
- This poses a real problem for doctors who are asked for advice by people who feel that they were born in the wrong sex. There is an empathetic discussion from the front line in a recent paper.
- I’m very conscious that trans people have often been subjected to discrimination and abuse. That’s totally unacceptable. It’s also unacceptable to vilify women whose views are a bit different.
- Most of the arguments have centred on the meanings of the words ‘woman’, ‘female’, ‘gender’ and ‘sex’. Many of the bitter rows about this topic might be avoided if people defined these words before using them.
- ‘Sex’ and ‘gender’ are relatively easy. When I was growing up, ‘gender’ was a grammatical term, unrelated to sex. Then it evolved to be used as a euphemism for ‘sex’ by those who were too squeamish to use the word ‘sex’. The current use of these words is quite different. It’s discussed at https://www.merriam-webster.com/dictionary/gender#usage-1.
“Sex as the preferred term for biological forms, and gender limited to its meanings involving behavioral, cultural, and psychological traits.”
This is a sensible distinction, I think. But beware that it’s by no means universally agreed. The meanings are changing all the time and you can get pilloried if you use the ‘wrong’ word.
- The words ‘male’, ‘female’, ‘women’ are much more contentious. Some people say that they refer to biology, having XX chromosomes. This is certainly the definition used in every dictionary I’ve seen. The vast majority of people are born male or female. Apart from the small number of people who are born with chromosomal abnormalities, it’s unambiguous and can’t change.
- But other people now insist, often stridently, that ‘woman’ now refers to gender rather than sex. It would certainly help to avoid misapprehensions if, when using slogans like “trans women are women”, they made clear that they are using this new and unconventional definition of ‘woman’.
- Someone on twitter said that someone had said “transwomen are not women. That is transphobic. If she’d said that transwomen are not female, she’d have just been correct.” I doubt that this distinction is widely accepted. Both statements seem to me to mean much the same thing, but again it’s a matter of definitions.
- If someone who is biologically male feels happier as a woman, that’s fine. They should be able to live as a woman safely, and without discrimination. They should be treated as though they were women. This I take to be the intention of the tweet from J.K. Rowling:
- It seems to me that there is a wafer-thin distinction between “trans women are women” and “trans women should be treated as though they were women”. Yet if you say the wrong one you can be pilloried.
- Many of my friends in what’s known loosely as the skeptical movement have been quite unreasonably exercised about this fine distinction. Many of today’s problems arise from the extreme polarisation of views (on almost everything). This seems to me to be deeply unhelpful.
- I was pilloried by some people when I posted this tweet: “I’ve just finished reading the whole of the post by @jk_rowling. It only increases my admiration for her -a deeply empathetic human. The attacks on her are utterly unjustified.” It’s true that I gained several hundred followers after posting it (though I suspect that not all of them were followers that I would wish to have).
- The problems arise when a small minority of people who have male genitalia (whether they are trans women or predatory males) have used their access to spaces that have been traditionally reserved for women as an opportunity of voyeurism or even rape. In such cases the law should take its course. The existence of a few such cases shouldn’t be used as an excuse to discriminate against all trans women.
- Another case that’s often cited is sports. Being biologically male gives advantages in many sports. Given the huge advances that women have made in sports since the 1960s, it would be very unfortunate if they were to be beaten regularly by people who were biologically male (this has actually happened in sprinting and in weightlifting). In contact sports it could be dangerous. The Rugby Football Union has a policy which will have the effect of stopping most trans women from joining their women’s teams. That seems fair to me. Sports themselves should make the rules to ensure fair play. Some of the rules are summarised in https://en.wikipedia.org/wiki/Transgender_people_in_sports. The problem is to weigh the discrimination against trans women against the discrimination against biological women. In this case, you can’t have both.
- The trans problem has been particularly virulent in the Green Party. I recently endorsed Dr Rosi Sexton for leadership of the Green Party, because she has committed to having better regard for evidence than the other candidates, and because she’s committed to inclusion of minority groups. They are both good things. She has also said “trans women are women”, and that led to prolonged harassment from some of my best skeptical friends. She’s undoubtedly aware of X and Y chromosomes so I take it that she’s using ‘woman’ in the sense of gender rather than sex. Although I’d prefer slightly different words, such as “trans women should be treated as though they were women”, the difference between these two forms of wording seems to be far too small to justify the heat, and even hate, generated on both sides of the argument. Neither form of wording is “transphobic”. To say that they are is, in my opinion, absurd.
- All that I ask is that there should be less stridency and a bit more tolerance of views that don’t differ as much as people seem to think. Of course, anyone who advocates violence should be condemned. Be clear about definitions and don’t try to get people fired because their definitions are different from yours. Be kind to people.
It seems to me to be totally unfair, and deeply misogynist, to pillory Rowling as a ‘transphobe’ on the basis of this (or anything else) she’s said. She’s had some pretty vile abuse. There’s already a problem of women getting abuse on social media, and that’s only added to by the way she’s been treated because of this tweet.
Postscript
The fairness and safety of sports is very often raised in this context. The answer isn’t as obvious as I thought at first. This is a very thoughtful article on that topic: MMA pioneer Rosi Sexton once opposed Fallon Fox competing. Now she explains why she supports trans athletes. The following quotation from it seems totally sensible to me.
“The International Olympic Committee has had a trans-inclusive policy since 2003. In that time, there have been no publicly out trans Olympic athletes (though that will likely change in 2021).
The idea that trans women would make women’s sport meaningless by easily dominating the competition has not, so far, materialized at any level.
If trans women do have an unfair advantage over cis women, then it’s a hard one to spot.”
There will soon be an election for the leader of the Green Party of England and Wales. I support Dr Sexton for the job. Here’s my endorsement. I’ll say why below.
I support Dr Sexton as a candidate to lead the Green Party (England and Wales). She said
“The Green Party is a political party, not a lifestyle movement”.
That’s perceptive. For too long the Green party in the UK has been regarded as marginal, even as tree-huggers. That’s the case despite their success in local government and in other European countries which have fairer voting systems.
She continued
“We need to be serious about inclusion, serious about evidence, and serious about winning elections.”
They are all good aims. As somebody with three degrees in mathematics, she’s better qualified to evaluate evidence than just about any of our members of parliament, for most of whom science is a closed book.
Her breadth of experience is unrivalled. As well as mathematics, she has excelled in music and has been a champion athlete. Winning is her speciality. I believe that she has what it takes to win in politics too.
Here is her first campaign video.
Why am I endorsing Dr Sexton?
Like many people I know, I’ve been politically homeless for a while. I could obviously never vote Conservative, especially now that they’ve succumbed to a far-right coup. In the past, I’ve voted Labour mostly but I couldn’t vote for Jeremy Corbyn. I’ve voted Lib Dem in some recent elections, but it’s hard to forgive what they did during the coalition. So what about the Green Party? I voted for them in the European elections because they have a fair voting system. I would have voted for them more often if it were not for our appallingly unfair first-past-the-post system. So why now?
Firstly, the Greens are growing. They are well represented in the European parliament, and increasingly in local government. Secondly, the urgency of doing something about climate change gets ever more obvious. The Greens are also growing up. They are no longer as keen on alternative pseudo-medicine as they once were. Their anti-scientific wing is in retreat. And Dr Sexton, as a person who is interested in evidence, is just the sort of leader that they need to cement that advance.
That’s why I decided to join the Green Party to vote for her to be its leader.
If you want to know more about her, check her Wikipedia page. Or watch this video of an interview that I did with her in 2018.
You can also check her campaign website.
During the Black Lives Matter demonstrations on Sunday 7th June, the statue of Edward Colston was pulled down and dumped in the harbour in Bristol.
I think that it was the most beautiful thing that happened yesterday.
Colston made his money from the slave trade. 84,000 humans were transported on his ships. 19,000 of them died because of the appalling conditions on slave ships.
The statue was erected 174 years after he died, and, astonishingly, 62 years after the abolition of slavery.
According to Historic England, the plaque on the statue read thus.
Edward Colston Born 1636 Died 1721.
Erected by citizens of Bristol as a memorial
of one of the most
virtuous and wise sons of their city
AD 1895
(https://historicengland.org.uk/…/the-list/list-entry/1202137 )
Over the years, many attempts have been made to change the wording on the plaque, but it has never happened.
https://en.wikipedia.org/wiki/Statue_of_Edward_Colston
Would Priti Patel also condemn the removal of statues of Jimmy Savile, the notorious paedophile, as “utterly disgraceful” because he gave money to charities?
Would she condemn the toppling of the statues of Saddam Hussein and of Stalin as vandalism? I very much doubt it.
To those who say that removal of the statue erases history, there is a simple response. There are no statues to Hitler. And he most certainly hasn’t been forgotten.
Quite the contrary: a lot more people are aware of Colston than was the case 24 hours ago.
The people who pulled the statue down deserve a medal. If they are prosecuted it would bring shame on us.
Please listen to the great historian, David Olusoga. He explains the matter perfectly.
Statues aren’t about history they are about adoration. This man was not great, he was a slave trader and a murderer.
Historian @DavidOlusoga brilliantly explains why BLM protestors were right to tear down the statue of Edward Colston. pic.twitter.com/F1Zn1G8LVn
— Michael Walker (@michaeljswalker) June 7, 2020
This example illustrates just how fast exponential growth is. It was proposed on twitter by Charles Arthur (@charlesarthur) who attributes the idea to Simon Moores. The stadium version is a variant of the better known ‘grains of wheat (or rice) on a chessboard‘ problem. The stadium example is better, I think, because the time element gives it a sense of urgency, and that’s what we need right now.
Here’s Wembley Stadium. The watering system develops a fault: in minute 1 one drop of water is released; minute 2, two drops, minute 3 four drops, and so on. Every minute the number of drops doubles. How long will it take to fill Wembley stadium?
The answer is that after 44 minutes, before half-time, the stadium would overflow.
Here’s why.
The sequence is 1, 2, 4, 8, 16, . . . so the nth term in the sequence is 2^(n−1). For example, the 4th term is 2^3 = 8.
Next we need to know how many drops are needed to fill the stadium. Suppose a drop of water has a volume of 0.07 ml. This is 0.00000007, or 7 × 10^−8, cubic metres. Wembley Stadium has a volume of 1.1 million cubic metres. So the stadium holds 15,714,285,714,285 drops, or about 15.7 × 10^12 drops. How many minutes does it take to get to this volume of water?
After n minutes, the total volume of water will be the sum of all the drops up to that point. This turns out to be 2^n − 1. If this baffles you, check this video (in our case a = 1 and r = 2).
We want to solve for n, the number of steps (minutes), in 2^n = 1 + 15.7 × 10^12. The easiest way to do this is to take the logarithm of both sides.
n log(2) = log(1 + 15.7 × 10^12).
So
n = log(1 + 15.7 × 10^12) / log(2) = 43.8 minutes
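For anyone who would like to check the arithmetic, here is a minimal Python sketch. It assumes only the drop volume (0.07 ml) and stadium volume (1.1 million cubic metres) given above.

```python
import math

drop_volume = 0.07e-6      # one drop: 0.07 ml expressed in cubic metres (7 × 10^-8 m³)
stadium_volume = 1.1e6     # Wembley Stadium, in cubic metres

drops_to_fill = stadium_volume / drop_volume   # about 1.57 × 10^13 drops

# After n minutes the total number of drops is 2^n - 1, so solve 2^n = 1 + drops_to_fill
n = math.log(1 + drops_to_fill) / math.log(2)

print(f"drops needed to fill the stadium: {drops_to_fill:.3g}")
print(f"stadium full after about {n:.1f} minutes")   # about 43.8, i.e. during the 44th minute
```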
At the 43rd minute the stadium would be more than half full: (2^43 − 1) = 8.80 × 10^12 drops, i.e. 56% of capacity.
By the 44th minute the stadium would have overflowed: (2^44 − 1) = 17.6 × 10^12 drops, i.e. 112% of capacity.
Notice that (after the first few minutes) in the nth minute the volume released is equal to the total volume that’s already in, so at the 44th minute an extra 8.80 × 10^12 drops are dumped in. And at the 45th minute more than another stadium-full would appear.
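Here is the same thing done minute by minute, again only a sketch using the drop and stadium volumes assumed above. It shows how innocuous the early stages look.

```python
drop_volume = 0.07e-6                 # cubic metres per drop (assumed above)
capacity = 1.1e6 / drop_volume        # stadium capacity, measured in drops

total = 0
for minute in range(1, 61):
    released = 2 ** (minute - 1)      # drops released during this minute
    total += released                 # running total is 2^minute - 1
    print(f"minute {minute:2d}: {100 * total / capacity:12.6f}% full")
    if total > capacity:              # overflows during the 44th minute
        break
```

After 30 minutes the stadium is still only about 0.007% full, which is why exponential growth is so easy to underestimate until the very end.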
The speed of the rise is truly terrifying.
Relationship of this to COVID-19
The number of cases, and of deaths, rises at first in a way similar to the rise in the water level in Wembley stadium. The difference is that the time taken for the number to double is not one minute, but 2 – 3 days.
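As a rough illustration of what a 2 – 3 day doubling time means, here is a tiny sketch. The starting figure of 100 cases and the exact 3-day doubling time are assumptions chosen for the example, not data.

```python
doubling_time = 3        # days: assumed, within the 2-3 day range mentioned above
cases_today = 100        # assumed starting point, purely illustrative

for day in range(0, 31, 3):
    cases = cases_today * 2 ** (day / doubling_time)
    print(f"day {day:2d}: about {cases:>9,.0f} cases")
```

With a 3-day doubling time, 100 cases become roughly 100,000 within a month. That is why acting even a few days earlier makes such an enormous difference.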
As of today, Monday 23rd March, both the number of diagnosed cases, and the number of deaths, in the UK are still rising exponentially. The situation in the UK now is almost exactly what it was in Italy 15 days ago. This, from Inigo Martincorena (@imartincorena), shows it beautifully.
Boris Johnson’s weak and woolly approach will probably cost many thousands of lives and millions of pounds.
Up to now I’ve resisted the temptation to suggest that Whitty and Vallance might have been influenced by Dominic Cummings. After this revelation, in yesterday’s Sunday Times, it’s getting progressively harder to believe that.
We have been self-isolated since March 12th, well before Johnson advised it. It was obvious common sense.
Please stay at home and see nobody if you possibly can. This cartoon, by Toby Morris of thespinoff.co.nz, shows why.
Some good reading
The report from Imperial College, 16 March 2020, that seems to have influenced the government:
Tomas Pueyo. His piece on “Coronavirus: Why You Must Act Now“, dated March 10th, had 40 million views in a week
Tomas Pueyo. March 19th; What the Next 18 Months Can Look Like, if Leaders Buy Us Time.
“Some countries, like France, Spain or Philippines, have since ordered heavy lockdowns. Others, like the US, UK, or Switzerland, have dragged their feet, hesitantly venturing into social distancing measures.”
David Spiegelhalter. March 21st.
“So, roughly speaking, we might say that getting COVID-19 is like packing a year’s worth of risk into a week or two. Which is why it’s important to spread out the infections to avoid the NHS being overwhelmed.”
Washington Post. Some excellent animations March 14th
Up to date statistics. Worldometer is good (allows semi-log plots too).
The inquiry into UCL’s historical role in eugenics was set up a year ago. Its report was delivered on Friday 28 February 2020.
Nine (the MORE group) of the 16 members of the inquiry commission refused to sign the final report and issued their own recommendations.
The reasons for this lack of consensus included the fact that the final report did not look beyond the 1930s. It failed to deal with the science, and, in particular, it failed to investigate the London Conference on Intelligence, which was one of the reasons the inquiry was set up. That is a topic that I addressed at the time.
Firstly I should say that I agree entirely with all the recommendations, including those of the MORE group.
I’ve thought for a while now that the Galton and Pearson buildings/theatres should be renamed with a prominent plaque saying why.
But I was disappointed by the scope of the inquiry, and by the fact that it failed entirely to engage with the science. This was dealt with much better in the excellent podcast by Subhadra Das which came out at the same time. She had also made an excellent podcast, “Bricks + Mortals, A history of eugenics told through buildings“.
The inquiry did some surveys by email. This was a laudable attempt, but they only got about 1200 responses, from 50,000 UCL staff and students and 200,000 alumni. With such a low, self-selected, response rate these can’t be taken seriously. The author of this report said “I believe some of the ontological assumptions of scientists who researched eugenics are still firmly embedded in the fabric of UCL”. No further details were given and I’m baffled by this statement. It contradicts directly my own experience.
I was also disappointed by some passages in the official report. For example, referring to the ‘London Conference on Intelligence’, it says
“Occurring in the midst of activism to decolonise UCL, it suggested a ‘Janus-faced’ institution, with one face promoting equality in line with its statutory duty of care12 and the other a quiet acquiescence and ambivalence to UCL’s historical role in eugenics and its consequences for those Galton theorised as being unworthy.”
This seems to me to be totally unfair. I have been at UCL since 1964, and in all that time I have never once heard anyone with an “ambivalent” attitude to eugenics. In fact ever since Lionel Penrose took over the Galton chair in 1946, every UCL person whom I have read or met has condemned eugenics. In his 1946 inaugural lecture, Penrose said
“In the light of knowledge of its frequent misuse, inclusion of the term “racial” in the definition seems unfortunate. A racial quality is presumably any character which differs in frequency or which (when it is metrical) differs in average value in two or more large groups of people. No qualities have been found to occur in every member of one race and in no member of another.”
The inquiry stops in the 1930s. There is no acknowledgment of the fact that work done in the Lab for Human Genetics at UCL, ever since the end of WW2, has contributed hugely to the knowledge we now have about topics like genetics and race. They have done as much as anyone to destroy the 19th and early 20th century myths about eugenics.
London Conference on Intelligence
I think the allusion, quoted above, to the London Conference on Intelligence (LCI) was totally unfair. The only, very tenuous, connection between LCI and UCL was that a room was booked for the conferences in secret by a James Thompson. He was an honorary lecturer in psychology. He filled in the forms dishonestly, as shown in the report of the investigation into the bookings.
As shown in appendix 5 of this report, the questions about “Is speaker or topic likely to be controversial?” were not filled in. In fact much of the application form was left blank. This should have resulted in the room bookings being referred to someone who understood the topic. They were not. As a result of this mistake by a booking clerk, Thompson succeeded in holding a poisonous conference four times on UCL property, without anyone at UCL being aware of it.
The existence of the conference came to light only when it was discovered by Ben Van Der Merwe, of the London Student newspaper. He contacted me two days before it was published, for comment, and I was able to alert UCL, and write about it myself, in Eugenics, UCL and Freedom of Speech.
As everyone knows, the rise of alt-right populism across the world has given rise to a lunatic fringe of pseudoscientific people who once again give credence to eugenics. This has been documented in Angela Saini’s recent book, Superior. Thompson is one of them. The report on his conferences fails to tell us how and when he came to be an honorary lecturer, whether he ever taught at UCL, and, if he did, what he taught. It should have done.
Although the honorary title for James Thompson has now been revoked, this has, as far as I know, never been announced publicly. It should have been.
It’s very unfortunate that the Inquiry didn’t go into any of this.
One small problem
I started this blog by saying that I agreed with all of the recommendations of both the main report and that of the MORE group. But there is one recommendation that I can’t see how to implement in practice.
“Departments must devise action plans for all teaching programmes to engage critically with the history and legacy of eugenics at UCL”
After the question of ‘decolonising the curriculum’ came up, I took the problem seriously and spoke to, among others, UCL’s diversity officer. My teaching at the time was largely about the stochastic theory of single molecule kinetics, and about non-linear curve fitting.
The reason for talking to these people was to seek advice about how I could decolonise these topics. Sad to say, I didn’t get any helpful advice from these discussions. I still don’t understand how to do it. If you have any ideas, please tell me in the comments.
Follow-up
I have just been given some more information about James Thompson, the person behind the London Conference on Intelligence.
“Dr Thompson was made an honorary lecturer in 2007, following his retirement from UCL. As a clinical psychologist he was a member of staff from 1987, joining UCL by transfer when the UCH and Middlesex Hospital departments of psychiatry merged.
We do not have detailed records of Dr Thompson’s teaching at UCL. He was a Senior Lecturer in Psychology with primary responsibility for teaching medical students. He was given honorary status in 2007 as he had agreed to deliver 2 lectures to students on a neuroscience and behaviour module – one in 2007 on the placebo effect and one in 2008 on depression. There is no record of any involvement in teaching at UCL after the second lecture.
His honorary appointment was approved by the Head of Department.”
I hope to have a bit more information soon.
On Sunday 23 September, we recorded an interview with Rosi Sexton. Ever since I got to know her, I’ve been impressed by her polymathy. She’s a musician, a mathematician and a champion athlete, and now an osteopath: certainly an unusual combination. You can read about her on her Wikipedia page: https://en.wikipedia.org/wiki/Rosi_Sexton.
The video is long and wide-ranging, so I’ll give some bookmarks, in case you don’t want to watch it all. (And please excuse my garish London marathon track suit.)
Rosi recently started to take piano lessons again, after a 20 year break. She plays Chopin in the introduction, and Prokofiev and Schubert at 17:37 – 20:08. They are astonishingly good, given the time that’s elapsed since she last played seriously.
We started corresponding in 2011, about questions concerning evidence and alternative medicine as well as sports. Later we talked about statistics too: her help is acknowledged in my 2017 paper about p values. And discussions with her gave rise to the slide at 26:00 in my video on that topic.
Rosi’s accomplishments in MMA have been very well-documented and my aim was to concentrate on her other achievements. Nonetheless we inevitably had to explore the reasons why a first class mathematician chose to spend 14 years of her life in such a hard sport. I’m all for people taking risks if they want to. I have more sympathy for her choice than many of my friends, having myself spent time doing boxing, rugby, flying, sailing, long distance running, and mountain walking. I know how they can provide a real relief from the pressures of work.
The interview starts by discussing when she started music (piano, age 6) and how she became interested in maths. In her teens, she was doing some quite advanced maths: she relates later (at 1:22:50) how she took on holiday some of Raymond Smullyan’s books on mathematical logic at the age of 15 or 16. She was also playing the piano and the cello in the Reading Youth Orchestra, and became an Associate of the London College of music at 17. And at 14 she started Taekwondo, which she found helpful in dealing with teenage demons.
She was so good at maths that she was accepted at Trinity College, Cambridge where she graduated with 1st class hons. And then went on to a PhD, at Manchester. It was during her PhD that she became interested in MMA. We talk at 23:50 about why she abandoned maths (there’s a glimpse of some of her maths at 24:31), and devoted herself to MMA until she retired from that in 2014. In the meantime she took her fifth degree, in osteopathy, in 2010. She talks about some of her teenage demons at 28:00.
Many of my sceptical friends regard all osteopaths as quacks. Some certainly are. I asked Rosi about this at 38:40 and her responses can’t be faulted. She agrees that it’s rarely possible to know whether the treatments she uses are effective or whether the patient would have improved anyway. She understands regression to the mean. We discussed the problem of responders and non-responders. She appreciates that it’s generally not possible to tell whether or not they exist (for more on this, see Stephen Senn’s work). Even the best RCT tells us only about the average response. Not all osteopaths are the same.
We talk about the problems of doping and of trans competitors in sports at 49:30, and about the perception of contact sports at 59:32. Personally I have no problem with people competing in MMA, boxing or rugby, if that’s what they want to do. Combat sports are the civilised alternative to war. It isn’t the competitors that I worry about, it’s the fans.
At 1:14:28 we discussed how little is known about the long-term dangers of contact sports. The possible dangers of concussion led to a discussion of Russell’s paradox at 1:20:40.
I asked why she’s reluctant to criticise publicly things like acupuncture or “craniosacral therapy” (at 1:25:00). I found her answers quite convincing.
At 1:43:50, there’s a clip taken from a BBC documentary of Rosi’s father speaking about his daughter’s accomplishments, her perfectionism and her search for happiness.
Lastly, at 1:45:27, there’s a section headed “A happy new beginning”. It documents Rosi’s 40th birthday treat, when she and her new partner, Stephen Caudwell, climbed the highest climbing wall in the world, the Luzzone dam. After they walked down at the end of the climb, they got engaged.
I wish them both a very happy future.
Postscript. Rosi now runs the Combat Sports Clinic. They have recently produced a video about neck strength training, designed to help people who do contact sports – things like rugby, boxing, muay thai and MMA. I’ve seen only the preview, but there is certainly nothing quackish about it. It’s about strength training.
If you are not a pharmacologist or physiologist, you may never have heard of Bernard Ginsborg. I first met him in 1960. He was a huge influence on me and a great friend. I’m publishing this here because the Physiological Society has published only a brief obituary.
Bernard with his wife, Andy (Andrina).
You can download the following documents.
- Biography written by one of his daughters, Jane Ginsborg.
- Bernard’s scientific work, written by Donald H. Jenkinson (who knew him from his time in Bernard Katz’s Department of Biophysics).
- A tribute by Randall House, who collaborated with Bernard in Edinburgh.
- An obituary by Professor A. Mark Evans, Chair of Cellular Pharmacology, University of Edinburgh.
- Bernard’s obituary in The Times.
- Bernard’s Wikipedia entry.
I’ll post my own recollections of Bernard here.
Bernard Ginsborg was a lecturer in the Pharmacology department in Edinburgh when I joined that department in 1960, as a PhD student.
I recall vividly our first meeting in the communal tea room: smallish in stature, large beard and umbrella. My first reaction was ‘is this chap ever going to stop talking?’. My second reaction followed quickly: this chap has an intellect like nobody I’d encountered before.
I’d been invited to Edinburgh by Walter Perry, who had been external examiner for my first degrees in Leeds. In my 3rd year viva, he’d asked me to explain the difference between confidence limits and fiducial limits. Of course I couldn’t answer, and spent much of my 4th year trying to find out. I didn’t succeed, but produced a paper that must have impressed him. He, together with W.E. Brocklehurst, were my PhD supervisors. I saw Perry only when he dropped into my lab for a cigarette between committee meetings, but he treated me very well. He got me a Scottish Hospitals Endowment Trust scholarship which paid twice the usual MRC salary for a PhD student, and he made me an honorary lecturer so that I could join the magnificent staff club on Chambers Street (now gone), where I met, among many others, Peter Higgs, of boson fame.
I very soon came to wish that Bernard was my supervisor rather than Perry. I loved his quantitative approach. A physicist was more appealing to me than a medic. We spent a lot of time talking and I learnt a huge amount from him. I had encountered some of Bernard Katz’s papers in my 4th undergraduate year, and realised they were something special, but I didn’t know enough about electrophysiology to appreciate them fully. Bernard explained it all to me. His 1967 review, Ion movements in junctional transmission, is a classic: still worth reading by anyone interested in electrophysiology. Bernard’s mathematical ability was invaluable to me when, during my PhD, I was wrestling with the equations for diffusion in a cylinder with binding (see appendix here).
The discussions in the communal tea room were hugely educational. Dick Barlow and R.P. Stephenson were especially interesting. I soon came to realise that Bernard had a better grasp of quantitative ideas about receptors than either of them. His use of Laplace transforms to solve simultaneous differential equations in a 1974 paper was my first introduction to them, and that proved very useful to me later. Those discussions laid the ground for a life-long interest in the topic for me.
After I left the pharmacology department in 1964, contact became more intermittent for a while. I recall vividly a meeting held in Crans sur Sierre, Switzerland in 1977. The meetings there were good, despite having started as golfing holidays for J. Murdoch Ritchie and Joe Hoffman. There was a certain amount of tension between Bernard and Charles F Stevens, the famous US electrophysiologist. Alan Hawkes and I had just postulated that the unitary event in ion channel opening at the neuromuscular junction was a short burst of openings rather than single openings. This contradicted the postulate by Anderson & Stevens (1973) that binding of the agonist was very fast compared with the channel opening and shutting. At the time our argument was theoretical – it wasn’t confirmed experimentally until the early 80s. Bernard was chairing a session and he tried repeatedly to get Stevens to express an opinion on our ideas, but failed.
At dinner, Stevens was holding court: he expressed the view that rich people shouldn’t pay tax because there were too few of them and it cost more to get them to pay up than it was worth. He sat back to wait for the angry protestations of the rest of the people at the table. He hadn’t reckoned with Bernard. Bernard said how much he agreed, and that, by the same token, the police shouldn’t waste time trying to catch murderers: there were too few of them and it wasted too much police time. The argument was put eloquently, as only Bernard could do. Stevens, who I suspect had not met Bernard before, was uncharacteristically speechless. He had no idea what had hit him. It was a moment to savour.
May 1977, Crans sur Sierre, Switzerland.
For those who knew Bernard, it was another example of his ability to argue eloquently for any proposition whatsoever. I’d been impressed by his speech on how the best way to teach about drugs was to teach them in alphabetical order: it would make as much sense as any other way of categorising them. Usually there was just enough truth in these propositions to make a listener who hadn’t heard him in action before wonder, for a moment, whether he was serious. The picture shows him with my wife, Margaret, at the top of a mountain during a break in the meeting. He’d walked up, putting those of us who’d taken the train to shame.
In 1982, Alan Hawkes and I published a paper with the title “On the stochastic properties of bursts of single ion channel openings and of clusters of bursts”. It was 59 pages long with over 400 equations, most of which used matrix notation. After it had been accepted, I discovered that Bernard had taken on the heroic job of reviewing it. This came to light when I got a letter from him that consisted of two matrices which, when multiplied out, revealed his role.
For many years Bernard invited me to Edinburgh to talk to undergraduates about receptors and electrophysiology. (I’ve often wondered if that’s why more of our postdocs came from Glasgow than from Edinburgh during that time.) It was on one such visit in 1984 that I got a phone call to say that my wife, Margaret, had collapsed on the railway station at Walton-on-Thames while 6 months pregnant, and had been taken to St Peter’s Hospital in Chertsey. The psychodrama of our son’s birth has been documented elsewhere. A year later we came to Edinburgh once again. The pictures taken then show Bernard looking impishly happy, as he very often did, in his Edinburgh house in Magdala Crescent. The high rooms were lined with books, all of which he seemed to have read. His intellect was simply dazzling.
December 19th 1985. Magdala Crescent, Edinburgh
The following spring we visited again, this time with our son Andrew, aged around 15 months. We went with Bernard and Andy to the Edinburgh Botanic Gardens. Andrew, who was still not walking, crawled away rapidly up a grassy slope. Andy said don’t worry, when he gets to the top he’ll stop and look back for you. She was a child psychologist so we believed her. Andrew, however, disappeared from sight over the brow of the hill.
During these visits, we stayed with Bernard and Andy at their Edinburgh house.
The experience of staying with them was like being exposed to an effervescent intellectual fountain. It’s hard to think of a better matched couple.
After Bernard retired in 1985, he took no further interest in science. For him, it was a chance to spend time on his many other interests. After he went to live in France, contact became more intermittent. Occasional emails were exchanged. It was devastating to hear about the death of Andy in 2013. The last time that I saw both of them was in 2008, at John Kelly’s house. He was barely changed from the day that I met him in 1960.
Bernard was a legend. It’s hard to believe that he’s no longer here.
Bernard in 2008 at John Kelly’s house.
Lastly, here is a picture taken at the 2009 meeting of the British Pharmacological Society, held in Edinburgh.
At the British Pharm. Soc. meeting, 2009. Left to right: DC, BLG, John Kelly, Mark Evans, Anthony Harmer
Follow-up
See also The history of eugenics at UCL: the inquiry report.
On Monday evening (8th January 2018), I got an email from Ben van der Merwe, a UCL student who works as a reporter for the student newspaper, London Student. He said
“Our investigation has found a ring of academic psychologists associated with Richard Lynn’s journal Mankind Quarterly to be holding annual conferences at UCL. This includes the UCL psychologist professor James Thompson”.
He asked me for comment about the “London Conference on Intelligence”. His piece came out on Wednesday 10th January. It was a superb piece of investigative journalism. On the same day, Private Eye published a report on the same topic.
I had never heard about this conference, but it quickly became apparent that it was a forum for old-fashioned eugenicists of the worst kind. Perhaps it isn’t surprising that neither I, nor anyone else at UCL that I’ve spoken to had heard of these conferences because they were surrounded by secrecy. According to the Private Eye report:
“Attendees were only told the venue at the last minute and asked not to share the information”
The conference appears to have been held at least twice before. The programmes for the 2015 conference [download pdf] and the 2016 conference [download pdf] are now available, but weren’t public at the time. They have the official UCL logo across the top despite the fact that Thompson has been only an honorary lecturer since 2007.
A room was booked for the conference through UCL’s external room booking service. The abstracts are written in the style of a regular conference. It’s possible that someone with no knowledge of genetics (as is likely to be the case for room-booking staff) might have not spotted the problem.
The huge problems are illustrated by the London Student piece, which identifies many close connections between conference speakers and far-right, and neo-nazi hate groups.
“[James Thompson’s] political leanings are betrayed by his public Twitter account, where he follows prominent white supremacists including Richard Spencer (who follows him back), Virginia Dare, American Renaissance, Brett Stevens, the Traditional Britain Group, Charles Murray and Jared Taylor.”
“Thompson is a frequent contributor to the Unz Review, which has been described as “a mix of far-right and far-left anti-Semitic crackpottery,” and features articles such as ‘America’s Jews are Driving America’s Wars’ and ‘What to do with Latinos?’.
His own articles include frequent defences of the idea that women are innately less intelligent than men (1, 2, 3,and 4), and an analysis of the racial wage gap which concludes that “some ethnicities contribute relatively little,” namely “blacks.”
“By far the most disturbing of part of Kirkegaard’s internet presence, however, is a blog-post in which he justifies child rape. He states that a ‘compromise’ with paedophiles could be:
“having sex with a sleeping child without them knowing it (so, using sleeping medicine. If they don’t notice it is difficult to see how they cud be harmed, even if it is rape. One must distinguish between rape becus the other was disconsenting (wanting to not have sex), and rape becus the other is not consenting, but not disconsenting either.”
The UCL Students’ Union paper, Cheesegrater, lists some of James Thompson’s tweets, including some about brain size in women.
Dr Alice Lee
It’s interesting that these came to light on the same day that I learned that the first person to show that there was NO correlation between brain size and intelligence was Dr Alice Lee, in 1901: A First Study of the Correlation of the Human Skull. Phil. Trans. Roy. Soc A https://doi.org/10.1098/rsta.1901.0005 [download pdf].
Alice Lee published quite a lot, much of it with Pearson. In 1903, for example, On the correlation of the mental and physical characters in man. Part II. Alice Lee, Marie A. Lewenz and Karl Pearson https://doi.org/10.1098/rspl.1902.0070 [download pdf]. She shows herself to be quite feisty in this paper – she says of a paper with conclusions that differ from hers
“Frankly, we consider that the memoir is a good illustration of how little can be safely argued from meagre data and a defective statistical theory.”
She also published a purely mathematical paper, “On the Distribution of the Correlation Coefficient in Small Samples”, H. E. Soper, A. W. Young, B. M. Cave, A. Lee and K. Pearson, Biometrika, 11, 1917, pp. 328-413 (91 pages) [download pdf]. There is interesting comment on this paper in encyclopedia.com.
Alice Lee was the first woman to get a PhD in mathematics from UCL and she was working in the Galton laboratory, under Karl Pearson. Pearson was a great statistician but also an extreme eugenicist. It was good to learn that he supported women in science at a time when that was almost unknown. The Dictionary of National Biography says
“He considered himself a supporter of equal rights and opportunities for women (later in his capacity as a laboratory director he hired many female assistants), yet he also expressed a willingness to subordinate these ideals to the greater good of the race.”
But it must never be forgotten that Karl Pearson said, in 1934,
” . . . that lies rather in the future, perhaps with Reichskanzler Hitler and his proposals to regenerate the German people. In Germany a vast experiment is in hand, and some of you may live to see its results. If it fails it will not be for want of enthusiasm, but rather because the Germans are only just starting the study of mathematical statistics in the modern sense!”
And if you think that’s bad, remember that Ronald Fisher, after World War 2, said, in 1948,
“I have no doubt also that the [Nazi] Party sincerely wished to benefit the German racial stock, especially by the elimination of manifest defectives,
such as those deficient mentally, and I do not doubt that von Verschuer gave, as I should have done, his support to such a movement.”
For the context of this comment, see Weiss (2010).
That’s sufficient reason for the removal of their names from buildings at UCL.
What’s been done so far?
After I’d warned UCL of the impending scandal, they had time to do some preliminary investigation. An official UCL announcement appeared on the same day (10 Jan, 2018) as the articles were published.
“Our records indicate the university was not informed in advance about the speakers and content of the conference series, as it should have been for the event to be allowed to go ahead”
“We are an institution that is committed to free speech but also to combatting racism and sexism in all forms.”
“We have suspended approval for any further conferences of this nature by the honorary lecturer and speakers pending our investigation into the case.”
That is about as good as can be expected. It remains to be seen why the true nature of the conferences was not spotted, and it remains to be seen why someone like James Thompson was an honorary senior lecturer at UCL. Watch this space.
How did it happen?
Two videos that feature Thompson are easily found. One, from 2010, is on the UCLTV channel. And in March 2011, a BBC World News video featured Thompson.
But both of these videos are about his views on disaster psychology (Chilean miners, and Japanese earthquake, respectively). Neither gives any hint of his extremist political views. To discover them you’d have to delve into his twitter account (@JamesPsychol) or his writings on the unz site. It’s not surprising that they were missed.
I hope we’ll know more soon about how these meetings slipped under the radar. Until recently, they were very secret. But then six videos of talks at the 2017 meeting were posted on the web, by the organisers themselves. Perhaps they were emboldened by the presence of an apologist for neo-nazis in the White House, and by the government’s support for Toby Young, who wrote in support of eugenics. The swing towards far-right views in the UK, in the USA and in Poland, Hungary and Turkey, has seen a return to public discussions of views that have been thought unspeakable since the 1930s. See, for example, this discussion of eugenics by Spectator editor Fraser Nelson with Toby Young, under the alarming heading “Eugenics is back“.
The London Conference on Intelligence channel used the UCL logo, and it was still public on 10th January. It had only 49 subscribers. By 13th January it had been taken down (apparently by its authors). But it still has a private playlist with four videos which have been viewed only 36 times (some of those views were mine). Before it vanished, I made a copy of Emil Kirkegaard’s talk, for the record.
Freedom of speech
Incidents like this pose difficult problems, especially given UCL’s past history. Galton and Pearson supported the idea of eugenics at the beginning of the 20th century, as did George Bernard Shaw. But modern geneticists at the Galton lab have been at the forefront in showing that these early ideas were simply wrong.
UCL has, in the past, rented rooms for conferences of homeopaths. Their ideas are deluded and sometimes dangerous, but not illegal. I don’t think they should be arrested, but I’d much prefer that their conferences were not at UCL.
A more serious case occurred on 26 February 2008. The student Islamic Society invited representatives of the radical Islamic creationist, Adnan Oktar, to speak at UCL. They were crowing that the talk would be held in the Darwin lecture theatre (built in the place formerly occupied by Charles Darwin’s house on Gower Street). In the end, the talk was allowed to go ahead, but it was moved by the then provost to the Gustave Tuck lecture theatre, which is much smaller, and which was built from a donation by the former president of the Jewish Historical Society. See more accounts here, here and here. It isn’t known what was said, so there is no way to tell whether it was illegal, or just batty.
It is very hard to draw the line between hate talk and freedom of speech. There was probably nothing illegal about what was said at the Intelligence Conferences. It was just bad science, used to promote deeply distasteful ideas.
Although, in principle, renting a room doesn’t imply any endorsement, in practice all crackpot organisations love to use the name of UCL to promote their cause. That alone is sufficient reason to tell these people to find somewhere else to promote their ideas.
Follow up in the media
For a day or two the media were full of the story. It was reported, for example, in the Guardian and in the Jewish Chronicle.
On 11th January I was asked to talk about the conference on BBC World Service. The interview can be heard here.
The real story
Recently some people have demanded that the names of Galton and Pearson should be expunged from UCL.
There would be a case for that if their 19th century ideas were still celebrated, just as there is a case for removing statues that celebrate confederate generals in the southern USA. Their ideas about measurement and statistics are justly celebrated. But their ideas about eugenics are not celebrated.
On the contrary, it is modern genetics, done in part by people in the Galton lab, that has shown the wrongness of 19th century views on race. If you want to know the current views of the Galton lab, try these. They could not be further from Thompson’s secretive pseudoscience.
Steve Jones’ 2015 lecture “Nature, nurture or neither: the view from the genes”,
Or check the writing of UCL alumnus, Adam Rutherford: “Why race is not a thing, according to genetics”,
or, from Rutherford’s 2017 article
“We’ve known for many years that genetics has profoundly undermined the concept of race”
“more and more these days, racists and neo-Nazis are turning to consumer genetics to attempt to prove their racial purity and superiority. They fail, and will always fail, because no one is pure anything.”
“the science that Galton founded in order to demonstrate racial hierarchies had done precisely the opposite”
Or read this terrific account of current views by Jacob A Tennessen “Consider the armadillos“.
These are accounts of what geneticists now think. Science has shown that views expressed at the London Intelligence Conference are those of a very small lunatic fringe of pseudo-scientists. But they are already being exploited by far-right politicians.
It would not be safe to ignore them.
Follow-up
15 January 2018. The involvement of Toby Young
The day after this was posted, my attention was drawn to a 2018 article by the notorious Toby Young. In it he confirms the secretiveness of the conference organisers.
“I discovered just how cautious scholars in this field can be when I was invited to attend a two-day conference on intelligence at University College London by the academic and journalist James Thompson earlier this year. Attendees were only told the venue at the last minute – an anonymous antechamber at the end of a long corridor called ‘Lecture Room 22’ – and asked not to share the information with anyone else.”
More importantly, it shows that Toby Young has failed utterly to grasp the science.
“You really have to be pretty stubborn to dispute that general cognitive ability is at least partly genetically based.”
There is nobody who denies this.
The point is that the interaction of nature and nurture is far more subtle than Young believes, and that makes attempts to separate them quantitatively futile. He really should educate himself by looking at the accounts listed above (The real story).
16 January 2018. How UCL has faced its history
Before the current row about the “London Intelligence Conference”, UCL has faced up frankly to its role in the development of eugenics. It started at the height of Empire, in the 19th century, and continued into the early part of the 20th century. The word “eugenics” has not been used at UCL since it fell into the gravest disrepute in the 1930s, and has never been used since WW2. Not, that is, until James Thompson and Toby Young brought it back. The history has been related by curator and science historian, Subhadra Das. You can read about it, and listen to episodes of her podcast, at “Bricks + Mortals, A history of eugenics told through buildings“. Or you can listen to her whole podcast.
Although Subhadra Das describes Galton as the Victorian scientist that you’ve never heard of, I was certainly well aware of his ideas before I first came to UCL (in 1964). But at that time I thought of Karl Pearson only as a statistician, and I doubt if I’d even heard of Flinders Petrie. Learning about their roles was a revelation.
17 January 2018.
Prof Semir Zeki has pointed out to me that it’s not strictly true to say “the word ‘eugenics’ has not been used at UCL since it fell into the gravest disrepute in the 1930s”. It’s true to say that nobody advocated it, but the chair of Eugenics was not renamed the chair of Human Genetics until 1963. This certainly didn’t imply approval. Zeki tells me that its holder, Lionel Penrose, mentioned his distaste for the title, saying that it was a hangover from the past and should be changed.
Today we went to see the film Goodbye Christopher Robin. It was very good. I, like most children, read Pooh books as a child.
Image from Wikipedia
I got interested in their author, A.A. Milne, when I discovered that he’d done a mathematics degree at Cambridge. So had my scientific hero A.V. Hill, and (through twitter) I met AV’s granddaughter, Alison Hill. I learned that AV loved to quote A.A. Milne’s poem, O.B.E.
O.B.E.
I know a Captain of Industry,
I know a Lady of Pedigree,
I know a fellow of twenty-three,
I had a friend; a friend, and he . . .
This poem clearly reflects Milne’s experience in WW1. He was at the Battle of the Somme, despite describing himself as a pacifist. In the film he’s portrayed as suffering from PTSD (shell shock as it used to be called). The sound of a balloon popping could trigger a crisis. He was from a wealthy background. He, and his wife Daphne, employed a nanny and maid.
The first of the Christopher Robin books, the verse collection When We Were Very Young, came out in 1924, when Milne’s son, Christopher Robin, was four. The nanny is, in some ways, the hero of the film. It was she, not his parents, who looked after Christopher Robin, and the child loved her. In contrast, his parents were distant and uncommunicative.
By today’s standards, Christopher Robin’s upbringing looks almost like child neglect. One can only speculate about how much his father’s PTSD was to blame for this. But his mother had no such excuse. It seems likely to me that part of the blame attaches to the fact that Milne was brought up as an “English gentleman”. Looking after children was a job for nannies, not parents. Milne went to a private school (Westminster), and Christopher Robin was sent to private schools. At 13 he was sent away from his parents, to Stowe school, where he suffered a lot of bullying. That is a problem that’s endemic and it’s particularly bad in private boarding schools.
I have seen it at first hand. I went to what was known at the time as a direct grant school, and I was a day boy. But the school did its best to ape a private school. It was a cold and cruel place. Once, I came off my bike and went head first into a sandstone wall. While recovering in the matron’s room I looked at some of the books there. They were mostly ancient boys’ stories that lauded the virtues of the British Empire. Even at 13, I was horrified.
After he reached the age of 9, Christopher Robin increasingly resented what he came to see as his parents’ exploitation of his childhood. After WW2, Christopher Robin got married, but his parents didn’t approve of his choice. He became estranged from them, and went to run a bookshop in Dartmouth (Devon). After his father died, he did not see his mother during the 15 years that passed before her death. Even when she was on her deathbed, she refused to see her son.
It’s a sad story, and the film conveys that well. I wonder whether it might have been different if it were not for the horrors of WW1 and the horrors of the upbringing of English gentlemen.
It would be good to think that things were better now. They are better, but the old problems haven’t vanished. The UK is still ruled largely by graduates from Oxford and Cambridge. They take mostly white kids from expensive private schools. These institutions specialise in giving people confidence that exceeds their abilities. Now the UK is mocked across the world for its refusal to modernise and for the delusions of empire that are brexit. The New York Times commented
if what the Brexiteers want is to return Britain to a utopia they have devised by splicing a few rose-tinted memories of the 1950s together with an understanding of imperial history derived largely from images on vintage biscuit tins,
Just look at this recent New Yorker cover. Look at Jacob Rees-Mogg. And look at Brexit.
Follow-up
This piece is almost identical with today’s Spectator Health article.
This week there has been enormously wide coverage in the press for one of the worst papers on acupuncture that I’ve come across. As so often, the paper showed the opposite of what its title and press release claimed. For another stunning example of this sleight of hand, try Acupuncturists show that acupuncture doesn’t work, but conclude the opposite: journal fails (published in the British Journal of General Practice).
Presumably the wide coverage was a result of the hyped-up press release issued by the journal, BMJ Acupuncture in Medicine. That is not the British Medical Journal, of course, but it is, bafflingly, published by the BMJ Press group, and if you subscribe to press releases from the real BMJ, you also get them from Acupuncture in Medicine. The BMJ group should not be mixing up press releases about real medicine with press releases about quackery. There seems to be something about quackery that’s clickbait for the mainstream media.
As so often, the press release was shockingly misleading. It said
Acupuncture may alleviate babies’ excessive crying
Needling twice weekly for 2 weeks reduced crying time significantly
This is totally untrue. Here’s why.
Luckily the Science Media Centre was on the case quickly: read their assessment. The paper made the most elementary of all statistical mistakes. It failed to make allowance for the jelly bean problem. The paper lists 24 different tests of statistical significance and focusses attention on three that happen to give a P value (just) less than 0.05, and so were declared to be "statistically significant". If you do enough tests, some are bound to come out “statistically significant” by chance. They are false positives, and the conclusions are as meaningless as “green jelly beans cause acne” in the cartoon. This is called P-hacking and it’s a well known cause of problems. It was evidently beyond the wit of the referees to notice this naive mistake. It’s very doubtful whether there is anything happening but random variability.

And that’s before you even get to the problem of the weakness of the evidence provided by P values close to 0.05. There’s at least a 30% chance of such values being false positives, even if it were not for the jelly bean problem, and a lot more than 30% if the hypothesis being tested is implausible. I leave it to the reader to assess the plausibility of the hypothesis that a good way to stop a baby crying is to stick needles into the poor baby. If you want to know more about P values try Youtube or here, or here.
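To see how easily “significant” results appear when you test enough outcomes, here is a small simulation. It assumes 24 independent comparisons in which nothing at all is going on (the tests in the paper were not all independent, so this is a sketch of the principle rather than a reanalysis of the trial).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, alpha, n_sims = 24, 0.05, 20_000

runs_with_a_hit = 0
for _ in range(n_sims):
    # 24 comparisons, each between two groups of 50 drawn from the SAME population,
    # so any "significant" P value is a false positive.
    a = rng.normal(size=(n_tests, 50))
    b = rng.normal(size=(n_tests, 50))
    p = stats.ttest_ind(a, b, axis=1).pvalue
    if (p < alpha).any():
        runs_with_a_hit += 1

print(f"Chance of at least one P < 0.05 by luck alone: {runs_with_a_hit / n_sims:.2f}")
print(f"Theory for independent tests: {1 - (1 - alpha)**n_tests:.2f}")   # about 0.71
```

For 24 independent tests at the 0.05 level, the chance of getting at least one false positive is 1 − 0.95^24, i.e. about 71%.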
One of the people asked for an opinion on the paper was George Lewith, the well-known apologist for all things quackish. He described the work as being a "good sized fastidious well conducted study ….. The outcome is clear". Thus showing an ignorance of statistics that would shame an undergraduate.
On the Today Programme, I was interviewed by the formidable John Humphrys, along with the mandatory member of the flat-earth society whom the BBC seems to feel obliged to invite along for "balance". In this case it was professional acupuncturist, Mike Cummings, who is an associate editor of the journal in which the paper appeared. Perhaps he’d read the Science media centre’s assessment before he came on, because he said, quite rightly, that
"in technical terms the study is negative" "the primary outcome did not turn out to be statistically significant"
to which Humphrys retorted, reasonably enough, “So it doesn’t work”. Cummings’ response to this was a lot of bluster about how unfair it was for NICE to expect a treatment to perform better than placebo. It was fascinating to hear Cummings admit that the press release by his own journal was simply wrong.
Listen to the interview here
Another obvious flaw of the study is the nature of the control group. It is not stated very clearly, but it seems that the baby was left alone with the acupuncturist for 10 minutes. A far better control would have been to have the baby cuddled by its mother, or by a nurse. That’s what was used by Olafsdottir et al. (2001) in a study that showed cuddling worked just as well as another form of quackery, chiropractic, to stop babies crying.
Manufactured doubt is a potent weapon of the alternative medicine industry. It’s the same tactic as was used by the tobacco industry. You scrape together a few lousy papers like this one and use them to pretend that there’s a controversy. For years the tobacco industry used this tactic to try to persuade people that cigarettes didn’t give you cancer, and that nicotine wasn’t addictive. The mainstream media obligingly invite the representatives of the industry, who convey to the reader/listener that there is a controversy, when there isn’t.
Acupuncture is no longer controversial. It just doesn’t work – see Acupuncture is a theatrical placebo: the end of a myth. Try to imagine a pill that had been subjected to well over 3000 trials without anyone producing convincing evidence for a clinically useful effect. It would have been abandoned years ago. But by manufacturing doubt, the acupuncture industry has managed to keep its product in the news. Every paper on the subject ends with the words "more research is needed". No it isn’t.
Acupuncture is a pre-scientific idea that was moribund everywhere, even in China, until it was revived by Mao Zedong as part of the appalling Great Proletarian Cultural Revolution. Now it is big business in China, and 100 percent of the clinical trials that come from China are positive.
If you believe them, you’ll truly believe anything.
Follow-up
29 January 2017
Soon after the Today programme in which we both appeared, the acupuncturist, Mike Cummings, posted his reaction to the programme. I thought it worth posting the original version in full. Its petulance and abusiveness are quite remarkable.
I thank Cummings for giving publicity to the video of our appearance, and for referring to my Wikipedia page. I leave it to the reader to judge my competence, and his, in the statistics of clinical trials. And it’s odd to be described as a "professional blogger" when the 400+ posts on dcscience.net don’t make a penny – in fact they cost me money. In contrast, he is the salaried medical director of the British Medical Acupuncture Society.
It’s very clear that he has no understanding of the error of the transposed conditional, nor even the multiple comparison problem (and neither, it seems, does he know the meaning of the word ‘protagonist’).
I ignored his piece, but several friends complained to the BMJ for allowing such abusive material on their blog site. As a result a few changes were made. The “baying mob” is still there, but the Wikipedia link has gone. I thought that readers might be interested to read the original unexpurgated version. It shows, better than I ever could, the weakness of the arguments of the alternative medicine community. To quote Upton Sinclair:
“It is difficult to get a man to understand something, when his salary depends upon his not understanding it.”
It also shows that the BBC still hasn’t learned the lessons in Steve Jones’ excellent “Review of impartiality and accuracy of the BBC’s coverage of science“. Every time I appear in such a programme, they feel obliged to invite a member of the flat earth society to propagate their make-believe.
Acupuncture for infantile colic – misdirection in the media or over-reaction from a sceptic blogger?
26 Jan, 17 | by Dr Mike Cummings

So there has been a big response to this paper press released by BMJ on behalf of the journal Acupuncture in Medicine. The response has been influenced by the usual characters – retired professors who are professional bloggers and vocal critics of anything in the realm of complementary medicine. They thrive on oiling up and flexing their EBM muscles for a baying mob of fellow sceptics (see my ‘stereotypical mental image’ here). Their target in this instant is a relatively small trial on acupuncture for infantile colic.[1] Deserving of being press released by virtue of being the largest to date in the field, but by no means because it gave a definitive answer to the question of the efficacy of acupuncture in the condition. We need to wait for an SR where the data from the 4 trials to date can be combined.

So what about the research itself? I have already said that the trial was not definitive, but it was not a bad trial. It suffered from under-recruiting, which meant that it was underpowered in terms of the statistical analysis. But it was prospectively registered, had ethical approval and the protocol was published. Primary and secondary outcomes were clearly defined, and the only change from the published protocol was to combine the two acupuncture groups in an attempt to improve the statistical power because of under recruitment. The fact that this decision was made after the trial had begun means that the results would have to be considered speculative. For this reason the editors of Acupuncture in Medicine insisted on alteration of the language in which the conclusions were framed to reflect this level of uncertainty.

DC has focussed on multiple statistical testing and p values. These are important considerations, and we could have insisted on more clarity in the paper. P values are a guide and the 0.05 level commonly adopted must be interpreted appropriately in the circumstances. In this paper there are no definitive conclusions, so the p values recorded are there to guide future hypothesis generation and trial design. There were over 50 p values reported in this paper, so by chance alone you must expect some to be below 0.05. If one is to claim statistical significance of an outcome at the 0.05 level, ie a 1:20 likelihood of the event happening by chance alone, you can only perform the test once. If you perform the test twice you must reduce the p value to 0.025 if you want to claim statistical significance of one or other of the tests.

So now we must come to the predefined outcomes. They were clearly stated, and the results of these are the only ones relevant to the conclusions of the paper. The primary outcome was the relative reduction in total crying time (TC) at 2 weeks. There were two significance tests at this point for relative TC. For a statistically significant result, the p values would need to be less than or equal to 0.025 – neither was this low, hence my comment on the Radio 4 Today programme that this was technically a negative trial (more correctly ‘not a positive trial’ – it failed to disprove the null hypothesis ie that the samples were drawn from the same population and the acupuncture intervention did not change the population treated). Finally to the secondary outcome – this was the number of infants in each group who continued to fulfil the criteria for colic at the end of each intervention week. There were four tests of significance so we need to divide 0.05 by 4 to maintain the 1:20 chance of a random event ie only draw conclusions regarding statistical significance if any of the tests resulted in a p value at or below 0.0125. Two of the 4 tests were below this figure, so we say that the result is unlikely to have been chance alone in this case. With hindsight it might have been good to include this explanation in the paper itself, but as editors we must constantly balance how much we push authors to adjust their papers, and in this case the editor focussed on reducing the conclusions to being speculative rather than definitive. A significant result in a secondary outcome leads to a speculative conclusion that acupuncture ‘may’ be an effective treatment option… but further research will be needed etc…

Now a final word on the 3000 plus acupuncture trials that DC loves to mention. His point is that there is no consistent evidence for acupuncture after over 3000 RCTs, so it clearly doesn’t work. He first quoted this figure in an editorial after discussing the largest, most statistically reliable meta-analysis to date – the Vickers et al IPDM.[2] DC admits that there is a small effect of acupuncture over sham, but follows the standard EBM mantra that it is too small to be clinically meaningful without ever considering the possibility that sham (gentle acupuncture plus context of acupuncture) can have clinically relevant effects when compared with conventional treatments. Perhaps now the best example of this is a network meta-analysis (NMA) using individual patient data (IPD), which clearly demonstrates benefits of sham acupuncture over usual care (a variety of best standard or usual care) in terms of health-related quality of life (HRQoL).[3]
30 January 2017
I got an email from the BMJ asking me to take part in a BMJ Head-to-Head debate about acupuncture. I did one of these before, in 2007, but it generated more heat than light (the only good thing to come out of it was the joke about leprechauns). So here is my polite refusal.
Hello,

Thanks for the invitation. Perhaps you should read the piece that I wrote after the Today programme.

Why don’t you do these Head to Heads about genuine controversies? To do them about homeopathy or acupuncture is to fall for the “manufactured doubt” stratagem that was used so effectively by the tobacco industry to promote smoking. It’s the favourite tool of snake oil salesmen too, and the BMJ should see that and not fall for their tricks. Such pieces might be good clickbait, but they are bad medicine and bad ethics.

All the best
David
The last email of Stefan Grimm has had more views than any other post on this blog: “Publish and perish at Imperial College London: the death of Stefan Grimm”. Since then it’s been viewed more than 210,000 times. The day after it was posted, the server failed under the load.
Since then, I have posted two follow-up pieces. On December 23, 2014: “Some experiences of life at Imperial College London. An external inquiry is needed after the death of Stefan Grimm”. Of course there was no external inquiry.
And on April 9, 2015, after the coroner’s report, and after Imperial’s internal inquiry, “The death of Stefan Grimm was “needless”. And Imperial has done nothing to prevent it happening again“.
On September 24th 2015, I posted a memorial on the first anniversary of his death. It included some of Grimm’s drawings that his mother and sister sent to me.
That tragedy led to two actions by Imperial, the metrics report (2015) and the bullying report (2016).
Let’s look at the outcomes.
The 2015 metrics report
In February 2015 an investigation was set up into the use of metrics to evaluate people. In December 2015 a report was produced: Application and Consistency of Approach in the Use of Performance Metrics. This was an internal enquiry, so one didn’t expect very much from it. Out of 1338 academic staff surveyed at the College, 309 (23% of the total) responded (another 217 started the survey but did not submit anything). One can only speculate about the low return. It could be that the other 77% of staff were happy, or it could be that they were frightened to give their opinions. It’s true that some departments use few if any metrics to assess people, so one wouldn’t expect strong responses from them.
My position is clear: metrics don’t measure the quality of science; in fact, they corrupt it.
This is not Imperial’s view though. The report says:
5.1 In seeking to form a view on performance metrics, we started from the premise that, whatever their benefits or deficiencies, performance metrics pervade UK universities. From REF to NSS via the THE and their attendant league tables, universities are measured and ranked in many dimensions and any view of performance metrics has to be formed in this context.
In other words, they simply acquiesce in the use of measures that demonstrably don’t do what’s claimed for them.
Furthermore, the statement that “performance metrics pervade UK universities” is not entirely true. At UCL we were told in 2015:
“We will evaluate the quality of staff contributions appropriately, focusing on the quality of individual research outputs and their impact rather than quantity or journal-level metrics.”
And one of the comments quoted in Imperial’s report says:
“All my colleagues at MIT and Harvard etc tell me they reject metrics because they lead to mediocre candidates. If Imperial really wants to be a leader, it has to be bold enough to judge based on quality.”
It is rather shameful that only five UK universities (out of 114 or so) have signed the San Francisco Declaration on Research Assessment (DORA). I’m very happy that UCL is one of them, along with Sussex, Manchester, Birmingham and Liverpool. Imperial has not signed.
Imperial’s report concludes:
“each department should develop profiles of its academic staff based on a series of published (ie open and transparent [perhaps on the College intranet]:”
There seems to be a word missing here. Presumably this means “open and transparent metrics“.
The gist of the report seems to be that departments can carry on doing what they want, as long as they say what it is. That’s not good enough, in my opinion.
A review of Imperial College’s institutional culture and its impact on gender equality
Unlike the metrics report, this one was external: that’s good. But, unlike the metrics report, it is secret: that’s bad.
The report was written by Alison Phipps (Director of Gender Studies and Reader in Sociology, University of Sussex). But all that’s been released is an 11-page summary, written by Imperial, not by the authors of the report. When I asked Phipps for a copy of the whole report, I was told:
“Unfortunately we cannot share the full report – this is an internal document to Imperial, and we have to protect our research participants who told us their stories on this basis.”
It’s not surprising that the people who told their stories are afraid of repercussions. But it’s odd that their stories are concealed from everyone but the people who are in a position to punish them.
The report seems to have been commissioned because of this incident.
“The university apologised to the women’s rugby team after they were left playing to an empty stadium when the coaches ferrying spectators back to campus were allowed to leave early.”
“a member of staff was overheard saying that they did not care “how those fat girls” got home.”
But the report wasn’t restricted to sexism. It covered the whole culture at Imperial. One problem was that only 127 staff and 85 students participated. There is no way to tell whether those who didn’t respond were happy or whether they were scared.
Here are some quotations from Imperial’s own summary of the secret report.
“For most, the meaning was restricted to excellence in research despite the fact that the College’s publicised mission statement gives equal prominence to research and education in the excellence context”
“Participants saw research excellence in metricised terms, positioning the College as a top-level player within the UK and in the world.”
Words used by those critical of Imperial’s culture included ” ‘cutthroat’, ‘intimidating’, ‘blaming’ and ‘arrogant’ “.
“Many participants in the survey and other methods felt that the external focus on excellence had emphasised internal competition rather than collaboration. This competition was noted as often being individualistic and adversarial. ”
“It was felt that there was an all-consuming focus on academic performance, and negative attitudes towards those who did not do well or who were not as driven as others. There was a reported lack of community spirit in the College’s culture including departments being ‘played off against each other’”
“The research findings noted comments that the lack of communal space on the campus had contributed to a lack of a community spirit. It was suggested that the College had ‘an impersonal culture’ and groups could therefore self-segregate in the absence of mechanisms for them to connect. ”
“There were many examples given to the researchers of bullying and discriminatory behaviour towards staff and students. These examples predominantly reflected hierarchies in work or study arrangements. ”
“The researchers reported that many of the participants linked it with the ‘elite’ white masculinity of the majority population, although a few examples of unacceptable behaviour by female staff and students were also cited. Examples of misogynistic and homophobic conduct were given and one interviewee expressed concern that the ‘ingrained misogyny’ at Imperial was so deep that it had become normal.”
“Although the College describes itself as a supportive environment, and many positive examples of that support were cited, a number of participants felt that senior management would turn a blind eye to poor behaviour if the individual involved was of value to the College.”
“Despite Imperial’s ‘no tolerance’ stance on harassment and bullying and initiatives such as ‘Have Your Say’, the researchers heard that people did not ‘speak up’ about many issues, ranging from discrimination and abuse to more subtle practices that leave people feeling vulnerable, unheard or undermined.”
“Relations between PIs and contract researchers were especially difficult, and often gendered as the PI was very often a man and the researcher a woman.”
“It was reported that there was also a clear sense of staff and students feeling afraid to speak up about issues and not receiving clear information or answers due to unclear institutional processes and one-way communication channels.”
“This representation of Imperial College as machine rather than organism resonated with observations on a culture of fear and silence, and the lack of empathy and community spirit at the College.”
“Some of the participants identified a surface commitment to diversity and representation but a lack of substantive system processes to support this. The obstacles to participation in the way of doing things at Imperial, and the associated issues of fear and insecurity, were reported as leading to feelings of hopelessness, demotivation, and low morale among some staff and students.”
“Some participants felt that Athena SWAN had merely scratched the surface of issues or had just provided a veneer which concealed continuing inequalities and that events such as the annual Athena SWAN lecture were little more than a ‘box ticking exercise.’”
The conclusions are pretty weak: e.g.
“They [the report’s authors] urged the College to implement changes that would ensure that its excellence in research is matched by excellence in other areas.”
Of course, Imperial College says that it will fix the problems. “Imperial’s provost, James Stirling, said that the institution must do better and was committed to gender equality”.
But that is exactly what they said in 2003:
“The rector [then Richard Sykes] acknowledged the findings that came out of the staff audit – Imperial College – A Good Place to Work? – undertaken in August 2002.”
“He reinforced the message that harassment or bullying would not be tolerated in the College, and promised commitment from Council members and the Executive Committee for their continuing support to equal opportunities.”
This was eleven years before the pressure applied to Stefan Grimm caused him to take his own life. As always, it sounds good. But it seems that, thirteen years later, Imperial is going through exactly the same exercise.
It would be interesting to know whether Imperial’s Department of Medicine is still using the same cruel assessment methods as it did in 2007. Other departments at Imperial have never used such methods. It’s a continual source of bafflement to me that medicine, the caring profession, seems to care less for its employees than most other departments do.
Other universities
Imperial is certainly not unique in having these problems. They are endemic. For example, Queen Mary, Kings College London and Warwick University have had similar problems, among many others.
Managers must learn that organisations function better when employees have good morale and are happy to work. Once again, I quote Scott Berkun (The Myths of Innovation, 2007):
“Creation is sloppy; discovery is messy; exploration is dangerous. What’s a manager to do? The answer in general is to encourage curiosity and accept failure. Lots of failure.”
All big organisations are much the same: dissent is squashed and punished. Committees are set up. Fine-sounding statements are issued. But nothing much changes.
It should not be so.
Follow-up
The "supplement" industry is a scam that dwarfs all other forms of alternative medicine. Sales are worth over $100 billion a year, a staggering sum. But the claims they make are largely untrue: plain fraudulent. Although the industry’s advertisements like to claim "naturalness". in fact most of the synthetic vitamins are manufactured by big pharma companies. The pharmaceutical industry has not been slow to cash in on an industry in which unverified claims can be made with impunity.
When I saw Hotshot advertised, "a proprietary formulation of organic ingredients" that is alleged to cure or prevent muscle cramps, I would have assumed that it was just another scam. Then I saw that the people behind it were very highly-regarded scientists, Rod MacKinnon and Bruce Bean, both of whom I have met.
The Hotshot website gives this background.
"For Dr. Rod MacKinnon, a Nobel Prize-winning neuroscientist/endurance athlete, the invention of HOTSHOT was personal.
After surviving life threatening muscle cramps while deep sea kayaking off the coast of Cape Cod, he discovered that existing cramp remedies – that target the muscle – didn’t work. Calling upon his Nobel Prize-winning expertise on ion channels, Rod reasoned that preventing and treating cramps began with focusing on the nerve, not the muscle.
Five years of scientific research later, Rod has perfected HOTSHOT, the kick-ass, proprietary formulation of organic ingredients, powerful enough to stop muscle cramps where they start. At the nerve.
Today, Rod’s genius solution has created a new category in sports nutrition: Neuro Muscular Performance (NMP). It’s how an athlete’s nerves and muscles work together in an optimal way. HOTSHOT boosts your NMP to stop muscle cramps. So you can push harder, train longer and finish stronger."
For a start, it’s pretty obvious that MacKinnon has not spent the last five years developing a cure for cramp. His publications don’t even mention the topic. Neither do Bruce Bean’s.
I’d like to thank Bruce Bean for answering some questions I put to him. He said it’s "designed to be as strong as possible in activating TRPV1 and TRPA1 channels". After some hunting I found that it contains:
Filtered Water, Organic Cane Sugar, Organic Gum Arabic, Organic Lime Juice Concentrate, Pectin, Sea Salt, Natural Flavor, Organic Stevia Extract, Organic Cinnamon, Organic Ginger, Organic Capsaicin
The first ingredient is sugar: "the 1.7oz shot contains enough sugar to make a can of Coke blush with 5.9 grams per ounce vs. 3.3 per ounce of Coke".[ref].
The TRP (transient receptor potential) receptors form a family of 28 related ion channels. Their physiology is far from being well understood, but they are thought to be important for mediating taste and pain. The TRPV1 channel is also known as the receptor for capsaicin (found in chilli peppers). The TRPA1 channel responds to the active principle in wasabi.
I’m quite happy to believe that most cramp is caused by unsynchronised activity of motor nerves, causing muscle fibres to contract in an uncoordinated way (though it isn’t really known that this is the usual mechanism, or what triggers it in the first place). The problem is that there is no good reason at all to think that stimulating TRP receptors in the gastro-intestinal tract will stop, within a minute or so, the activity of motor nerves in the spinal cord.
But, as always, there is no point in discussing mechanisms until we are sure that there is a phenomenon to be explained. What is the actual evidence that Hotshot either prevents or cures cramps, as claimed? The Hotshot web site has a page about Our Science. Its title is The Truth about Muscle Cramps. That’s not a good start, because it’s well known that nobody understands cramp.
So follow the link to See our Scientific Studies. It has three references: two are to unpublished work, and the third is not about Hotshot at all, but about pickle juice. This was also the only reference sent to me by Bruce Bean. Its title is ‘Reflex Inhibition of Electrically Induced Muscle Cramps in Hypohydrated Humans’, Miller et al., 2010 [Download pdf]. Since it’s the only published work, it’s worth looking at in detail.
Miller et al. is not about exercise-induced cramp, but about a cramp-like condition that can be induced by electrical stimulation of a muscle in the sole of the foot (flexor hallucis brevis). The intention of the paper was to investigate anecdotal reports that pickle juice can prevent or stop cramps. It was a small study (only 10 subjects). After the subjects had been dehydrated, cramp was induced electrically, and two seconds after it started they drank either pickle juice or distilled water. They weren’t asked about pain: the extent of cramp was judged from electromyograph records. At least a week later, the test was repeated with the other drink (the order in which the drinks were given was randomised). So it was a crossover design.
There was no detectable difference between water and pickle juice in the intensity of the cramp. But the duration of the cramp was said to be shorter. The mean duration after water was 133.7 ± 15.9 s and the mean duration after pickle juice was 84.6 ± 18.5 s. A t test gives P = 0.075. However, each subject had both treatments: the mean reduction in duration was 49.1 ± 14.6 s, and a paired t test gives P = 0.008. This is close to the 3-standard-deviation difference which I recommended as a minimal criterion, so what could possibly go wrong?
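The arithmetic can be checked approximately from the published summary statistics alone. Here is a minimal sketch (my own; it assumes only the means and standard errors quoted above, with n = 10, so it will not exactly match p values computed from the raw data):

```python
# A minimal sketch (my own, from the published summary statistics only;
# n = 10 per condition, values are mean +/- SEM in seconds) reproducing the
# t-test arithmetic quoted above. Exact agreement with the paper is not
# expected because the raw data are not available.
from scipy import stats

n = 10
mean_water, sem_water = 133.7, 15.9     # cramp duration after water
mean_pickle, sem_pickle = 84.6, 18.5    # cramp duration after pickle juice
mean_diff, sem_diff = 49.1, 14.6        # within-subject reductions

# Unpaired (two-sample) t test, treating the two conditions as independent
t_unpaired = (mean_water - mean_pickle) / (sem_water**2 + sem_pickle**2) ** 0.5
p_unpaired = 2 * stats.t.sf(t_unpaired, df=2 * n - 2)

# Paired t test, using the mean and SEM of the 10 within-subject differences
t_paired = mean_diff / sem_diff
p_paired = 2 * stats.t.sf(t_paired, df=n - 1)

print(f"unpaired: t = {t_unpaired:.2f}, P = {p_unpaired:.2f}")   # roughly 0.06
print(f"paired:   t = {t_paired:.2f}, P = {p_paired:.3f}")       # about 0.008
```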
The result certainly suggests that pickle juice might reduce the duration of cramps, but it’s far from conclusive, for the following reasons. First, it must have been very obvious indeed to the subjects whether they were drinking water or pickle juice. Secondly, paired t tests are not the right way to analyse crossover experiments, as explained here. Unfortunately the 10 individual differences are not given, so there is no way to judge the consistency of the responses. Thirdly, two outcomes were measured (intensity and duration), and no correction was made for multiple comparisons. Finally, P = 0.008 is convincing evidence only if you assume that there was a roughly 50:50 chance of the pickle-juice folklore being right before the experiment was started. For most folk remedies, that would be a pretty implausible assumption. The vast majority of folk remedies turn out to be useless when tested properly.
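The last point can be made roughly quantitative. Here is a sketch (my own illustration, using the Sellke-Berger upper bound on the Bayes factor rather than any exact calculation): even taking the P value at face value, P = 0.008 leaves a substantial chance of a false positive unless the remedy was quite plausible before the experiment started.

```python
# A rough illustration (my own, using the Sellke-Berger bound on the Bayes
# factor) of how the false positive risk for P = 0.008 depends on the prior
# probability that the remedy really works.
from math import e, log

def min_false_positive_risk(p: float, prior: float) -> float:
    """Lower bound on the false positive risk for an observed p value,
    given the prior probability that there is a real effect."""
    bayes_factor = 1 / (-e * p * log(p))     # maximum evidence against the null
    prior_odds = prior / (1 - prior)
    posterior_odds = bayes_factor * prior_odds
    return 1 / (1 + posterior_odds)

for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior:>4}: false positive risk at least "
          f"{min_false_positive_risk(0.008, prior):.0%}")
```

With a 50:50 prior the false positive risk is still around 10%, and with a more realistic prior of 0.1 it is close to 50%.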
Nevertheless, the results are sufficiently suggestive that it might be worth testing Hotshot properly. One might have expected that to have been done before marketing started. It wasn’t.
Bruce Bean tells me that they tried it on friends, who said that it worked. Perhaps that’s not so surprising: there can be no condition more susceptible than muscle cramps to self-deception, because of regression to the mean.
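To see why, here is a toy simulation (my own illustration, nothing to do with the Hotshot data): cramp frequency fluctuates at random from week to week, the remedy is tried only after an unusually bad week, and the following week usually looks better even though the remedy does nothing at all.

```python
# Toy simulation of regression to the mean (illustrative assumptions only):
# weekly cramp counts are independent draws from the same distribution, and
# an inert "remedy" is taken only after an unusually bad week.
import random

random.seed(1)
treated = improved = 0
for _ in range(100_000):
    week1 = random.gauss(5, 2)        # cramps in a typical week
    if week1 > 8:                     # a bad week prompts taking the remedy
        week2 = random.gauss(5, 2)    # next week: same distribution, no effect
        treated += 1
        improved += week2 < week1

# Almost everyone who took the inert remedy "got better" the following week.
print(f"{improved / treated:.0%} of those who tried the remedy improved")
```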
They found a business partner, and MacKinnon set up a company, Flex Pharma. Let’s see how it is doing.
Flex Pharma
The hyperbole in the advertisements for Hotshot is entirely legal in the USA. The infamous 1994 “Dietary Supplement Health and Education Act (DSHEA)” allows almost any claim to be made for herbs etc., as long as they are described as a "dietary supplement". All that has to be done is to add, in the small print:
"These statements have not been evaluated by the Food and Drug Administration. This product is not intended to diagnose, treat, cure or prevent any disease".
Of course medical claims are made: it’s sold to prevent and treat muscle cramp (and I can’t even find the weasel words on the web site).
As well as Hotshot, Flex Pharma is also testing a drug, FLX-787, a TRP receptor agonist of undisclosed structure. It is hoping to get FDA approval for the treatment of nocturnal leg cramps (NLCs) and for the treatment of spasticity in multiple sclerosis (MS) and amyotrophic lateral sclerosis (ALS) patients. It would be great if it works, but we still don’t know whether it does.
The financial press doesn’t seem to be very optimistic. When Flex Pharma was launched on the stock market at the beginning of 2015, its initial public offering raised $86.4 million, at $16 per share. The biotech boom of the previous few years was still strong. In 2016, the outlook seems less rosy. The investment advice site Seeking Alpha had a scathing evaluation in June 2016. Its title was "Flex Pharma: What A Load Of Cramp". It has some remarkably astute assessments of the pharmacology, as well as of the financial risks. The summary reads thus:
- We estimate FLKS will burn at least $40 million of its $84 million in cash this year on clinical trials for FLX-787 and marketing spend for its new cramp supplement called “HOTSHOT.”
- Based on its high cash burn, we expect a large, dilutive equity raise is likely over the next 12 months.
- We believe the company’s recent study on nocturnal leg cramps (NLCs) may be flawed. We also highlight risks to its lead drug candidate, FLX-787, that we believe investors are currently overlooking.
- We highlight several competitive available alternatives to FLKS’s cramp products that we believe investors have not factored into current valuation.
- Only 2.82% of drugs from companies co-founded by CEO Westphal have achieved FDA approval.
The last bullet point refers to Flex Pharma’s CEO, Christoph Westphal MD PhD (described by Fierce Biotech as a "serial biotech entrepreneur"). Only two out of his 71 requests for FDA approval were successful.
On October 13th 2016 it was reported that early trials of FLX-787 had been disappointing. The shares plunged.
On October 17th 2016, Seeking Alpha posted another evaluation: “Flex Pharma Has Another Cramp“. So did StreetInsider.com. Neither was optimistic. The former made the point (see above) that crossover trials are not what should be done. In fact the FDA has required that regular parallel RCTs be done before FLX-787 can be approved.
Summary
Drug discovery is hard and it’s expensive. The record for small-molecule discovery has not been good in the last few decades. Many new introductions have, at best, marginal efficacy, and at worst may do more harm than good. For conditions in which understanding of causes is poor or non-existent, it’s impossible to design new drugs rationally. There are only too many such conditions: from low back pain to almost anything that involves the brain, knowledge of causes is fragmentary to non-existent. This leads guidance bodies to clutch at straws. Disappointing as this is, it’s not for want of trying. And it’s not surprising: serious medical research hasn’t been going on for long, and the systems are very complicated.
But this is no excuse for pretending that things work on the basis of the flimsiest of evidence. Bruce Bean advised me to try Hotshot on friends, and says that it doesn’t work for everybody. This is precisely what one is told by homeopaths, and by just about every other sort of quack. Time and time again, that sort of evidence has proved to be misleading.
I have the greatest respect for the science that’s published by both Bruce Bean and Rod MacKinnon. I guess that they aren’t familiar with the sort of evidence that’s required to show that a new treatment works. That problem isn’t solved by describing a treatment as a "dietary supplement".
I’ll confess that I’m a bit disappointed by their involvement with Flex Pharma, a company that makes totally unjustified claims. Or should one just say caveat emptor?
Follow-up
Before posting this, I sent it to Bruce Bean to be checked. Here was his response, which I’m posting in full (hoping not to lose a friend).
"Want to be UK representative for Hotshot? Sample on the way!"
"I do not see anything wrong with the facts. I have a different opinion – that it is perfectly appropriate to have different standards of proof of efficacy for consumer products made from general-recognized-as-safe ingredients and for an FDA-approved drug. I’d be happy for the opportunity to post something like the following your blog entry (and suffer any consequent further abuse) if there is an opportunity".
" I think it would be unfair to lump Hotshot with “dietary supplements” targeted to exploit the hopes of people with serious diseases who are desperate for magic cures. Hotshot is designed and marketed to athletes who experience exercise-induced cramping that can inhibit their training or performance – hardly a population of desperate people susceptible of exploitation. It costs only a few dollars for someone to try it. Lots of people use it regularly and find it helpful. I see nothing wrong with this and am glad that something that I personally found helpful is available for others to try. "
" Independently of Hotshot, Flex Pharma is hoping to develop treatments for cramping associated with diseases like ALS, MS, and idiopathic nocturnal leg cramps. These treatments are being tested in rigorous clinical trials that will be reviewed by the FDA. As with any drug development it is very expensive to do the clinical trials and there is no guarantee of success. I give credit to the investors who are underwriting the effort. The trials are openly publicly reported. I would note that Flex Pharma voluntarily reported results of a recent trial for night leg cramps that led to a nearly 50% drop in the stock price. I give the company credit for that openness and for spending a lot of money and a lot of effort to attempt to develop a treatment to help people – if it can pass the appropriately high hurdle of FDA approval."
" On Friday, I sent along 8 bottles of Hotshot by FedEx, correctly labeled for customs as a commercial sample. Of course, I’d be delighted if you would agree to act as UK representative for the product but absent that, it should at least convince you that the TRP stimulators are present at greater than homeopathic doses. If you can find people who get exercise-induced cramping that can’t be stretched out, please share with them."
6 January 2017
It seems that more than one Nobel prizewinner is willing to sell their name to dodgy businesses. The MIT Tech Review tweeted a link: imagine Albert Einstein getting paid to put his picture on a tin of anti-wrinkle cream. No fewer than seven Nobel prizewinners have lent their names to a “supplement” pill that’s claimed to prolong your life. Needless to say, there isn’t the slightest reason to think it works. What possesses these people beats me. Here are their names.
Aaron Ciechanover (Cancer Biology, Technion – Israel Institute of Technology).
Eric Kandel (Neuroscience, Columbia University).
Jack Szostak (Origins of Life & Telomeres, Harvard University).
Martin Karplus (Complex Chemical Systems, Harvard University).
Sir Richard Roberts (Biochemistry, New England Biolabs).
Thomas Südhof (Neuroscience, Stanford University).
Paul Modrich (Biochemistry, Duke University School of Medicine).
Then there’s the Amway problem. Watch this space.
‘We know little about the effect of diet on health. That’s why so much is written about it.’ That is the title of a post in which I advocate the view, put by John Ioannidis, that remarkably little is known about the health effects of individual nutrients. That ignorance has given rise to a vast industry selling advice that has little evidence to support it.
The 2016 Conference of the so-called "College of Medicine" had the title "Food, the Forgotten Medicine". This post gives some background information about some of the speakers at this event. I’m sorry it appears to be too ad hominem, but the only way to judge the meeting is via the track record of the speakers.
Quite a lot has been written here about the "College of Medicine". It is the direct successor of the Prince of Wales’ late, unlamented, Foundation for Integrated Health. But unlike the latter, its name disguises its promotion of quackery. Originally it was going to be called the “College of Integrated Health”, but that wasn’t sufficiently deceptive, so the name was dropped.
For the history of the organisation, see
Don’t be deceived. The new “College of Medicine” is a fraud and delusion
The College of Medicine is in the pocket of Crapita Capita. Is Graeme Catto selling out?
The conference programme (download pdf) is a masterpiece of bait and switch. It is a mixture of very respectable people, and outright quacks. The former are invited to give legitimacy to the latter. The names may not be familiar to those who don’t follow the antics of the magic medicine community, so here is a bit of information about some of them.
The introduction to the meeting was by Michael Dixon and Catherine Zollman, both veterans of the Prince of Wales Foundation, and both devoted enthusiasts for magic medicine. Zollman even believes in the battiest of all forms of magic medicine, homeopathy (download pdf), for which she totally misrepresents the evidence. Zollman now works at the Penny Brohn centre in Bristol. She’s also linked to the "Portland Centre for integrative medicine", which is run by Elizabeth Thompson, another advocate of homeopathy. It came into being after NHS Bristol shut down the Bristol Homeopathic Hospital, on the very good grounds that it doesn’t work.
Now, like most magic medicine, it is privatised. The Penny Brohn shop will sell you a wide range of expensive and useless "supplements". For example, Biocare Antioxidant capsules at £37 for 90. Biocare make several unjustified claims for their benefits. Among other unnecessary ingredients, they contain a very small amount of green tea. That’s a favourite of "health food addicts", and it was the subject of a recent paper that contains one of the daftest statistical solecisms I’ve ever encountered:
"To protect against type II errors, no corrections were applied for multiple comparisons".
If you don’t understand that, try this paper.
The results are almost certainly false positives, despite the fact that it appeared in Lancet Neurology. It’s yet another example of broken peer review.
It’s been known for decades now that “antioxidant” is no more than a marketing term. There is no evidence of benefit, and large doses can be harmful. This obviously doesn’t worry the College of Medicine.
Margaret Rayman was the next speaker. She’s a real nutritionist. Mixing the real with the crackpots is a standard bait and switch tactic.
Eleni Tsiompanou came next. She runs yet another private "wellness" clinic, which makes all the usual exaggerated claims. She seems to have an obsession with Hippocrates (hint: medicine has moved on since then). Dr Eleni’s Joy Biscuits may or may not taste good, but their health-giving properties are make-believe.
Andrew Weil, from the University of Arizona, gave the keynote address. He’s described as "one of the world’s leading authorities on Nutrition and Health". That description alone is sufficient to show the fantasy land in which the College of Medicine exists. He’s a typical supplement salesman, presumably very rich. There is no excuse for not knowing about him. It was 1988 when Arnold Relman (who was editor of the New England Journal of Medicine) wrote A Trip to Stonesville: Some Notes on Andrew Weil, M.D.
“Like so many of the other gurus of alternative medicine, Weil is not bothered by logical contradictions in his argument, or encumbered by a need to search for objective evidence.”
This blog has mentioned his more recent activities, many times.
Alex Richardson, of Oxford Food and Behaviour Research (a charity, not part of the university), is an enthusiast for omega-3, a favourite of the supplement industry. She has published several papers that show little evidence of effectiveness. That looks entirely honest. On the other hand, the charity’s News section contains many links to the notorious supplement industry lobby site, Nutraingredients, one of the least reliable sources of information on the web (I get their newsletter, a constant source of hilarity and raised eyebrows). I find this worrying for someone who claims to be evidence-based. I’m told that her charity is funded largely by the supplement industry (though I can’t find any mention of that on the web site).
Stephen Devries was a new name to me. You can infer what he’s like from the fact that he has been endorsed by Andrew Weil, and that his address is the "Institute for Integrative Cardiology" ("integrative" is the latest euphemism for quackery). Never trust any talk with a title that contains "The truth about". His was called "The scientific truth about fats and sugars". In a video, he claims that diet has been shown to reduce heart disease by 70%, which gives you a good idea of his ability to assess evidence. But the claim doubtless helps to sell his books.
Prof Tim Spector, of Kings College London, was next. As far as I know he’s a perfectly respectable scientist, albeit one with books to sell. But his talk is now online, and it sounded like that of a born-again microbiome enthusiast. He seemed to be too impressed by the PREDIMED study, despite its statistical unsoundness, which was pointed out by Ioannidis. Little evidence was presented, though at least he was more sensible than the audience about the uselessness of multivitamin tablets.
Simon Mills talked on “Herbs and spices. Using Mother Nature’s pharmacy to maintain health and cure illness”. He’s a herbalist who has featured here many times. I can recommend especially his video about Hot and Cold herbs as a superb example of fantasy science.
Annie Anderson is Professor of Public Health Nutrition and Founder of the Scottish Cancer Prevention Network. She’s a respectable nutritionist and public health person, albeit with their customary disregard of problems of causality.
Patrick Holden is chair of the Sustainable Food Trust. He promotes "organic farming". Much though I dislike the cruelty of factory farms, the "organic" industry is largely a way of making food more expensive with no health benefits.
The Michael Pittilo 2016 Student Essay Prize was awarded after lunch. Pittilo has featured frequently on this blog as a result of his execrable promotion of quackery: see, in particular, A very bad report: gamma minus for the vice-chancellor.
Nutritional advice for patients with cancer. This discussion involved three people.
Professor Robert Thomas, Consultant Oncologist, Addenbrookes and Bedford Hospitals, Dr Clare Shaw, Consultant Dietitian, Royal Marsden Hospital and Dr Catherine Zollman, GP and Clinical Lead, Penny Brohn UK.
Robert Thomas came to my attention when I noticed that he, as a regular cancer consultant, had spoken at a meeting of the quack charity YestoLife. Then I saw that he was scheduled to speak at another quack conference. After I’d written to him to point out the track records of some of the people at the meeting, he withdrew from one of them. See The exploitation of cancer patients is wicked. Carrot juice for lunch, then die destitute. The influence seems to have been temporary, though. He continues to lend respectability to many dodgy meetings. He edits the Cancernet web site. This site lends credence to bizarre treatments like homeopathy and crystal healing. It used to sell hair mineral analysis, a well-known phony diagnostic method, the main purpose of which is to sell you expensive “supplements”. They still sell the “Cancer Risk Nutritional Profile” for £295.00, despite the fact that it provides no proven benefits.
Robert Thomas designed a food "supplement", Pomi-T: capsules that contain Pomegranate, Green tea, Broccoli and Curcumin. Oddly, he seems still to subscribe to the antioxidant myth. Even the supplement industry admits that that’s a lost cause, but that doesn’t stop its use in marketing. The one randomised trial of these pills for prostate cancer was inconclusive. Prostate Cancer UK says "We would not encourage any man with prostate cancer to start taking Pomi-T food supplements on the basis of this research". Nevertheless it’s promoted on Cancernet.co.uk and widely sold. The Pomi-T site boasts about the (inconclusive) trial, but says "Pomi-T® is not a medicinal product".
There was a cookery demonstration by Dale Pinnock, "The medicinal chef". The programme does not tell us whether he made his signature dish, "the Famous Flu Fighting Soup". Needless to say, there isn’t the slightest reason to believe that his soup has the slightest effect on flu.
In summary, the whole meeting was devoted to exaggerating vastly the effect of particular foods. It also acted as advertising for people with something to sell. Much of it was outright quackery, with a leavening of more respectable people, a standard part of the bait-and-switch methods used by all quacks in their attempts to make themselves sound respectable. I find it impossible to tell how much the participants actually believe what they say, and how much it’s a simple commercial drive.
The thing that really worries me is why someone like Phil Hammond supports this sort of thing by chairing their meetings (as he did for the "College of Medicine’s" direct predecessor, the Prince’s Foundation for Integrated Health). His defence of the NHS has made him something of a hero to me. He assured me that he’d asked people to stick to evidence. In that he clearly failed. I guess they must pay well.