# DC's Improbable Science

## Some more pharmacological history: the legend of the Brocken and the statistics of purity in heart

#### August 14th, 2014 · 4 Comments

This post follows directly from "Some pharmacological history: an exam from 1959". In that post, I related how two of my teachers in Leeds, James Dare and George Mogey, had encouraged my interest in statistics. George Mogey had worked previously at the famous Wellcome Research Labs in Beckenham, Kent. He had been there at the same time as J.W. Trevan, who pioneered accurate methods of biological assay.

Another person who overlapped with Mogey and Trevan at Beckenham was C.L. Oakley. I’m told by Audrey Mogey, George’s widow, that they were good friends of the Oakleys, and that probably explains why George Mogey introduced me to Cyril Oakley, who had the chair of bacteriology at Leeds while I was an undergraduate there. Oakley’s Biographical Memoir makes no mention of statistics. The only person I’ve located who knew him is Keith Holland (professor of microbiology at Leeds). He told me:

"I was trained by CLO between 1961-65 and he inspired me to remain in research into aspects of anaerobic bacteriology and I attended his lectures on statistics, which were highly stimulating and humorous. He frequently used examples of magicians turning lead into gold and I can not recall examples of goats and men."

The statistical connection stems from an article written by Oakley in 1943: Oakley, C. L. (1943). "He-goats into young men: first steps in statistics", University College Hospital Magazine Vol 28, 16-21. Now you can download a copy of this rather obscure publication.

The action occurs on the Brocken. The paper starts by citing the Illustrated London News (the internet of its age). In 1932 an experiment was done which allegedly dispelled the legend of the Brocken. Here it is.

Oakley uses the Brocken experiment to explain the statistical method known as probit analysis. This was obviously something he’d learned from J.W. Trevan during his time at the Beckenham lab (e.g. Trevan's classic 1927 paper, The Error of Determination of Toxicity). And it was my meeting with Oakley, as an undergraduate, that caused me to use his paper as the basis of a section in Lectures on Biostatistics.

It also explains why, ever since the late 50s, I’ve wanted to visit the Brocken. It’s only about 100 km from Göttingen, where I worked often between 1980 and 1985, but at that time the Brocken was in East Germany. I remember looking across the wall at the Harz mountains, when Erwin Neher took us into the country to pick wild bilberries (blueberries, Heidelbeeren). Reunification of Germany occurred while I was working in Heidelberg in 1991, but it was not until a month ago that I got there. We took a rail tour of Germany, and spent four days in the Harz town of Wernigerode, from where we took the Harzer Schmalspurbahn, the steam-powered narrow-gauge railway, to the Brocken. Here are some pictures of the trip.

All I got was a teapot stand, with a witch on a broomstick (and I don’t even drink the stuff myself).

Interestingly, although there is plenty of tourist tat about the connection with Dr Faust and Goethe, I didn’t find any German who’d heard of the he-goat conversion legend. One of the people involved in the experiment, Harry Price (1881 – 1948), of the National Laboratory of Psychical Research, seems to have been behind it, and the history is described by him in the "Bloksberg Tryst" (Blocksberg is another name for the Brocken). Another person who conducted the experiment was Professor Joad (1891 – 1953). I can just remember hearing him on the BBC Home Service (radio) programme, the Brains Trust, which also featured Julian Huxley and Jacob Bronowski (1908 – 1974). They were the public intellectuals of the early 1950s. (Much later, I discovered that Bronowski was the father of Lisa Jardine, who now works at UCL.)

Oakley (1943) starts by citing the account in the Illustrated London News.

"The legend of the Brocken (the famous peak in the Harz Mountains, noted for its spectre and as the haunt of witches on Walpurgis Night), according to which a virgin he-goat can be converted into "a youth of surpassing beauty” by spells performed in a magic circle at midnight, was tested on June 17 by British and German scientists and investigators, including Professor Joad and Mr. Harry Price, of the National Laboratory of Psychical Research. The object was to expose the fallacy of Black Magic and also to pay a tribute to Goethe, who used the legend in Faust. Some wore evening dress. The goat was anointed with the prescribed compound of scrapings from church bells, bats’ blood, soot and honey. The necessary maiden pure in heart, who removed the white sheet from the goat at the critical moment, was Fräulein Urta Bohn, daughter of one of the German professors taking part in the test. Her mother was a Scotswoman (formerly Miss Gordon). The scene was flood-lit and filmed. As our photographs show, the goat remained a goat and the legend of the Brocken was dispelled".

Oakley then proposes a biological assay to measure purity in heart.

"It will be observed that the only incompletely controllable variables in the experiment (excluding local variations in the church bells, bat’s blood, soot and honey) are the virgin he-goat and the maiden (virgin?) pure in heart. Virginity may for the present be regarded as an absolute character; purity in heart no doubt varies from person to person. If, therefore, a reasonably uniform supply of virgin he-goats be obtained, and the percentage of he-goats converted bears any relation to the purity in heart of the maiden used, we ought to be able to measure the degree of purity in heart of the virgins available."

The argument he uses is based directly on J.W. Trevan. The story reappeared in Chapter 7 (section 7.8, page 111) of Lectures on Biostatistics, where I used it to illustrate confidence intervals for a binomial proportion.

"We shall assume, as Oakley did, that the conversion of he-goats into young men is an all-or-nothing process; either complete conversion or nothing occurs. Oakley supposed, on this basis, that a comparison could be made between, on one hand, the percentage of he-goats converted by maidens of various degrees of purity in heart, and, on the other hand, the sort of pharmacological experiment that involves the measurement of the percentage of individuals showing a specified effect in response to various doses of a drug. In conformity with the common pharmacological practice he supposed that a plot of percentage he-goat conversion against log purity in heart index (log PHI) would have the sigmoid form shown in Fig. 14.2.4. As explained in Chapter 14, this implies that log PHI required to convert individual he-goats is a normally distributed variable. Furthermore it means that infinite purity in heart is required to produce a population he-goat conversion rate (HGCR) of 100 per cent.

Although there is a lack of experimental evidence on this point, the present author feels that the assumption of a normal distribution is, as so often happens, without foundation (see § 4.2). The implication of the normality assumption, that there exist he-goats so resistant to conversion that infinite purity in heart is needed to affect them, has not been (and cannot be) experimentally verified. Furthermore the very idea of infinite purity in heart seems likely to cause despondency in most people, and should therefore be avoided until such time as its necessity may be demonstrated experimentally.

In the light of these remarks it appears to the present author desirable that the purity in heart index should be redefined simply as the population percentage of he-goats converted. This simple operational definition means that the PHI of all maidens will fall between 0 and 100, and confidence limits for the true PHI can be found easily from the observed conversion rate (which should be binomially distributed, see §§ 3.2-3.5) using Table A2, as explained in §7.7.

For example, if it were observed that a particular maiden caused conversion of r = 2 out of n = 4 he-goats, the estimated PHI would be 100 × 2/4 = 50 per cent, and, from Table A2, confidence limits (P = 0.95) for true PHI are 6.8 – 93.2 per cent. Clearly the information to be gained from a sample of only four he-goats is so imprecise that it is difficult to conceive what use it could be put to. Oakley recommended that for preliminary experiments at least n = 10 he-goats should be used. If r = 5 (50 per cent) of these were observed to be converted, Table A2 would give the confidence limits (P = 0.95) for the true PHI as 18.7 – 81.3 per cent. While the most extreme forms of vice and of virtue appear to be ruled out by this result, there is still considerable uncertainty about the PHI. If a greater degree of confidence were required, as, for example, if a potential husband demanded a certain minimum (or, alternatively, a certain maximum) PHI before committing himself, the P = 0.99 confidence limits could be found from Table A2. They are 12.8 – 87.2 per cent. The most tolerant suitor might be forgiven for requiring a larger sample."
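The limits quoted from Table A2 can be checked by computer. The sketch below assumes that Table A2 gives exact (Clopper-Pearson) binomial limits; that is an assumption about how the table was built, but this method does reproduce all three pairs of figures quoted above. Each limit is found by bisection on a binomial tail probability, using only the Python standard library. The function names are illustrative, not from the book.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect(f, target, increasing, tol=1e-12):
    """Find p in [0, 1] where the monotonic function f(p) equals target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half of the bracket in which f crosses the target.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_limits(r, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence limits for a true proportion, given r of n."""
    tail = (1 - conf) / 2
    # Lower limit: p at which P(X >= r) equals the tail probability (rises with p).
    lower = 0.0 if r == 0 else _bisect(
        lambda p: 1 - binom_cdf(r - 1, n, p), tail, increasing=True)
    # Upper limit: p at which P(X <= r) equals the tail probability (falls with p).
    upper = 1.0 if r == n else _bisect(
        lambda p: binom_cdf(r, n, p), tail, increasing=False)
    return lower, upper

if __name__ == "__main__":
    for r, n, conf in [(2, 4, 0.95), (5, 10, 0.95), (5, 10, 0.99)]:
        lo, hi = exact_binomial_limits(r, n, conf)
        print(f"r={r}, n={n}, P={conf}: true PHI {100 * lo:.1f} to {100 * hi:.1f} per cent")
```

Running it for n = 4 and n = 10 makes the book's point numerically: even ten he-goats leave an interval more than 60 percentage points wide.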

The statistics are pretty standard stuff. You can find out more by downloading Lectures on Biostatistics. The binomial distribution is described in Chapters 3, 7 and 8, and probit analysis in Chapter 14.
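The probit assumption criticised in the quoted passage can be made concrete with a small sketch. If log PHI thresholds are normally distributed, the expected fraction of he-goats converted at a given PHI is the standard normal cumulative distribution of the standardised log PHI. That curve is sigmoid and approaches 100 per cent without reaching it at any finite PHI. The median PHI of 10 and the standard deviation of 0.5 log10 units are invented purely for illustration.

```python
from math import erf, log10, sqrt

def fraction_converted(phi, median_phi=10.0, sd_log10=0.5):
    """Expected fraction of he-goats converted by a maiden with purity index phi.

    Assumes (hypothetically) that log10 of each goat's threshold PHI is normal,
    with mean log10(median_phi) and standard deviation sd_log10.
    """
    z = (log10(phi) - log10(median_phi)) / sd_log10
    # Standard normal CDF. Mathematically this is < 1 for every finite phi;
    # in floating point it only rounds to 1.0 for absurdly large phi.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

for phi in (1, 10, 100, 1000):
    print(f"PHI {phi:>5}: {100 * fraction_converted(phi):.3f} per cent converted")
```

Each tenfold rise in PHI adds a smaller increment, which is exactly why the normal model implies that infinite purity in heart is needed for a 100 per cent conversion rate.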

For some real statistics, please look at “An investigation of the false discovery rate and the misinterpretation of P values“, now available as a preprint on arXiv.

## What is meant by the "accuracy" of screening tests?

#### July 14th, 2014 · 2 Comments

The two posts on this blog about the hazards of significance testing have proved quite popular. See Part 1: the screening problem, and Part 2: the false discovery rate. They’ve had over 20,000 hits already (though I still have to find a journal that will print the paper based on them).

Yet another Alzheimer’s screening story hit the headlines recently and the facts got sorted out in the follow-up section of the screening post. If you haven’t read that already, it might be helpful to do so before going on to this post.

This post has already appeared on the Sense about Science web site. They asked me to explain exactly what was meant by the claim that the screening test had an "accuracy of 87%". That was mentioned in all the media reports, no doubt because it was the only specification of the quality of the test in the press release. Here is my attempt to explain what it means.

### The "accuracy" of screening tests

Anything about Alzheimer’s disease is front-line news in the media. No doubt that had not escaped the notice of Kings College London when they issued a press release about a recent study of a test for development of dementia based on blood tests. It was widely hailed in the media as a breakthrough in dementia research. For example, the BBC report is far from accurate. The main reason for the inaccurate reports is, as so often, the press release. It said

"They identified a combination of 10 proteins capable of predicting whether individuals with MCI would develop Alzheimer’s disease within a year, with an accuracy of 87 percent"

The original paper says

"Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85% and specificity 88%)."

What matters to the patient is the probability that, if they come out positive when tested, they will actually get dementia. The Guardian quoted Dr James Pickett, head of research at the Alzheimer’s Society, as saying

"These 10 proteins can predict conversion to dementia with less than 90% accuracy, meaning one in 10 people would get an incorrect result."

That statement simply isn’t right (or, at least, it’s very misleading). The proper way to work out the relevant number has been explained in many places; I did it recently on my blog.

The easiest way to work it out is to make a tree diagram. The diagram is like that previously discussed here, but with a sensitivity of 85% and a specificity of 88%, as specified in the paper.

In order to work out the number we need, we have to specify the true prevalence of people who will develop dementia, in the population being tested. In the tree diagram, this has been taken as 10%. Out of 1000 people tested, 100 will develop dementia, and 85% of them (85) test positive; the other 900 will not, but 12% of them (1 − specificity, so 108) test positive too. That gives 85 + 108 = 193 positive test results. Out of these 193, rather more than half (108) are false positives, so if you test positive there is a 56% chance that it’s a false alarm (108/193 = 0.56). A false discovery rate of 56% is far too high for a good test.

This figure of 56% seems to be the basis for a rather good post by NHS Choices with the title “Blood test for Alzheimer’s ‘no better than coin toss’”.

If the prevalence were taken as 5% (a value that’s been given for the over-60 age group) that fraction of false alarms would rise to a disastrous 73%.
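The tree-diagram arithmetic can be written out as a few lines of code. This is a minimal sketch that uses only the sensitivity (0.85) and specificity (0.88) quoted from the paper, plus the two assumed prevalences discussed above; the function name is hypothetical. It should reproduce the 85 true positives, 108 false positives and 56% false discovery rate at 10% prevalence, and about 73% at 5%.

```python
def screening_tree(sensitivity, specificity, prevalence, n_tested=1000):
    """Expected counts in the tree diagram for n_tested people."""
    affected = n_tested * prevalence              # will develop dementia
    unaffected = n_tested - affected              # will not
    true_pos = sensitivity * affected             # correctly flagged
    false_pos = (1 - specificity) * unaffected    # false alarms
    fdr = false_pos / (true_pos + false_pos)      # chance a positive is a false alarm
    return true_pos, false_pos, fdr

for prevalence in (0.10, 0.05):
    tp, fp, fdr = screening_tree(0.85, 0.88, prevalence)
    print(f"prevalence {prevalence:.0%}: {tp:.1f} true positives, "
          f"{fp:.1f} false positives, false discovery rate {fdr:.0%}")
```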

How are these numbers related to the claim that the test is "87% accurate"? That claim was parroted in most of the media reports, and it is why Dr Pickett said "one in 10 people would get an incorrect result".

The paper itself didn’t define "accuracy" anywhere, and I wasn’t familiar with the term in this context (though Stephen Senn pointed out that it is mentioned briefly in the Wikipedia entry for Sensitivity and Specificity). The senior author confirmed that "accuracy" means the total fraction of tests, positive or negative, that give the right result. We see from the tree diagram that, out of 1000 tests, there are 85 correct positive tests and 792 correct negative tests, so the accuracy (with a prevalence of 0.1) is (85 + 792)/1000 = 0.877, or about 88%, close to the value that’s cited in the paper.

Accuracy, defined in this way, seems to me not to be a useful measure at all. It conflates positive and negative results and they need to be kept separate to understand the problem. Inspection of the tree diagram shows that it can be expressed algebraically as

accuracy = (sensitivity × prevalence) + (specificity × (1 − prevalence))

It is therefore merely a weighted mean of sensitivity and specificity (weighted by the prevalence). With the numbers in this case, it varies from 0.88 (when prevalence = 0) to 0.85 (when prevalence = 1). Thus it will inevitably give a much more flattering view of the test than the false discovery rate.
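To see how little "accuracy" responds to prevalence, compared with the false discovery rate, here is a short sketch that evaluates both formulas side by side. The prevalences chosen are illustrative assumptions; the sensitivity and specificity are the values from the paper.

```python
def accuracy(sensitivity, specificity, prevalence):
    """Fraction of all tests, positive or negative, that give the right result."""
    return sensitivity * prevalence + specificity * (1 - prevalence)

def false_discovery_rate(sensitivity, specificity, prevalence):
    """Fraction of positive tests that are false alarms."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return false_pos / (true_pos + false_pos)

for prevalence in (0.01, 0.05, 0.10, 0.50):
    acc = accuracy(0.85, 0.88, prevalence)
    fdr = false_discovery_rate(0.85, 0.88, prevalence)
    print(f"prevalence {prevalence:.0%}: accuracy {acc:.3f}, "
          f"false discovery rate {fdr:.2f}")
```

Accuracy stays between 0.85 and 0.88 throughout, while the false discovery rate swings from above 90% (at 1% prevalence) to about 12% (at 50%), which is why a single "accuracy" figure hides what matters to a patient.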

No doubt, it is too much to expect that a hard-pressed journalist would have time to figure this out, though it isn’t clear that they wouldn’t have time to contact someone who understands it. But it is clear that it should have been explained in the press release. It wasn’t.

In fact, reading the paper shows that the test was not being proposed as a screening test for dementia at all. It was proposed as a way to select patients for entry into clinical trials. The population that was being tested was very different from the general population of old people, being patients who come to memory clinics in trials centres (the potential trials population).

How best to select patients for entry into clinical trials is a matter of great interest to people who are running trials. It is of very little interest to the public. So all this confusion could have been avoided if Kings had refrained from issuing a press release at all, for a paper like this.

I guess universities think that PR is more important than accuracy.

That’s a bad mistake in an age when pretensions get quickly punctured on the web.

This post first appeared on the Sense about Science web site.

## Should metrics be used to assess research performance? A submission to HEFCE

#### June 18th, 2014 · 10 Comments

The Higher Education Funding Council for England (HEFCE) gives money to universities. The allocation that a university gets depends strongly on the periodical assessments of the quality of their research. Enormous amounts of time, energy and money go into preparing submissions for these assessments, and the assessment procedure distorts the behaviour of universities in ways that are undesirable. In the last assessment, four papers were submitted by each principal investigator, and the papers were read.

In an effort to reduce the cost of the operation, HEFCE has been asked to reconsider the use of metrics to measure the performance of academics. The committee that is doing this job has asked for submissions from any interested person, by June 20th.

This post is a draft for my submission. I’m publishing it here for comments before producing a final version for submission.

### Draft submission to HEFCE concerning the use of metrics.

I’ll consider a number of different metrics that have been proposed for the assessment of the quality of an academic’s work.

Impact factors

The first thing to note is that HEFCE is one of the original signatories of DORA (http://am.ascb.org/dora/). The first recommendation of that document is

"Do not use journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist’s contributions, or in hiring, promotion, or funding decisions"

Impact factors have been found, time after time, to be utterly inadequate as a way of assessing individuals, e.g. [1], [2]. Even their inventor, Eugene Garfield, says that. There should be no need to rehearse the details yet again. If HEFCE were to allow their use, they would have to withdraw from the DORA agreement, and I presume they would not wish to do this.

Article citations

Citation counting has several problems.  Most of them apply equally to the H-index.

1. Citations may be high because a paper is good and useful. They equally may be high because the paper is bad. No commercial supplier makes any distinction between these possibilities. It would not be in their commercial interests to spend time on that, but it’s critical for the person who is being judged. For example, Andrew Wakefield’s notorious 1998 paper, which gave a huge boost to the anti-vaccine movement, had had 758 citations by 2012 (it was subsequently shown to be fraudulent).
2. Citations take far too long to appear to be a useful way to judge recent work, as is needed for judging grant applications or promotions. This is especially damaging to young researchers, and to people (particularly women) who have taken a career break. The counts also don’t take into account citation half-life. A paper that’s still being cited 20 years after it was written clearly had influence, but that takes 20 years to discover.
3. The citation rate is very field-dependent.  Very mathematical papers are much less likely to be cited, especially by biologists, than more qualitative papers.  For example, the solution of the missed event problem in single ion channel analysis [3,4] was the sine qua non for all our subsequent experimental work, but the two papers have only about a tenth of the number of citations of subsequent work that depended on them.
4. Most suppliers of citation statistics don’t count citations of books or book chapters. This is bad for me because my only work with over 1000 citations is my 105 page chapter on methods for the analysis of single ion channels [5], which contained quite a lot of original work. It has had 1273 citations according to Google Scholar but doesn’t appear at all in Scopus or Web of Science. Neither do the 954 citations of my statistics textbook [6].
5. There are often big differences between the numbers of citations reported by different commercial suppliers.  Even for papers (as opposed to book articles) there can be a two-fold difference between the number of citations reported by Scopus, Web of Science and Google Scholar.  The raw data are unreliable and commercial suppliers of metrics are apparently not willing to put in the work to ensure that their products are consistent or complete.
6. Citation counts can be (and already are being) manipulated.  The easiest way to get a large number of citations is to do no original research at all, but to write reviews in popular areas.  Another good way to have ‘impact’ is to write indecisive papers about nutritional epidemiology.  That is not behaviour that should command respect.
7. Some branches of science are already facing something of a crisis in reproducibility [7]. One reason for this is the perverse incentives which are imposed on scientists.  These perverse incentives include the assessment of their work by crude numerical indices.
8. “Gaming” of citations is easy. (If students do it, it’s called cheating; if academics do it, it’s called gaming.) If HEFCE makes money dependent on citations, then this sort of cheating is likely to take place on an industrial scale. Of course that should not happen, but it would (disguised, no doubt, by some ingenious bureaucratic euphemisms).
9. For example, SCIgen is a program that generates spoof papers in computer science, by stringing together plausible phrases. Over 100 such papers have been accepted for publication. By submitting many such papers, the authors managed to fool Google Scholar into awarding the fictitious author an H-index greater than that of Albert Einstein (http://en.wikipedia.org/wiki/SCIgen).
10. The use of citation counts has already encouraged guest authorships and such like marginally honest behaviour. There is no way to tell whether an author on a paper has actually made any substantial contribution to the work, despite the fact that some journals ask for a statement about contribution.
11.  It has been known for 17 years that citation counts for individual papers are not detectably correlated with the impact factor of the journal in which the paper appears [1].  That doesn’t seem to have deterred metrics enthusiasts from using both. It should have done.

Given all these problems, it’s hard to see how citation counts could be useful to the REF, except perhaps in really extreme cases such as papers that get next to no citations over 5 or 10 years.

The H-index

This has all the disadvantages of citation counting, but in addition it is strongly biased against young scientists, and against women. This makes it not worth consideration by HEFCE.
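For readers unfamiliar with it, the H-index is usually defined as the largest number h such that h of a person's papers have at least h citations each. The sketch below implements that standard definition; the two citation records are invented purely for illustration. It shows one source of the bias: h can never exceed the number of papers published, however highly cited they are.

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical records, invented for illustration only.
early_career = [900, 400, 250]   # three very highly cited papers
long_career = [25] * 30          # thirty modestly cited papers
print(h_index(early_career), h_index(long_career))
```

The early-career record has far more citations per paper but an H-index of 3, against 25 for the longer, more modestly cited record.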

Altmetrics

Given the role given to “impact” in the REF, the fact that altmetrics claim to measure impact might make them seem worthy of consideration at first sight. One problem is that the REF failed to make a clear distinction between impact on other scientists in the field and impact on the public.

Altmetrics measures an undefined mixture of both sorts of impact, with totally arbitrary weighting for tweets, Facebook mentions and so on. But the score seems to be related primarily to the trendiness of the title of the paper. Any paper about diet and health, however poor, is guaranteed to feature well on Twitter, as will any paper that has ‘penis’ in the title.

It’s very clear from the examples that I’ve looked at that few people who tweet about a paper have read more than the title. See Why you should ignore altmetrics and other bibliometric nightmares [8].

In most cases, papers were promoted by retweeting the press release or tweet from the journal itself. Only too often the press release is hyped-up. Metrics not only corrupt the behaviour of academics, but also the behaviour of journals. In the cases I’ve examined, reading the papers revealed that they were particularly poor (despite being in glamour journals): they just had trendy titles [8].

There could even be a negative correlation between the number of tweets and the quality of the work. Those who sell altmetrics have never examined this critical question because they ignore the contents of the papers.  It would not be in their commercial interests to test their claims if the result was to show a negative correlation. Perhaps the reason why they have never tested their claims is the fear that to do so would reduce their income.

Furthermore, you can buy 1000 retweets for $8.00 (http://followers-and-likes.com/twitter/buy-twitter-retweets/). That’s outright cheating, of course, and not many people would go that far. But authors, and journals, can do a lot of self-promotion on Twitter that is totally unrelated to the quality of the work. It’s worth noting that much good engagement with the public now appears on blogs that are written by scientists themselves, but the 3.6 million views of my blog do not feature in altmetrics scores, never mind Scopus or Web of Science. Altmetrics don’t even measure public engagement very well, never mind academic merit.

Evidence that metrics measure quality

Any metric would be acceptable only if it measured the quality of a person’s work. How could that proposition be tested? In order to judge this, one would have to take a random sample of papers, and look at their metrics 10 or 20 years after publication. The scores would have to be compared with the consensus view of experts in the field. Even then one would have to be careful about the choice of experts (in fields like alternative medicine, for example, it would be important to exclude people whose living depended on believing in it). I don’t believe that proper tests have ever been done (and it isn’t in the interests of those who sell metrics to do it).

The great mistake made by almost all bibliometricians is that they ignore what matters most, the contents of papers. They try to make inferences from correlations of metric scores with other, equally dubious, measures of merit. They can’t afford the time to do the right experiment, if only because it would harm their own “productivity”. The evidence that metrics do what’s claimed for them is almost non-existent.
For example, in six of the ten years leading up to the 1991 Nobel prize, Bert Sakmann failed to meet the metrics-based publication target set by Imperial College London, and these failures included the year in which the original single channel paper was published [9] and also the year, 1985, when he published a paper [10] that was subsequently named as a classic in the field [11]. In two of these ten years he had no publications whatsoever. See also [12]. Application of metrics in the way that it’s been done at Imperial and also at Queen Mary College London would result in firing of the most original minds.

Gaming and the public perception of science

Every form of metric alters behaviour, in such a way that it becomes useless for its stated purpose. This is already well-known in economics, where it’s known as Goodhart’s law (http://en.wikipedia.org/wiki/Goodhart’s_law): “When a measure becomes a target, it ceases to be a good measure”. That alone is a sufficient reason not to extend metrics to science. Metrics have already become one of several perverse incentives that control scientists’ behaviour. They have encouraged gaming, hype, guest authorships and, increasingly, outright fraud [13].

The general public has become aware of this behaviour and it is starting to do serious harm to perceptions of all science. As long ago as 1999, Haerlin & Parr [14] wrote in Nature, under the title How to restore Public Trust in Science, “Scientists are no longer perceived exclusively as guardians of objective truth, but also as smart promoters of their own interests in a media-driven marketplace.” And on January 17, 2006, a vicious spoof on a Science paper appeared, not in a scientific journal, but in the New York Times (see http://www.dcscience.net/?p=156). The use of metrics would provide a direct incentive to this sort of behaviour.
It would be a tragedy not only for people who are misjudged by crude numerical indices, but also for the reputation of science as a whole.

Conclusion

There is no good evidence that any metric measures quality, at least over the short time span that’s needed for them to be useful for giving grants or deciding on promotions. On the other hand, there is good evidence that use of metrics provides a strong incentive to bad behaviour, both by scientists and by journals. They have already started to damage the public perception of the honesty of science.

The conclusion is obvious. Metrics should not be used to judge academic performance.

What should be done?

If metrics aren’t used, how should assessment be done? Roderick Floud was president of Universities UK from 2001 to 2003. He is nothing if not an establishment person. He said recently:

“Each assessment costs somewhere between £20 million and £100 million, yet 75 per cent of the funding goes every time to the top 25 universities. Moreover, the share that each receives has hardly changed during the past 20 years. It is an expensive charade. Far better to distribute all of the money through the research councils in a properly competitive system.”

The obvious danger of giving all the money to the Research Councils is that people might be fired solely because they didn’t have big enough grants. That’s serious: it’s already happened at Kings College London, Queen Mary London and at Imperial College. This problem might be ameliorated if there were a maximum on the size of grants and/or on the number of papers a person could publish, as I suggested at the open data debate. And it would help if universities appointed vice-chancellors with a better long-term view than most seem to have at the moment.

Aggregate metrics?

It’s been suggested that the problems are smaller if one looks at aggregated metrics for a whole department, rather than the metrics for individual people.
Clearly, looking at departments would average out anomalies. The snag is that it wouldn’t circumvent Goodhart’s law. If the money depended on the aggregate score, it would still put great pressure on universities to recruit people with high citations, regardless of the quality of their work, just as it would if individuals were being assessed. That would weigh against thoughtful people (and not least women).

The best solution would be to abolish the REF and give the money to research councils, with precautions to prevent people being fired because their research wasn’t expensive enough. If politicians insist that the "expensive charade" is to be repeated, then I see no option but to continue with a system that’s similar to the present one: that would waste money and distract us from our job.

1. Seglen PO (1997) Why the impact factor of journals should not be used for evaluating research. British Medical Journal 314: 498-502.
2. Colquhoun D (2003) Challenging the tyranny of impact factors. Nature 423: 479.
3. Hawkes AG, Jalali A, Colquhoun D (1990) The distributions of the apparent open times and shut times in a single channel record when brief events can not be detected. Philosophical Transactions of the Royal Society London A 332: 511-538.
4. Hawkes AG, Jalali A, Colquhoun D (1992) Asymptotic distributions of apparent open times and shut times in a single channel record allowing for the omission of brief events. Philosophical Transactions of the Royal Society London B 337: 383-404.
5. Colquhoun D, Sigworth FJ (1995) Fitting and statistical analysis of single-channel records. In: Sakmann B, Neher E, editors. Single Channel Recording. New York: Plenum Press. pp. 483-587.
6. David Colquhoun on Google Scholar. Available: http://scholar.google.co.uk/citations?user=JXQ2kXoAAAAJ&hl=en (accessed 17-6-2014).
7. Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124.
8. Colquhoun D, Plested AJ. Why you should ignore altmetrics and other bibliometric nightmares. Available: http://www.dcscience.net/?p=6369
9. Neher E, Sakmann B (1976) Single channel currents recorded from membrane of denervated frog muscle fibres. Nature 260: 799-802.
10. Colquhoun D, Sakmann B (1985) Fast events in single-channel currents activated by acetylcholine and its analogues at the frog muscle end-plate. J Physiol (Lond) 369: 501-557.
11. Colquhoun D (2007) What have we learned from single ion channels? J Physiol 581: 425-427.
12. Colquhoun D (2007) How to get good science. Physiology News 69: 12-14. See also http://www.dcscience.net/?p=182
13. Oransky I. Retraction Watch. Available: http://retractionwatch.com/ (accessed 18-6-2014).
14. Haerlin B, Parr D (1999) How to restore public trust in science. Nature 400: 499. doi:10.1038/22867

### Follow-up

Some other posts on this topic:

Why Metrics Cannot Measure Research Quality: A Response to the HEFCE Consultation

Gaming Google Scholar Citations, Made Simple and Easy

Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting

Driving Altmetrics Performance Through Marketing

Death by Metrics (October 30, 2013)

Not everything that counts can be counted

Using metrics to assess research quality, by David Spiegelhalter: “I am strongly against the suggestion that peer–review can in any way be replaced by bibliometrics”

1 July 2014

My brilliant statistical colleague, Alan Hawkes, not only laid the foundations for single molecule analysis (and made a career for me), but, before he got into that, he wrote a paper, Spectra of some self-exciting and mutually exciting point processes (Biometrika 1971). In that paper he described a sort of stochastic process now known as a Hawkes process. In the simplest sort of stochastic process, the Poisson process, events are independent of each other.
In a Hawkes process, the occurrence of an event affects the probability of another event occurring, so, for example, events may occur in clusters. Such processes were used for many years to describe the occurrence of earthquakes. More recently, it’s been noticed that such models are useful in finance, marketing, terrorism, burglary, social media, DNA analysis, and to describe invasive banana trees. The 1971 paper languished in relative obscurity for 30 years. Now the citation rate has shot threw the roof. The papers about Hawkes processes are mostly highly mathematical. They are not the sort of thing that features on twitter. They are serious science, not just another ghastly epidemiological survey of diet and health. Anybody who cites papers of this sort is likely to be a real scientist. The surge in citations suggests to me that the 1971 paper was indeed an important bit of work (because the citations will be made by serious people). How does this affect my views about the use of citations? It shows that even highly mathematical work can achieve respectable citation rates, but it may take a long time before their importance is realised. If Hawkes had been judged by citation counting while he was applying for jobs and promotions, he’d probably have been fired. If his department had been judged by citations of this paper, it would not have scored well. It takes a long time to judge the importance of a paper and that makes citation counting almost useless for decisions about funding and promotion. ## Bad financial management at Kings College London means VC Rick Trainor is firing 120 scientists #### June 7th, 2014 · 14 Comments Jump to follow-up Stop press. Financial report cast’s doubt on Trainor’s claims Science has a big problem. Most jobs are desperately insecure. It’s hard to do long term thorough work when you don’t know whether you’ll be able to pay your mortgage in a year’s time. 
The appalling career structure for young scientists has been the subject of much writing by the young (e.g. Jenny Rohn) and the old (e.g. Bruce Alberts and Peter Lawrence; see also Real Lives and White Lies in the Funding of Scientific Research), and by me.

Until recently, this problem was largely restricted to post-doctoral fellows (postdocs). They already have PhDs and they are the people who do most of the experiments. Often large numbers of them work for a single principal investigator (PI). The PI spends most of his or her time writing grant applications and travelling the world to hawk the wares of the lab. PIs also (to variable extents) teach students and deal with endless hassle from HR. The salaries of most postdocs are paid from grants that last for three or sometimes five years. If that grant doesn’t get renewed, they are on the streets.

Universities have come to exploit their employees almost as badly as Amazon does. The periodic research assessments not only waste large amounts of time and money, but they have distorted behaviour. In the hope of scoring highly, universities recruit a lot of people before the submission, but as soon as that’s done with, they find that they can’t afford all of them, so some get cast aside like worn-out old boots. Universities have allowed themselves to become dependent on "soft money" from grant-giving bodies. That strikes me as bad management. The situation is even worse in the USA, where most teaching staff rely on research grants to pay their salaries.

I have written three times about the insane methods that are being used to fire staff at Queen Mary College London (QMUL).

- Is Queen Mary University of London trying to commit scientific suicide? (June 2012)
- Queen Mary, University of London in The Times. Does Simon Gaskell care? (July 2012), and a version of it appeared in The Times (Thunderer column)
- In which Simon Gaskell, of Queen Mary, University of London, makes a cock-up (August 2012)

The ostensible reason given there was to boost its ratings in university rankings. Their vice-chancellor, Simon Gaskell, seems to think that by firing people he can produce a university that’s full of Nobel prize-winners. The effect, of course, is just the opposite. Treating people like pawns in a game makes the good people leave, and only those who can’t get a job with a better employer remain. That’s what I call bad management.

At QMUL people were chosen to be fired on the basis of a plain silly measure of their publication record, and of their grant income. That was combined with terrorisation of any staff who spoke out about the process (more on that coming soon).

Kings College London is now doing the same sort of thing. They have announced that they’ll fire 120 of the 777 staff in the schools of medicine and biomedical sciences, and the Institute of Psychiatry. These are humans, with children and mortgages to pay. One might ask why they were taken on in the first place, if the university can’t afford them. That’s simply bad financial planning (or was it done in order to boost their Research Excellence submission?).

Surely it’s been obvious, at least since 2007, that hard financial times were coming, but that didn’t dent the hubris of the people who took on so many staff. HEFCE has failed to find a sensible way to fund universities. The attempt to separate the funding of teaching and research has just led to corruption.

The way in which people are to be chosen for the firing squad at Kings is crude in the extreme. If you are a professor at the Institute of Psychiatry then, unless you do a lot of teaching, you must have a grant income of at least £200,000 per year. You can read all the details in the “Consultation document” from Kings that was sent to all employees.
It’s headed "CONFIDENTIAL – Not for further circulation". Vice-chancellors still don’t seem to have realised that it’s no longer possible to keep things like this secret. In releasing it, I take my cue from George Orwell.

“Journalism is printing what someone else does not want printed: everything else is public relations.”

There is no mention of the quality of your research, just income. Since in most sorts of research the major cost is salaries, this rewards people who take on too many employees. Only too frequently, large groups are the ones in which students and research staff get the least supervision, and in which bangs per buck are lowest. The university should be rewarding people who are deeply involved in research themselves: those with small groups. Instead, they are doing exactly the opposite. Women are, I’d guess, less susceptible to the grandiosity of the enormous research group, so no doubt they will suffer disproportionately. PhD students will also suffer if their supervisor is fired while they are halfway through their projects.

An article in Times Higher Education pointed out

"According to the Royal Society’s 2010 report The Scientific Century: Securing our Future Prosperity, in the UK, 30 per cent of science PhD graduates go on to postdoctoral positions, but only around 4 per cent find permanent academic research posts. Less than half of 1 per cent of those with science doctorates end up as professors."

The panel that decides whether you’ll be fired consists of Professor Sir Robert Lechler, Professor Anne Greenough, Professor Simon Howell, Professor Shitij Kapur, Professor Karen O’Brien, Chris Mottershead, Rachel Parr & Carol Ford. If they had the slightest integrity, they’d refuse to implement such obviously silly criteria.

Universities in general, not only Kings and QMUL, have become over-reliant on research funders to enhance their own reputations.
PhD students and research staff are employed for the benefit of the university (and of the principal investigator), not for the benefit of the students or research staff, who are treated as expendable cost units, not as humans.

One thing that we expect of vice-chancellors is sensible financial planning. That seems to have failed at Kings. One would also hope that they would understand how to get good science. My only previous encounter with Kings’ vice-chancellor, Rick Trainor, suggests that this is not where his talents lie. While he was president of Universities UK (UUK), I suggested to him that degrees in homeopathy were not a good idea. His response was that of the true apparatchik.

“. . . degree courses change over time, are independently assessed for academic rigour and quality and provide a wider education than the simple description of the course might suggest”

That is hardly a response that suggests high academic integrity.

The students’ petition is on Change.org.

### Follow-up

The problems that are faced in the UK are very similar to those in the USA. They have been described with superb clarity in “Rescuing US biomedical research from its systemic flaws”. This article, by Bruce Alberts, Marc W. Kirschner, Shirley Tilghman, and Harold Varmus, should be read by everyone. They observe that “. . . little has been done to reform the system, primarily because it continues to benefit more established and hence more influential scientists”. I’d be more impressed by the senior people at Kings if they spent time trying to improve the system rather than firing people because their research is not sufficiently expensive.

10 June 2014

Progress on the cull, according to an anonymous correspondent

“The omnishambles that is KCL management

1) We were told we would receive our orange (at risk) or green letters (not at risk, this time) on Thursday PM 5th June as HR said that it’s not good to get bad news on a Friday!
2) We all got a letter on Friday that we would not be receiving our letters until Monday, so we all had a tense weekend
3) I finally got my letter on Monday, in my case it was “green” however a number of staff who work very hard at KCL doing teaching and research are “orange”, un bloody believable

As you can imagine the morale at King’s has dropped through the floor”

18 June 2014

Dorothy Bishop has written about the Trainor problem. Her post ends “One feels that if KCL were falling behind in a boat race, they’d respond by throwing out some of the rowers”.

The students’ petition can be found on the #KCLHealthSOS site. There is a reply to the petition, from Professor Sir Robert Lechler, and a rather better written response to it from students. Lechler’s response merely repeats the weasel words, and it attacks a few straw men without providing the slightest justification for the criteria that are being used to fire people. One can’t help noticing how often knighthoods go to the best apparatchiks rather than the best scientists.

14 July 2014

A 2013 report on Kings from Standard & Poor’s casts doubt on Trainor’s claims. Download the report from Standard and Poor’s Rating Service.

A few things stand out.

- KCL is in a strong financial position, with lower debt than other similar universities and cash reserves of £194 million.
- The report says that KCL does carry some risk into the future, especially that related to its large capital expansion program.
- The report specifically warns KCL over the consequences of any staff cuts.

Particularly relevant are the following quotations.

- Page 3: “Further staff-cost curtailment will be quite difficult …pressure to maintain its academic and non-academic service standards will weigh on its ability to cut costs further.”
- Page 4: the report goes on to say (see the section headed Outlook, especially the final paragraph) that any decrease in KCL’s academic reputation (e.g. consequent on staff cuts) would be likely to impair its ability to attract overseas students and therefore adversely affect its financial position.
- Page 10 makes clear that KCL managers are privately aiming at a 10% surplus, above the 6% operating surplus they talk about with us. However, S&P considers that ‘ambitious’. In other words, KCL is shooting for double what a credit rating agency considers realistic.

One can infer from this that

1. what staff have been told about the cuts being an immediate necessity is absolute nonsense
2. KCL was warned against staff cuts by a credit agency
3. the main problem KCL has is its overambitious building policy
4. KCL is implementing a policy (staff cuts) which S&P warned against, as they predict it may result in diminishing income.

What on earth is going on?

16 July 2014

I’ve been sent yet another damning document. The BMA’s response to Kings contains some numbers that seem to have escaped the attention of managers at Kings.

## Deadly Medicines and Organised Crime: a review

#### April 16th, 2014 · 4 Comments

This is a web version of a review of Peter Gøtzsche’s book. It appeared in the April 2014 Healthwatch Newsletter. Read the whole newsletter. It has lots of good stuff. Their newsletters are here. Healthwatch has been exposing quackery since 1989. Their very first newsletter is still relevant.

Most new drugs and vaccines are developed by the pharmaceutical industry. The industry has produced huge benefits for mankind. But since the Thatcherite era it has come to be dominated by marketing people who appear to lack any conscience. That’s what gave rise to the Alltrials movement. It was founded in January 2013 with the aim of ensuring that all past and present clinical trials are registered before they start and that their results are published. The industry has been dragged, kicking and screaming, towards a new era of transparency, with two of the worst offenders, GSK and Roche, now promising to release all data.
Let’s hope this is the beginning of real open science. This version is not quite identical with the published version, in which several changes were enforced by Healthwatch’s legal adviser. They weren’t very big changes, but here is the original.

### Deadly Medicines and Organised Crime

By Peter Gøtzsche, reviewed by David Colquhoun.

Published by Radcliffe Publishing Ltd on 1 August 2013. RRP £24.99 (320 pages, paperback). ISBN-10: 1846198844; ISBN-13: 978-1846198847.

For someone who has spent a lifetime teaching pharmacology, this book is a bitter pill to swallow. It makes Goldacre’s Bad Pharma seem quite mild. In fairness, the bits of pharmacology that I’ve taught concern mostly drugs that do work quite well. Things like neuromuscular blocking agents, local anaesthetics, general anaesthetics, anticoagulants, cardiac glycosides and thyroid drugs all do pretty much what it says on the label.

Peter Gøtzsche is nothing if not an evidence man. He directs the Nordic Cochrane group, and he talks straight. His book is about drugs that don’t work as advertised. There is no doubt whatsoever that the pharmaceutical industry has behaved very badly indeed in the last couple of decades. You don’t have to take my word for it, nor Peter Gøtzsche’s, nor Ben Goldacre’s. The companies have told us about it themselves. Not voluntarily of course, but in internal emails that have been revealed during court proceedings, and through whistleblowers.

Peter Rost was vice president of marketing for the huge pharmaceutical company Pfizer, until he was fired after the company failed to listen to his complaints about illegal marketing of human growth hormone as an anti-ageing drug. After this he said:

“It is scary how many similarities there are between this industry and the mob. The mob makes obscene amounts of money, as does this industry. The side effects of organized crime are killings and deaths, and the side effects are the same in this industry. The mob bribes politicians and others, and so does the drug industry …”

The pharmaceutical industry is the biggest defrauder of the US federal government under the False Claims Act. Roche led a cartel that, according to the US Justice Department’s antitrust division, was the most pervasive and harmful criminal antitrust conspiracy ever uncovered. Multibillion dollar fines have been levied on all of the big companies (almost all in the USA; other countries have been supine), though the companies’ profits are so huge that the fines are regarded as marketing expenses.

It’s estimated that adverse effects of drugs kill more people than anything but cancer and heart disease, roughly half as many as cigarettes. This horrifying statistic is announced at the beginning of the book, though you have to wait until Chapter 21 to find the data. I’d have liked to see a more critical discussion of the problems of causality in deciding why someone died, which are just as big as those in deciding why somebody recovered. Nevertheless, nobody seems to deny that the numbers who are killed by their treatments are alarmingly high.

Gøtzsche’s book deals with a wide range of drugs that don’t do what it says on the label, but which have made fortunes because of corruption of the scientific process. These include non-steroidal anti-inflammatory drugs (NSAIDs), an area described as “a horror story filled with extravagant claims, bending of the rules, regulatory inaction, . . .”. Other areas where there has been major misbehaviour include diabetes (Avandia) and the great Tamiflu scandal. It took five years of pressure before Roche released the hidden data about Tamiflu trials. It barely works. Goldacre commented “government’s Tamiflu stockpile wouldn’t have done us much good in the event of a flu epidemic”.

But the worst single area is psychiatry. Two of the chapters in the book deal with psychiatry.
Nobody has the slightest idea how the brain works (don’t believe the neuroscience hype) or what causes depression or psychosis. Treatments are no more than guesses and none of them seems to work very well.

The problems with the SSRI antidepressant paroxetine (Seroxat in UK, Paxil in USA) were brought to public attention, not by a regulator, but by a BBC Panorama television programme. The programme revealed that a PR company, which worked for GSK, had written

"Originally we had planned to do extensive media relations surrounding this study until we actually viewed the results. Essentially the study did not really show it was effective in treating adolescent depression, which is not something we want to publicise."

This referred to the now-notorious study 329. It was intended to show that paroxetine should be recommended for adolescent depression. The paper that eventually appeared in 2001 grossly misrepresented the results. The conclusions stated “Paroxetine is generally well tolerated and effective for major depression in adolescents”, despite the fact that GSK already knew this wasn’t true. The first author of this paper was Martin Keller, chair of psychiatry at Brown University, RI, and there were 21 other authors. But the paper wasn’t written by them; it was written by ghost authors working for GSK. Keller admitted that he hadn’t checked the results properly.

That’s not all. Gøtzsche comments thus.

“Keller is some character. He double-billed his travel expenses, which were reimbursed both by his university and the drug sponsor. Further, the Massachusetts Department of Mental Health had paid Brown’s psychiatry department, which Keller chaired, hundreds of thousands of dollars to fund research that wasn’t being conducted. Keller himself received hundreds of thousands of dollars from drug companies every year that he didn’t disclose.”

His department received $50 million in research funding. Brown University has never admitted that there was a problem.
It still boasts about this infamous paper.

The extent of corruption at Brown University rivals the mob.

The infamous case of Richard Eastell at Sheffield University is no better. He admitted in print to lying about who’d seen the data. The university did nothing but fire the whistleblower.

Another trial, study 377, also showed that paroxetine didn’t work.  GSK suppressed it.

“There are no plans to publish data from Study 377” (Seroxat/Paxil Adolescent Depression. Position piece on the phase III clinical studies. GlaxoSmithKline document. 1998 Oct.)

Where were the regulatory agencies during all this?  The MHRA did ban use of paroxetine in adolescents in 2003, but their full investigation didn’t report until 2008.  It came to much the same conclusions as the TV programme six years earlier about the deceit. But despite that, no prosecution was brought.  GSK got away with a deferential rap on the knuckles.

Fiona Godlee (editor of the BMJ, which had turned down the paper) commented

“We shouldn’t have to rely on investigative journalists to ask the difficult questions”

Now we can add bloggers to that list of people who ask difficult questions.  The scam operated by the University of Wales, in ‘validating’ external degrees was revealed by my blog and by BBC TV Wales.  The Quality Assurance Agency came in only at the last moment.  Regulators regularly fail to regulate.

 Despite all this, the current MHRA learning module on SSRIs contains little hint that SSRIs simply don’t work for mild or moderate depression.  Neither does the current NICE guidance.   Some psychiatrists still think they do work, despite there being so many negative trials.

The psychiatrists’ narrative goes like this. You don’t expect to see improvements for many weeks (despite the fact that serotonin uptake is stopped immediately). You may get worse before you get better. And if the first sort of pill doesn’t work, try another one. That’s pretty much identical with what a homeopath will tell you. The odds are that what it really means is: wait a while and you’ll get better eventually, regardless of treatment.

It’s common to be told that they must work because when you stop taking them, you get worse.  But, perhaps more likely, when you stop taking them you get withdrawal symptoms, because the treatment itself caused a chemical imbalance.   Gøtzsche makes a strong case that most psychiatric drugs do more harm than good, if taken for any length of time.  Marcia Angell makes a similar case in The Illusions of Psychiatry.

Gøtzsche will inevitably be accused of exaggerating.  Chapter 14 ends thus.

“Merck stated only 6 months before it withdrew Vioxx that ‘MSD is fully committed to the highest standards of scientific integrity, ethics, and protection of patient’s wellbeing in our research. We have a tradition of partnership with leaders in the academic research community.’ Great. Let’s have some more of such ethical partnerships. They often kill our patients while everyone else prospers.

Perhaps Hells Angels should consider something similar in their PR: We are fully committed to the highest standards of integrity, ethics and protection of citizens’ well-being when we push narcotic drugs. We have a tradition of partnership with leaders in the police force”.

But the evidence is there.  The book has over 900 references.  Much of the wrongdoing has been laid bare by legal actions. I grieve for the state of my subject.

The wrongdoing by pharma is a disgrace.

The corruption of universities and academics is even worse, because they are meant to be our defence against commercial corruption.

All one can do is to take consolation from the fact that academics, like Gøtzsche and Goldacre, and a host of bloggers, are the people who are revealing what’s wrong.  As a writer for the business magazine, Fortune, said

“For better or worse, the drug industry is going to have to get used to Dr. Peter Rost – and others like him.”

At a recent meeting I said that it was tragic that medicine, the caring profession, was also the most corrupt (though I’m happy to admit that other jobs might be as bad if offered as much money).

At present there is little transparency. There is no way that I can tell whether my doctor is taking money from pharma, and data are still hidden from public scrutiny by regulatory agencies (which are stuffed with people who take pharma money) as well as by companies. Governments regard business as more important than patients. In the UK, the Government continued promotion of the fake bomb detector for many years after they’d been told it was fake. Their attitude to fake medicines is not much different. Business is business, right?

One side effect of the horrific corruption is that it’s used as a stick by the alternative medicine industry. That’s silly of them, because their business is more or less 100% mendacious marketing of ineffective treatments.  At least half of pharma products really do work.

Fines are useless. Nothing will change until a few CEOs, a few professors and a few vice-chancellors spend time in jail for corruption.

Read this book. Get angry. Do something.

## On the hazards of significance testing. Part 2: the false discovery rate, or how not to make a fool of yourself with P values

#### March 24th, 2014 · 26 Comments

What follows is a simplified version of part of a paper that has now appeared as a preprint on arXiv. If you find anything wrong, or obscure, please email me. Be vicious: it will improve the eventual paper.

It’s a follow-up to my very first paper, which was written in 1959–60, while I was a fourth-year undergraduate (the history is in a recent blog post). I hope this one is better.

“. . . before anything was known of Lydgate’s skill, the judgements on it had naturally been divided, depending on a sense of likelihood, situated perhaps in the pit of the stomach, or in the pineal gland, and differing in its verdicts, but not less valuable as a guide in the total deficit of evidence” (George Eliot, Middlemarch, Chap. 45)

“The standard approach in teaching, of stressing the formal definition of a p-value while warning against its misinterpretation, has simply been an abysmal failure” (Sellke et al. 2001, The American Statistician 55, 62–71)

The last post was about screening. It showed that most screening tests are useless, in the sense that a large proportion of people who test positive do not have the condition. This proportion can be called the false discovery rate. You think you’ve discovered the condition, but you were wrong.

Very similar ideas can be applied to tests of significance. If you read almost any scientific paper you’ll find statements like "this result was statistically significant (P = 0.047)". Tests of significance were designed to prevent you from making a fool of yourself by claiming to have discovered something, when in fact all you are seeing is the effect of random chance. In this case we define the false discovery rate as the probability that, when a test comes out as ‘statistically significant’, there is actually no real effect.

You can also make a fool of yourself by failing to detect a real effect, but this is less harmful to your reputation.

It’s very common for people to claim that an effect is real, not just chance, whenever the test produces a P value of less than 0.05, and when asked, it’s common for people to think that this procedure gives them a chance of 1 in 20 of making a fool of themselves. Leaving aside that this seems rather too often to make a fool of yourself, this interpretation is simply wrong.

The purpose of this post is to justify the following proposition.

 If you observe a P value close to 0.05, your false discovery rate will not be 5%.    It will be at least 30% and it could easily be 80% for small studies.

This makes the assertion in John Ioannidis’ (2005) article, Why Most Published Research Findings Are False, slightly less startling. That paper caused quite a stir. It’s a serious allegation. In fairness, the title was a bit misleading. Ioannidis wasn’t talking about all science. But it has become apparent that an alarming number of published works in some fields can’t be reproduced by others. The worst offenders seem to be clinical trials, experimental psychology and neuroscience, some parts of cancer research and some attempts to associate genes with disease (genome-wide association studies). Of course the self-correcting nature of science means that the false discoveries get revealed as such in the end, but it would obviously be a lot better if false results weren’t published in the first place.

How can tests of significance be so misleading?

Tests of statistical significance have been around for well over 100 years now. One of the most widely used is Student’s t test. It was published in 1908. ‘Student’ was the pseudonym for William Sealy Gosset, who worked at the Guinness brewery in Dublin. He visited Karl Pearson’s statistics department at UCL because he wanted statistical methods that were valid for testing small samples. The example that he used in his paper was based on data from Arthur Cushny, the first holder of the chair of pharmacology at UCL (subsequently named the A.J. Clark chair, after its second holder).

The outcome of a significance test is a probability, referred to as a P value. First, let’s be clear what the P value means. It will be simpler to do that in the context of a particular example. Suppose we wish to know whether treatment A is better (or worse) than treatment B (A might be a new drug, and B a placebo). We’d take a group of people and allocate each person to take either A or B, and the choice would be random. Each person would have an equal chance of getting A or B. We’d observe the responses and then take the average (mean) response for those who had received A and the average for those who had received B. If the treatment (A) was no better than placebo (B), the difference between means should be zero on average. But the variability of the responses means that the observed difference will never be exactly zero. So how big does it have to be before you discount the possibility that random chance is all you were seeing? You do the test and get a P value. Given the ubiquity of P values in scientific papers, it’s surprisingly rare for people to be able to give an accurate definition. Here it is.

 The P value is the probability that you would find a difference as big as that observed, or a still bigger value, if in fact A and B were identical.
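The definition can be made concrete with a randomisation (permutation) test. This is not Student’s t test, but it computes a P value directly from the definition: if A and B were identical, the labels ‘A’ and ‘B’ would be arbitrary, so shuffling them many times shows how often a difference as big as the observed one, or bigger, turns up by chance alone. What follows is a minimal Python sketch; the function name `permutation_p_value` and the data are made up for illustration.

```python
import math
import random

def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Two-sided permutation P value for a difference between two means.

    If A and B were identical, any relabelling of the pooled observations
    would be as likely as the one actually seen, so the P value is estimated
    as the fraction of random relabellings that give a difference at least
    as big as the observed one.
    """
    def mean(xs):
        # math.fsum is exactly rounded, so the result does not depend on order
        return math.fsum(xs) / len(xs)

    rng = random.Random(seed)          # seeded so the estimate is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    as_big = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # random relabelling of A and B
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            as_big += 1
    return as_big / n_perm

# Hypothetical responses to treatment A and to placebo B
a = [10, 11, 12, 13, 14]
b = [1, 2, 3, 4, 5]
print(permutation_p_value(a, b))  # small: shuffling rarely produces so big a difference
print(permutation_p_value(b, b))  # 1.0: identical groups, every shuffle is "as big"
```

A small P value says only that the observed difference would be rare if A and B were identical; it says nothing, by itself, about how likely the difference would be if there were a real effect.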

If this probability is low enough, the conclusion would be that it’s unlikely that the observed difference (or a still bigger one) would have occurred if A and B were identical, so we conclude that they are not identical, i.e. that there is a genuine difference between treatment and placebo.

This is the classical way to avoid making a fool of yourself by claiming to have made a discovery when you haven’t. It was developed and popularised by the greatest statistician of the 20th century, Ronald Fisher, during the 1920s and 1930s. It does exactly what it says on the tin. It sounds entirely plausible.

What could possibly go wrong?

Another way to look at significance tests

One way to look at the problem is to notice that the classical approach considers only what would happen if there were no real effect or, as a statistician would put it, what would happen if the null hypothesis were true. But there isn’t much point in knowing that an event is unlikely when the null hypothesis is true unless you know how likely it is when there is a real effect.
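Both halves of the problem can be seen by simulating many experiments, once with no real effect and once with a real one, and counting how often P < 0.05. The sketch below is hypothetical Python: to stay within the standard library it uses a z test that assumes the standard deviation is known to be 1, rather than Student’s t test, and the sample size (16 per group) and true effect (one standard deviation) are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, SD 1

def z_test_p(a, b, sigma=1.0):
    """Two-sided P value for a difference between two means, assuming the
    population standard deviation sigma is known (a z test)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def fraction_significant(true_diff, n=16, reps=4000, alpha=0.05, seed=2):
    """Simulate `reps` experiments with n observations per group and return
    the fraction that come out 'significant' (P < alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p(a, b) < alpha:
            hits += 1
    return hits / reps

# No real effect: about 5% of experiments come out (falsely) 'significant'.
print(fraction_significant(0.0))
# A real effect of one SD: roughly 80% come out 'significant'.
print(fraction_significant(1.0))
```

The first fraction is all that the classical approach considers. The second (the power of the test) is needed as well before anything can be said about how often a ‘significant’ result is a false discovery.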

We can look at the problem a bit more realistically by means of a tree diagram, very like that used to analyse screening tests, in the previous post.

In order to do this, we need to specify a couple more things.

First we need to specify the power of the significance test. This is the probability that we’ll detect a difference when there really is one. By ‘detect a difference’ we mean that the test comes out with P < 0.05 (or whatever level we set). So it’s analogous with the sensitivity of a screening test. In order to calculate sample sizes, it’s common to set the power to 0.8 (obviously 0.99 would be better, but that would often require impracticably large samples).
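How large a sample a given power demands can be sketched with the usual normal approximation for comparing two means: n per group ≈ 2(z<sub>1−α/2</sub> + z<sub>power</sub>)²/d², where d is the true difference between means divided by the standard deviation. This Python sketch rests on that approximation (`n_per_group` is a hypothetical name); a calculation based on Student’s t distribution gives a slightly larger answer for small samples.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate number of subjects per group for a two-sided comparison
    of two means (normal approximation). effect_size is the true difference
    between means divided by the standard deviation."""
    z = NormalDist().inv_cdf                   # standard normal quantile
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2
    return math.ceil(n)                        # round up to whole subjects

print(n_per_group(1.0))              # 16 per group for power 0.8
print(n_per_group(1.0, power=0.99))  # 37 per group for power 0.99
print(n_per_group(0.5))              # 63 per group: halving the effect roughly quadruples n
```

Raising the power from 0.8 to 0.99 more than doubles the sample needed, which is why 0.8 is the usual compromise.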

The second thing that we need to specify is a bit trickier, the proportion of tests that we do in which there is a real difference. This is analogous to the prevalence of the disease in the population being tested in the screening example. There is nothing mysterious about it. It’s an ordinary probability that can be thought of as a long-term frequency. But it is a probability that’s much harder to get a value for than the prevalence of a disease.

If we were testing a series of 30C homeopathic pills, all of the pills, regardless of what it says on the label, would be identical with the placebo controls so the prevalence of genuine effects, call it P(real), would be zero. So every positive test would be a false positive: the false discovery rate would be 100%. But in real science we want to predict the false discovery rate in less extreme cases.

Suppose, for example, that we test a large number of candidate drugs. Life being what it is, most of them will be inactive, but some will have a genuine effect. In this example we’d be lucky if 10% had a real effect, i.e. were really more effective than the inactive controls. So in this case we’d set the prevalence to P(real) = 0.1.

We can now construct a tree diagram exactly as we did for screening tests.

Suppose that we do 1000 tests. In 90% of them (900 tests) there is no real effect: the null hypothesis is true. If we use P = 0.05 as a criterion for significance then, according to the classical theory, 5% of them (45 tests) will give false positives, as shown in the lower limb of the tree diagram. In the remaining 100 tests there is a real effect. If the power of the test is 0.8 then we’ll detect 80% of these real differences, so there will be 80 correct positive tests.

The total number of positive tests is 45 + 80 = 125, and the proportion of these that are false positives is 45/125 = 36 percent. Our false discovery rate is far bigger than the 5% that many people still believe they are attaining.

In contrast, 98% of negative tests are right (though this is less surprising because 90% of experiments really have no effect).
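The arithmetic of the tree diagram is easy to check by computer. The sketch below simply reproduces the expected numbers in each limb for 1000 tests; `tree_counts` is a hypothetical helper, not part of any library.

```python
def tree_counts(n_tests=1000, p_real=0.1, siglev=0.05, power=0.8):
    """Expected numbers in each limb of the tree diagram for significance tests."""
    n_real = n_tests * p_real        # tests in which there is a real effect
    n_null = n_tests - n_real        # tests in which the null hypothesis is true
    false_pos = siglev * n_null      # 'significant' although there is no effect
    true_neg = n_null - false_pos    # correctly not significant
    true_pos = power * n_real        # real effects that are detected
    false_neg = n_real - true_pos    # real effects that are missed
    return {
        "false positives": false_pos,
        "true positives": true_pos,
        "false discovery rate": false_pos / (false_pos + true_pos),
        "negatives that are right": true_neg / (true_neg + false_neg),
    }

for name, value in tree_counts().items():
    print(f"{name}: {value:.3g}")
```

It gives 45 false positives, 80 true positives, a false discovery rate of 0.36, and 855 out of 875 (about 97.7%) negative tests right, which is the 98% quoted above.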

The equation

You can skip this section without losing much.

As in the case of screening tests, this result can be calculated from an equation. The same equation works if we substitute power for sensitivity, P(real) for prevalence, and siglev for (1 – specificity) where siglev is the cut off value for "significance", 0.05 in our examples.

The false discovery rate (the probability that, if a “significant” result is found, there is actually no real effect) is given by

$FDR = \frac{siglev\left(1-P(real)\right)}{power \times P(real) + siglev\left(1-P(real)\right)}$

In the example above, power = 0.8, siglev = 0.05 and P(real) = 0.1, so the false discovery rate is

$\frac{0.05 (1-0.1)}{0.8 \times 0.1 + 0.05 (1-0.1) }\; = 0.36$

So 36% of “significant” results are wrong, as found in the tree diagram.
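The same equation can be written as a short Python function. This is only a sketch; the parameter names follow the text.

```python
def false_discovery_rate(siglev, power, p_real):
    """FDR = siglev(1 - P(real)) / (power * P(real) + siglev(1 - P(real)))."""
    false_pos = siglev * (1 - p_real)  # 'significant' results when the null is true
    true_pos = power * p_real          # 'significant' results when the effect is real
    return false_pos / (true_pos + false_pos)

print(round(false_discovery_rate(siglev=0.05, power=0.8, p_real=0.1), 2))  # 0.36
```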

Some subtleties

The argument just presented should be quite enough to convince you that significance testing, as commonly practised, will lead to disastrous numbers of false positives. But the basis of how to make inferences is still a matter that’s the subject of intense controversy among statisticians, so what is an experimenter to do?

It is difficult to give a consensus of informed opinion because, although there is much informed opinion, there is rather little consensus. A personal view follows.  Colquhoun (1970), Lectures on Biostatistics, pp 94-95.

This is almost as true now as it was when I wrote it in the late 1960s, but there are some areas of broad agreement.

There are two subtleties that cause the approach outlined above to be a bit contentious. The first lies in the problem of deciding the prevalence, P(real). You may have noticed that if the frequency of real effects were 50% rather than 10%, the approach shown in the diagram would give a false discovery rate of only 6%, little different from the 5% that’s embedded in the consciousness of most experimentalists.

But this doesn’t get us off the hook, for two reasons. For a start, there is no reason at all to think that there will be a real effect in half of the tests that we do. Of course, if P(real) were even bigger than 0.5, the false discovery rate would be smaller still, falling to zero when P(real) = 1, because then all effects are real and therefore all positive tests are correct.
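To see how strongly the answer depends on the prevalence of real effects, the sketch below evaluates the equation for the false discovery rate over a range of values of P(real), keeping siglev = 0.05 and power = 0.8.

```python
def false_discovery_rate(siglev, power, p_real):
    """Fraction of 'significant' results for which there is no real effect."""
    false_pos = siglev * (1 - p_real)
    true_pos = power * p_real
    return false_pos / (true_pos + false_pos)

for p_real in (0.01, 0.1, 0.5, 0.9, 1.0):
    fdr = false_discovery_rate(0.05, 0.8, p_real)
    print(f"P(real) = {p_real:<4}  FDR = {fdr:.3f}")
```

The false discovery rate is about 6% when P(real) = 0.5 and keeps falling as P(real) increases, but it reaches zero only when P(real) = 1.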

There is also a more subtle point. If we are trying to interpret the result of a single test that comes out with a P value of, say, P = 0.047, then we should not be looking at all significant results (those with P < 0.05), but only at those tests that come out with P = 0.047. This can be done quite easily by simulating a long series of t tests, and then restricting attention to those that come out with P values between, say, 0.045 and 0.05. When this is done we find that the false discovery rate is at least 26%. That’s for the best possible case where the sample size is good (power of the test is 0.8) and the prevalence of real effects is 0.5. When, as in the tree diagram, the prevalence of real effects is 0.1, the false discovery rate is 76%. That’s enough to justify Ioannidis’ statement that most published results are wrong.
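The "P = 0.047" argument can be explored by simulation. This is a minimal sketch, not the simulation used for the figures quoted above: to keep it short and dependent only on the standard library, it assumes that each experiment is summarised by a normally distributed test statistic with known standard deviation (a z test rather than a t test), with the real effect chosen to give a power of 0.8. It keeps only the experiments that give 0.045 < P ≤ 0.05 and counts how many of them had no real effect. The function name `fdr_near_p` is hypothetical.

```python
import random
from statistics import NormalDist

def fdr_near_p(p_real, power=0.8, p_lo=0.045, p_hi=0.05, n_sim=400_000, seed=1):
    """Simulated fraction of experiments with p_lo < P <= p_hi in which
    there was no real effect.

    Simplification: each experiment is summarised by a z statistic with
    unit standard deviation (a z test with known SD, not a t test).
    """
    z = NormalDist()
    # Mean of z when there is a real effect, chosen so that a two-sided
    # test at P < 0.05 has the requested power (ignoring the opposite tail).
    shift = z.inv_cdf(0.975) + z.inv_cdf(power)
    # p_lo < P <= p_hi is equivalent to z_min <= |z| < z_max.
    z_min = z.inv_cdf(1 - p_hi / 2)
    z_max = z.inv_cdf(1 - p_lo / 2)
    rng = random.Random(seed)
    null_hits = real_hits = 0
    for _ in range(n_sim):
        real = rng.random() < p_real
        stat = rng.gauss(shift if real else 0.0, 1.0)
        if z_min <= abs(stat) < z_max:
            if real:
                real_hits += 1
            else:
                null_hits += 1
    return null_hits / (null_hits + real_hits)

results = {p_real: fdr_near_p(p_real) for p_real in (0.5, 0.1)}
for p_real, fdr in results.items():
    print(f"P(real) = {p_real}: FDR for P just below 0.05 = {fdr:.2f}")
```

With these simplifications the false discovery rate should come out at roughly 28% for P(real) = 0.5 and roughly 78% for P(real) = 0.1: close to, but not identical with, the t test figures quoted above.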

One problem with all of the approaches mentioned above is the need to guess at the prevalence of real effects (that’s what a Bayesian would call the prior probability). James Berger and colleagues (Sellke et al., 2001) have proposed a way round this problem by looking at all possible prior distributions and so coming up with a minimum false discovery rate that holds universally. The conclusions are much the same as before. If you claim to have found an effect whenever you observe a P value just less than 0.05, you will come to the wrong conclusion in at least 29% of the tests that you do. If, on the other hand, you use P = 0.001, you’ll be wrong in only 1.8% of cases. Valen Johnson (2013) has reached similar conclusions by a related argument.
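The bound of Sellke et al. (2001) has a simple closed form. The sketch below assumes, as in their paper, equal prior probabilities for the null hypothesis and the alternative, and uses their minimum Bayes factor, −e·p·ln(p), minimised over a wide class of alternatives; it applies only for P values below 1/e (about 0.37). The function name is illustrative.

```python
from math import e, log

def min_false_discovery_rate(p):
    """Lower bound on the false discovery rate for an observed P value,
    after Sellke, Bayarri & Berger (2001), with prior probability 0.5
    that the null hypothesis is true. Only valid for 0 < p < 1/e."""
    if not 0 < p < 1 / e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    bf = -e * p * log(p)  # smallest possible Bayes factor in favour of the null
    return bf / (1 + bf)  # posterior probability of the null with even prior odds

for p in (0.05, 0.01, 0.001):
    print(f"P = {p}: false discovery rate at least {min_false_discovery_rate(p):.3f}")
```

It gives at least 0.289 for P = 0.05 and 0.018 for P = 0.001, the 29% and 1.8% quoted above.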

A three-sigma rule

As an alternative to insisting on P < 0.001 before claiming you’ve discovered something, you could use a 3-sigma rule. In other words, insist that an effect is at least three standard deviations away from the control value (as opposed to the two standard deviations that correspond to P = 0.05).

The three sigma rule means using P = 0.0027 as your cut off. This, according to Berger’s rule, implies a false discovery rate of (at least) about 4.2%, not far from the value that many people mistakenly think is achieved by using P = 0.05 as a criterion.

Particle physicists go a lot further than this. They use a 5-sigma rule before announcing a new discovery. That corresponds to a P value of less than one in a million ($0.57 \times 10^{-6}$). According to Berger’s rule this corresponds to a false discovery rate of (at least) around 20 per million. Of course, their experiments usually can’t be randomised, so it’s as well to be on the safe side.
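The conversion from "sigma" to P values, and the corresponding Berger bound, can be sketched in the same way, assuming a normal distribution and a two-sided test. Both function names are illustrative.

```python
from math import e, log
from statistics import NormalDist

def two_sided_p(k_sigma):
    """Two-sided P value for an effect k_sigma standard deviations from control."""
    return 2 * NormalDist().cdf(-k_sigma)

def min_false_discovery_rate(p):
    """Sellke-Bayarri-Berger lower bound (even prior odds; valid for p < 1/e)."""
    bf = -e * p * log(p)
    return bf / (1 + bf)

for k in (2, 3, 5):
    p = two_sided_p(k)
    print(f"{k} sigma: P = {p:.2e}, "
          f"false discovery rate at least {min_false_discovery_rate(p):.2e}")
```

For three sigma this gives P = 0.0027 and a minimum false discovery rate of about 4.2% (rounding P to 0.003 gives 4.5%). For five sigma it gives P of about 0.57 in a million and a minimum false discovery rate of about 22 per million.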

Underpowered experiments

All of the problems discussed so far concern the near-ideal case. They assume that your sample size is big enough (power about 0.8 say) and that all of the assumptions made in the test are true, that there is no bias or cheating and that no negative results are suppressed. The real-life problems can only be worse. One way in which it is often worse is that sample sizes are too small, so the statistical power of the tests is low.

The problem of underpowered experiments has been known since 1962, but it has been ignored. Recently it has come back into prominence, thanks in large part to John Ioannidis and the crisis of reproducibility in some areas of science. Button et al. (2013) said

“We optimistically estimate the median statistical power of studies in the neuroscience field to be between about 8% and about 31%”

This is disastrously low. Running simulated t tests shows that with a power of 0.2, not only do you have only a 20% chance of detecting a real effect, but that when you do manage to get a "significant" result there is a 76% chance that it’s a false discovery.

And furthermore, when you do find a "significant" result, the size of the effect will be over-estimated by a factor of nearly 2. This "inflation effect" happens because only those experiments that happen, by chance, to have a larger-than-average effect size will be deemed to be "significant".
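The inflation effect can be demonstrated with a small simulation. As in the other sketches here, this is a simplification: it assumes each experiment yields a normally distributed estimate of the effect with known standard error, and that the true effect is just big enough to give a power of 0.2. The function `inflation_factor` is a hypothetical name.

```python
import random
from statistics import NormalDist, fmean

def inflation_factor(power=0.2, n_sim=200_000, seed=2):
    """Mean estimated effect among 'significant' results divided by the true effect.

    Simplification: each experiment's estimate is normally distributed around
    the true effect with standard error 1 (z test with known SD).
    """
    z = NormalDist()
    crit = z.inv_cdf(0.975)                # |estimate| >= 1.96 counts as P < 0.05
    true_effect = crit + z.inv_cdf(power)  # gives the requested power (ignoring the opposite tail)
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, 1.0) for _ in range(n_sim))
    significant = [est for est in estimates if abs(est) >= crit]
    return fmean(significant) / true_effect

factor = inflation_factor()
print(f"with power 0.2, 'significant' effects are inflated by a factor of {factor:.2f}")
```

With these assumptions the mean "significant" estimate is a little over twice the true effect; the exact factor depends on the test and on the sample size. Running it with power = 0.8 gives a much smaller inflation, of roughly 12%.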

What should you do to prevent making a fool of yourself?

The simulated t test results, and some other subtleties, will be described in a paper, and/or in a future post. But I hope that enough has been said here to convince you that there are real problems in the sort of statistical tests that are universal in the literature.

The blame for the crisis in reproducibility has several sources.

One of them is the self-imposed publish-or-perish culture, which values quantity over quality, and which has done enormous harm to science.

The mis-assessment of individuals by silly bibliometric methods has contributed to this harm. Of all the proposed methods, altmetrics is demonstrably the most idiotic. Yet some vice-chancellors have failed to understand that.

Another is scientists’ own vanity, which leads to the PR department issuing disgracefully hyped up press releases.

In some cases, the abstract of a paper states that a discovery has been made when the data say the opposite. This sort of spin is common in the quack world. Yet referees and editors get taken in by the ruse (e.g. see this study of acupuncture).

The reluctance of many journals (and many authors) to publish negative results biases the whole literature in favour of positive results. This is so disastrous in clinical work that a pressure group has been started: altrials.net, "All Trials Registered | All Results Reported".

Yet another problem is that it has become very hard to get grants without putting your name on publications to which you have made little contribution. This leads to exploitation of young scientists by older ones (who fail to set a good example). Peter Lawrence has set out the problems.

And, most pertinent to this post, a widespread failure to understand properly what a significance test means must contribute to the problem. Young scientists are under such intense pressure to publish that they have no time to learn about statistics.

Here are some things that can be done.

• Notice that all statistical tests of significance assume that the treatments have been allocated at random. This means that application of significance tests to observational data, e.g. epidemiological surveys of diet and health, is not valid. You can’t expect to get the right answer. The easiest way to understand this assumption is to think about randomisation tests (which should have replaced t tests decades ago, but which are still rare). There is a simple introduction in Lectures on Biostatistics (chapters 8 and 9). There are other assumptions too (about the distribution of observations and the independence of measurements), but randomisation is the most important.
• Never, ever, use the word "significant" in a paper. It is arbitrary, and, as we have seen, deeply misleading. Still less should you use "almost significant", "tendency to significance" or any of the hundreds of similar circumlocutions listed by Matthew Hankins on his Still not Significant blog.
• If you do a significance test, just state the P value and give the effect size and confidence intervals (but be aware that 95% intervals may be misleadingly narrow).
• Observation of a P value close to 0.05 means nothing more than ‘worth another look’. In practice, one’s attitude will depend on weighing the losses that ensue if you miss a real effect against the loss to your reputation if you claim falsely to have made a discovery.
• If you want to avoid making a fool of yourself most of the time, don’t regard any P value bigger than 0.001 as a demonstration that you’ve discovered something. Or, slightly less stringently, use a three-sigma rule.
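Randomisation tests are simple enough to sketch in a few lines. The version below is a minimal exact test for the difference between two means: it enumerates every way of dividing the pooled observations into two groups of the original sizes (which is what random allocation makes equally likely if the treatment has no effect) and asks how often the difference is at least as big as the one observed. The data and the function name are hypothetical, and complete enumeration is feasible only for small samples; for larger ones a random sample of rearrangements would be used instead.

```python
from itertools import combinations
from statistics import fmean

def randomisation_test(a, b):
    """Exact two-sided randomisation test for a difference between two means.

    Every way of dividing the pooled observations into groups of the original
    sizes is treated as equally likely under the null hypothesis, as random
    allocation guarantees. Feasible only for small samples.
    """
    pooled = list(a) + list(b)
    observed = abs(fmean(a) - fmean(b))
    as_extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        total += 1
        # small tolerance so that ties are not lost to floating-point rounding
        if abs(fmean(group_a) - fmean(group_b)) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / total

# Hypothetical data: four observations per group
treated = [10, 11, 12, 13]
control = [1, 2, 3, 4]
print(randomisation_test(treated, control))
```

With perfectly separated groups of four, only 2 of the 70 possible allocations give a difference as extreme as the observed one, so P = 2/70 ≈ 0.029.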

Despite the gigantic contributions that Ronald Fisher made to statistics, his work has been widely misinterpreted. We must, however reluctantly, concede that there is some truth in the comment made by an astute journalist:

"The plain fact is that 70 years ago Ronald Fisher gave scientists a mathematical machine for turning baloney into breakthroughs, and flukes into funding. It is time to pull the plug". Robert Matthews, Sunday Telegraph, 13 September 1998.

### Follow-up

31 March 2014 I liked Stephen Senn’s first comment on twitter (the twitter stream is storified here). He said " I may have to write a paper ‘You may believe you are NOT a Bayesian but you’re wrong’". I maintain that the analysis here is merely an exercise in conditional probabilities. It bears a formal similarity to a Bayesian argument, but is free of the more contentious parts of the Bayesian approach. This is amplified in a comment, below.

4 April 2014

I just noticed that my first boss, Heinz Otto Schild, in his 1942 paper about the statistical analysis of 2+2 dose biological assays (written while he was interned at the beginning of the war), chose to use 99% confidence limits, rather than the now universal 95% limits. The latter are more flattering to your results, but Schild was more concerned with precision than self-promotion.

## On the hazards of significance testing. Part 1: the screening problem

#### March 10th, 2014 · 41 Comments

This post is about why screening healthy people is generally a bad idea. It is the first in a series of posts on the hazards of statistics.

There is nothing new about it: Graeme Archer recently wrote a similar piece in his Telegraph blog. But the problems are consistently ignored by people who suggest screening tests, and by journals that promote their work. It seems that it can’t be said often enough.

The reason is that most screening tests give a large number of false positives. If your test comes out positive, your chance of actually having the disease is almost always quite small. False positive tests cause alarm, and they may do real harm, when they lead to unnecessary surgery or other treatments.

Tests for Alzheimer’s disease have been in the news a lot recently. They make a good example, if only because it’s hard to see what good comes of being told early on that you might get Alzheimer’s later when there are no good treatments that can help with that news. But worse still, the news you are given is usually wrong anyway.

Consider a recent paper that described a test for "mild cognitive impairment" (MCI), a condition that may be, but often isn’t, a precursor of Alzheimer’s disease. The 15-minute test was published in the Journal of Neuropsychiatry and Clinical Neurosciences by Scharre et al (2014). The test sounded pretty good. It had a specificity of 95% and a sensitivity of 80%.

Specificity (95%) means that 95% of people who are healthy will get the correct diagnosis: the test will be negative.

Sensitivity (80%) means that 80% of people who have MCI will get the correct diagnosis: the test will be positive.

To understand the implication of these numbers we also need to know the prevalence of MCI in the population that’s being tested. That was estimated as 1% of people or, for over-60s only, 5%. Now the calculation is easy. Suppose 10,000 people are tested. 1% (100 people) will have MCI, of which 80% (80 people) will be diagnosed correctly. And 9,900 do not have MCI, of which 95% will test negative (correctly), so the remaining 5% (495 people) will test positive (incorrectly). The numbers can be laid out in a tree diagram.

The total number of positive tests is 80 + 495 = 575, of which 495 are false positives. The fraction of tests that are false positives is 495/575 = 86%.

Thus there is only a 14% chance that, if you test positive, you actually have MCI. 86% of people will be alarmed unnecessarily.

Even for people over 60, among whom 5% of the population have MCI, the test gives the wrong result (54%) more often than it gives the right result (46%).
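The screening tree can be checked in the same way. This sketch, with a hypothetical helper `screening_tree`, computes the expected counts for 10,000 people tested, using the sensitivity and specificity reported for the 15-minute test.

```python
def screening_tree(n_people=10_000, prev=0.01, sens=0.8, spec=0.95):
    """Expected counts in the screening tree diagram."""
    ill = n_people * prev          # people who have MCI
    well = n_people - ill          # people who do not
    true_pos = sens * ill          # ill people correctly testing positive
    false_pos = (1 - spec) * well  # well people wrongly testing positive
    return true_pos, false_pos, false_pos / (true_pos + false_pos)

for prev in (0.01, 0.05):
    tp, fp, frac_false = screening_tree(prev=prev)
    print(f"prevalence {prev}: {tp:.0f} true positives, {fp:.0f} false positives, "
          f"{frac_false:.0%} of positive tests wrong")
```

It reproduces the 495 false positives out of 575 positive tests (86%) for a prevalence of 1%, and 54% wrong positive results for a prevalence of 5%.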

The test is clearly worse than useless. That was not made clear by the authors, or by the journal. It was not even made clear by NHS Choices.

It should have been.

It’s easy to put the tree diagram in the form of an equation. Denote sensitivity as sens, specificity as spec and prevalence as prev.

The probability that a positive test means that you actually have the condition is given by

$\frac{sens \times prev}{sens \times prev + \left(1-spec\right)\left(1-prev\right)}$

In the example above, sens = 0.8, spec = 0.95 and prev = 0.01, so the fraction of positive tests that give the right result is

$\frac{0.8 \times 0.01}{0.8 \times 0.01 + \left(1 - 0.95 \right)\left(1 - 0.01\right) }\; = 0.139$

So 13.9% of positive tests are right, and 86% are wrong, as found in the tree diagram.

The lipid test for Alzheimer’s

Another Alzheimer’s test has been in the headlines very recently. It performs even worse than the 15-minute test, but nobody seems to have noticed. It was published in Nature Medicine, by Mapstone et al. (2014). According to the paper, the sensitivity is 90% and the specificity is 90%, so, by constructing a tree, or by using the equation, the probability that you are ill, given that you test positive, is a mere 8% (for a prevalence of 1%). And even for over-60s (prevalence 5%), the value is only 32%, so two-thirds of positive tests are still wrong. Again this was not pointed out by the authors. Nor was it mentioned by Nature Medicine in its commentary on the paper. And once again, NHS Choices missed the point.
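The equation for the probability that a positive test is right translates directly into a short function. Applying it to both tests discussed here, with the sensitivities, specificities and prevalences quoted in the text, gives the figures above. The function name is illustrative.

```python
def prob_ill_given_positive(sens, spec, prev):
    """sens.prev / (sens.prev + (1 - spec)(1 - prev))"""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

tests = {"15-minute MCI test": (0.80, 0.95), "lipid test": (0.90, 0.90)}
for name, (sens, spec) in tests.items():
    for prev in (0.01, 0.05):
        print(f"{name}, prevalence {prev}: "
              f"{prob_ill_given_positive(sens, spec, prev):.3f}")
```

For the 15-minute test this gives 0.139 (1% prevalence) and 0.457 (5%); for the lipid test, 0.083 and 0.321.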

Why does there seem to be a conspiracy of silence about the deficiencies of screening tests? It has been explained very clearly by people like Margaret McCartney who understand the problems very well. Is it that people are incapable of doing the calculations? Surely not. Is it that it’s better for funding to pretend you’ve invented a good test, when you haven’t? Do journals know that anything to do with Alzheimer’s will get into the headlines, and don’t want to pour cold water on a good story?

Whatever the explanation, it’s bad science that can harm people.

### Follow-up

March 12 2014. This post was quickly picked up by the ampp3d blog, run by the Daily Mirror. Conrad Quilty-Harper showed some nice animations under the heading How a “90% accurate” Alzheimer’s test can be wrong 92% of the time.

March 12 2014.

As so often, the journal promoted the paper in a way that wasn’t totally accurate. Hype is more important than accuracy, I guess.

June 12 2014.

The empirical evidence shows that “general health checks” (a euphemism for mass screening of the healthy) simply don’t help. See review by Gøtzsche, Jørgensen & Krogsbøll (2014) in BMJ. They conclude

“Doctors should not offer general health checks to their patients, and governments should abstain from introducing health check programmes, as the Danish minister of health did when she learnt about the results of the Cochrane review and the Inter99 trial. Current programmes, like the one in the United Kingdom, should be abandoned.”

8 July 2014

Yet another over-hyped screening test for Alzheimer’s in the media. And once again, the hype originated in the press release, from King’s College London this time. The press release says

"They identified a combination of 10 proteins capable of predicting whether individuals with MCI would develop Alzheimer’s disease within a year, with an accuracy of 87 percent"

The term “accuracy” is not defined in the press release. And it isn’t defined in the original paper either. I’ve written to the senior author, Simon Lovestone, to try to find out what it means. The original paper says

"Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85% and specificity 88%)."

A simple calculation, as shown above, tells us that with sensitivity 85% and specificity 88%, the fraction of people who have a positive test who are diagnosed correctly is 44%. So 56% of positive results are false alarms. These numbers assume that the prevalence of the condition in the population being tested is 10%, a higher value than assumed in other studies. If the prevalence were only 5% the results would be still worse: 73% of positive tests would be wrong. Either way, that’s not good enough to be useful as a diagnostic method.

In one of the other recent cases of Alzheimer’s tests, six months ago, NHS Choices fell into the same trap. They changed it a bit after I pointed out the problem in the comments. They seem to have learned their lesson, because their post on this study was titled “Blood test for Alzheimer’s ‘no better than coin toss’”. That’s based on the 56% of false alarms mentioned above.

The reports on BBC News and other media totally missed the point. But, as so often, their misleading reports were based on a misleading press release. That means that the university, and ultimately the authors, are to blame.

I do hope that the hype has no connection with the fact that the Conflicts of Interest section of the paper says

"SL has patents filed jointly with Proteome Sciences plc related to these findings"

What it doesn’t mention is that, according to Google patents, King’s College London is also a patent holder, and so has a vested interest in promoting the product.

Is it really too much to expect that hard-pressed journalists might do a simple calculation, or phone someone who can do it for them? Until that happens, misleading reports will persist.

9 July 2014

It was disappointing to see that the usually excellent Sarah Boseley in the Guardian didn’t spot the problem either. And still more worrying that she quotes Dr James Pickett, head of research at the Alzheimer’s Society, as saying

These 10 proteins can predict conversion to dementia with less than 90% accuracy, meaning one in 10 people would get an incorrect result.

That number is quite wrong. It isn’t 1 in 10, it’s rather more than 1 in 2.

A resolution

After corresponding with the author, I now see what is going on more clearly.

The word "accuracy" was not defined in the paper, but was used in the press release and widely cited in the media. What it means is the ratio of the total number of true results (true positives + true negatives) to the total number of all results. This doesn’t seem to me to be a useful number to give at all, because it conflates false negatives and false positives into a single number. If a condition is rare, the number of true negatives will be large (as shown above), but this does not make it a good test. What matters most to patients is not accuracy, defined in this way, but the false discovery rate.
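The difference between "accuracy" and the proportion of positive tests that are right is easy to show numerically. This sketch assumes that accuracy is calculated over a population with the stated prevalence, and uses the sensitivity (85%) and specificity (88%) quoted in the paper. Both function names are illustrative.

```python
def accuracy(sens, spec, prev):
    """(true positives + true negatives) / everyone tested."""
    return sens * prev + spec * (1 - prev)

def prob_ill_given_positive(sens, spec, prev):
    """Fraction of positive tests that are right."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

sens, spec = 0.85, 0.88  # figures quoted in the paper
for prev in (0.10, 0.05, 0.01):
    print(f"prevalence {prev:.2f}: accuracy {accuracy(sens, spec, prev):.3f}, "
          f"positive tests right {prob_ill_given_positive(sens, spec, prev):.3f}")
```

Accuracy stays close to 88% whatever the prevalence (it even creeps up as the condition gets rarer, because true negatives dominate), whereas the fraction of positive tests that are right falls from 44% at a prevalence of 10% to under 7% at 1%.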

The author makes it clear that the results are not intended to be a screening test for Alzheimer’s. It’s obvious from what’s been said that it would be a lousy test. Rather, the paper was intended to identify patients who would eventually (well, within only 18 months) get dementia. The denominator (always the key to statistical problems) in this case is the highly atypical patients who come to memory clinics in trials centres (the potential trials population). The prevalence in this very restricted population may indeed be higher than the 10 percent that I used above.

Reading between the lines of the press release, you might have been able to infer some of this (though not the meaning of “accuracy”). The fact that the media almost universally wrote up the story as a “breakthrough” in Alzheimer’s detection is a consequence of the press release and of not reading the original paper.

I wonder whether it is proper for press releases to be issued at all for papers like this, which address a narrow technical question (selection of patients for trials). That is not a topic of great public interest. It’s asking for misinterpretation and that’s what it got.

I don’t suppose that it escaped the attention of the PR people at King’s that anything that refers to dementia is front page news, whether it’s of public interest or not. When we had an article in Nature in 2008, I remember long discussions about a press release with the arts graduate who wrote it (not at our request). In the end we decided that the topic was not of sufficient public interest to merit a press release and insisted that none was issued. Perhaps that’s what should have happened in this case too.

This discussion has certainly illustrated the value of post-publication peer review. See, especially, the perceptive comments, below, from Humphrey Rang and from Dr Aston and from Dr Kline.

14 July 2014. Sense about Science asked me to write a guest blog to explain more fully the meaning of "accuracy", as used in the paper and press release. It’s appeared on their site and will be reposted on this blog soon.

## Some pharmacological history: an exam from 1959

#### February 6th, 2014 · 3 Comments

Last year, I was sent my answer paper for one of my final exams, taken in 1959. This has triggered a bout of shamelessly autobiographical nostalgia.

The answer sheets that I wrote had been kept by one of my teachers at Leeds, Dr George Mogey. After he died in 2003, aged 86, his widow, Audrey, found them and sent them to me. And after a hunt through the junk piled high in my office, I found the exam papers from that year too. George Mogey was an excellent teacher and a kind man. He gave most of the lectures to medical students, which we, as pharmacy/pharmacology students, attended. His lectures were inspirational.

Photo from his daughter, Nora Mogey

Today, 56 years on, I can still recall vividly his lecture on anti-malarial drugs. At the end he paused dramatically and said "Since I started speaking, 100 people have died from malaria" (I don’t recall the exact number). He was the perfect antidote to people who say you learn nothing from lectures. Straight after the war (when he had seen the problem of malaria at first hand) he went to work at the Wellcome Research Labs in Beckenham, Kent. The first head of the Wellcome Lab was Henry Dale. It had a distinguished record of basic research as well as playing a crucial role in vaccine production and in development of the safe use of digitalis. In the 1930s it had an important role in the development of proper methods for biological standardisation. This was crucial for ensuring that, for example, each batch of tincture of digitalis had the same potency (it has been described previously on this blog in Plants as Medicines).

When George Mogey joined the Wellcome lab, its head was J.W. Trevan (1887 – 1956) (read his Biographical Memoir, written by J.H. Gaddum). Trevan’s most memorable contributions were in improving the statistics of biological assays. The ideas of individual effective dose and median effective dose were developed by him. His 1927 paper The Error of Determination of Toxicity is a classic of pharmacology. His advocacy of the well-defined quantity, the median effective dose, as a replacement for the ill-defined minimum effective dose was influential in the development of proper statistical analysis of biological assays in the 1930s.

Trevan is something of a hero to me. And he was said to be very forgetful. Gaddum, in his biographical memoir, recounts this story

"One day when he had lost something and suspected that it had been tidied away by his secretary, he went round muttering ‘It’s all due to this confounded tidiness. It always leads to trouble. I won’t have it in my lab.’ "

 Trevan coined the abbreviation LD50 for the median lethal dose of a drug. George Mogey later acquired the car number plate LD50, in honour of Trevan, and his widow, Audrey, still has it (picture on right).

Mogey wrote several papers with Trevan. In 1948 he presented one at a meeting of the Physiological Society. The programme included also A.V. Hill, E.J. Denton, Bernhard [sic] Katz, J.Z. Young and Richard Keynes (Keynes was George Henry Lewes Student at Cambridge: Lewes was the Victorian polymath with whom the novelist George Eliot lived, openly unmarried, and a founder of the Physiological Society. He probably inspired the medical content of Eliot’s best known novel, Middlemarch).

Mogey may not have written many papers, but he was the sort of inspiring teacher that universities need. He had a letter in Nature on Constituents of Amanita Muscaria, the fly agaric toadstool, which appeared in 1965. That might explain why we went on a toadstool-hunting field trip.

Amanita muscaria, DC picture, 2005

The tradition of interest in statistics and biological assay must have rubbed off on me, because the answers I gave in the exam were very much in that tradition. Here is a snippet (click to download the whole answer sheet).

A later answer was about probit analysis, an idea introduced by statistician Chester Bliss (1899–1979) in 1934, as a direct extension of Trevan’s work. (I met Bliss in 1970 or 1971 when I was at Yale. We had dinner and went to a theatre, then back to his apartment, where he insisted on showing me his collection of erotic magazines!)

This paper was a pharmacology paper in my first final exam at the end of my third year. The external examiner was Walter Perry, head of pharmacology in Edinburgh (he went on to found the Open University). He had previously been head of Biological Standards at the National Institute for Medical Research, a job in which he had to know some statistics. In the oral exam he asked me a killer question "What is the difference between confidence limits and fiducial limits?". I had no real idea (and, as I discovered later, neither did he). After that, I went on to do the 4th year where we specialised in pharmacology, and I spent quite a lot of time trying to answer that question. The result was my first ever paper, published in the University of Leeds Medical Journal. I hinted, obliquely, that the idea of fiducial inference was probably Ronald Fisher’s only real mistake. I think that is the general view now, but Fisher was such a towering figure in statistics that nobody said that straight out (he was still alive when this was written; he died in 1962).

It is well-worth looking at a paper that Fisher gave to the Royal Statistical Society in 1935, The Logic of Inductive Inference. Then, as now, it was the custom for a paper to be followed by a vote of thanks, and a seconder. These, and the subsequent discussion, are all printed, and they could be quite vicious in a polite way. Giving the vote of thanks, Professor A.L. Bowley said

"It is not the custom, when the Council invites a member to propose a vote of thanks on a paper, to instruct him to bless it. If to some extent I play the inverse role of Balaam, it is not without precedent;"

And the seconder, Dr Isserlis, said

"There is no doubt in my mind at all about that, but Professor Fisher, like other fond parents, may perhaps see in his offspring qualities which to his mind no other children possess; others, however, may consider that the offspring are not unique."

Post-publication peer review was already alive and well in 1935.

I was helped enormously in writing this paper by Dr B.L. Welch (1911 – 1989), whose first year course in statistics for biologists was a compulsory part of the course. Welch was famous particularly for having extended Student’s t distribution to the case where the variances in two samples being compared are unequal (Welch, 1947). He gave his whole lecture with his back to the class while writing what he said on a set of blackboards that occupied the whole side of the room. No doubt he would have failed any course about how to give a lecture. I found him riveting. He went slowly, and you could always check your notes because it was all there on the blackboards.

Walter Perry seemed to like my attempt to answer his question, despite the fact that it failed. After the 4th year final (a single 3 hour essay on drugs that affect protein synthesis) he offered me a PhD place in Edinburgh. He was one of my supervisors, though I never saw him except when he dropped into the lab for a cigarette between committee meetings. While in Edinburgh I met the famous statistician, David Finney, whose definitive book on the Statistics of Biological Assay was an enormous help when I later wrote Lectures on Biostatistics and a great help in getting my first job at UCL in 1964. Heinz Otto Schild, then the famous head of department, had written a paper in 1942 about the statistical analysis of 2+2 dose biological assays, while interned at the beginning of the war. He wanted someone to teach it to students, so he gave me a job. That wouldn’t happen now, because that sort of statistics would be considered too difficult. Incidentally, I notice that Schild uses 99% confidence limits in his paper, not the usual 95% limits, which make your results look better.

It was clear even then that the basis of statistical inference was an exceedingly contentious matter among statisticians. It still is, but the matter has renewed importance in view of the crisis of reproducibility in science. The question still fascinates me, and I’m planning to update my first paper soon. This time I hope it will be a bit better.

### Postscript: some old pictures

While in nostalgic mood, here are a few old pictures. First, the only picture I have from undergraduate days. It was taken on a visit to May and Baker (of sulphonamide fame) in February 1957 (so I must have been in my first year). There were 15 or so in the class for the first three years (now, you can get 15 in a tutorial group). I’m in the middle of the back row (with hair!). The only names that I recall are those of the other two who went into the 4th year with me, Ed Abbs (rightmost on back row) and Stella Gregory (2nd from right, front row). Ed died young and Stella went to Australia. Just in front of me are James Dare (with bow tie) and Mr Nelson (who taught old fashioned pharmacognosy).

James Dare taught pharmaceutics, but he also had a considerable interest in statistics, and we did lots of calculations with electromechanical calculators. The best of them was a Monroe (here’s a picture of one with the case removed to show the amazingly intricate mechanism: Monroe 8N-213, from http://www.science.uva.nl/museum/calclist.php).

The history of UCL’s pharmacology goes back to 1905. For most of that time, it’s been a pretty good department. It got top scores in all the research assessments until it was abolished by Malcolm Grant in 2007. That act of vandalism is documented in my diary section.

For most of its history, there was one professor who was head of the department. That tradition ended in 1983, when Humphrey Rang left for Novartis. The established chair was then empty for two years, until Donald Jenkinson, then head of department, insisted, with characteristic modesty, that I rather than he should take the chair. Some time during the subsequent reign of David Brown, it was decided to name the chairs, and mine became the A.J. Clark chair. It was decided that the headship of the department would rotate between Donald, David Brown and me. But when it came to my turn, I decided I was much too interested in single ion channels to spend time pushing paper, and David Brown nobly extended his term. The A.J. Clark chair was vacant after I ‘retired’ in 2004, but in 2014, Lucia Sivilotti was appointed to the chair, a worthy successor in its quantitative tradition.

The first group picture of UCL’s Pharmacology department was from 1972. Heinz Schild is in the middle of the front row, with Desmond Laurence on his left. Between them they dominated the textbook market: Schild edited A.J. Clark’s Pharmacology (now known as Rang and Dale). Laurence wrote a very successful text, Clinical Pharmacology. Click on the picture for a bigger version, with names, as recalled by Donald Jenkinson (DHJ). I doubt whether many people now remember Ada Corbett (the tea lady) or Frank Ballhatchet from the mechanical workshop. He could do superb work, though the price was to spend 10 minutes chatting about his Land Rover, or listening to reminiscences of his time working on Thames barges. I still have a beautiful 8-way tap that he made, with a jerk-free indexing mechanism.

The second Departmental picture was taken in June 1980. Humphrey Rang was head of department then. My colleagues David Ogden and Steven Siegelbaum are there. In those days we had a tea lady too, Joyce Mancini.

## La Trobe University (Melbourne) takes money to promote quackery

This post is the original version of a post by Michael Vagg. It was posted at the Conversation but taken down within hours, on legal advice. Sadly, the Conversation has a track record of pusillanimous behaviour of this sort. Within minutes, the cached version reappeared on freezepage.com. I’m reposting it from there in the interests of free speech.

La Trobe "university" should be ashamed that it has prostituted itself for the sake of 15 m. La Trobe’s deputy vice-chancellor, Keith Nugent, gives a make-believe response to the resignation of Ken Harvey in a video. It is, in my opinion, truly pathetic.

Update: the next day, the article was reposted at the Conversation. The changes they’d made can be seen in a compare document. The biggest change was removal of "has just decided to join the ranks of the spivs and hucksters of the vitamin industry". This seems to me to be perfectly fair comment. It should not have been censored by the Conversation.

The recent memorandum of understanding signed between supplement company Swisse and La Trobe University to establish a Complementary Medicine Evidence Centre (CMEC) looks to me like the latest effort by a corporation to cloak their business interests in a veil of science. Unlike the UTS Sydney Australian Research Centre in Complementary and Integrative Medicine (ARCCIM), which at least has significant NHMRC funding, the La Trobe version will undertake “independent research” into complementary and alternative medicine (CAM) products that are made by the major (and so far only) donor to the Centre.

Southern Cross University also has a very close relationship with the Blackmores brand of CAM products, due to the personal interest of Marcus Blackmore, the company Chairman. Blackmores claims to spend a lazy couple of million a year on their branded research centre. The Blackmores Research Centre studies Blackmores products.
Presumably this situation (so similar to the proposed La Trobe model) is a coincidence, since the research centre is providing completely “independent” research.

The conflict of interest in such research centres is so laughably obvious that A/Prof Ken Harvey, a leading campaigner against shonky health products, a life member of Choice and a contributor to The Conversation, has resigned his appointment at La Trobe in protest. Ken clearly points out in his letter of resignation that by accepting the money from Swisse, he believes La Trobe has unacceptably compromised its integrity. His letter cites multiple instances of non-compliance with TGA regulations by Swisse, as well as their disrespect for the regulatory process that governs corporate truth-telling in their industry.

This story from last year gives a bit of background to the quixotic battle Harvey has fought against the massive coffers and unscrupulous business practices of Big Supplement. He has been more effective than the TGA itself at hindering the rampant gaming of the TGA Register of Therapeutic Goods by supplement and vitamin manufacturers. Clearly, as a man of principle, he could not be expected to continue his association with a university that has a close relationship to a company with such a history of regulatory infringements. The untenability of Ken’s position is underlined by the fact that La Trobe itself republished on its website one of his articles in The Conversation about Swisse’s regulatory tapdancing only the previous year!

Ken has been sued, traduced and generally railed against by a multi-billion dollar industry for the hideous crime of insisting that they tell the truth about their products and not mislead consumers. We need another hundred like him. That his own university has decided to take the money on offer from Swisse must be a bitter blow to him. It would be interesting to know whether any other universities were approached by Swisse in a similar way and had the courage to decline the offer.
The infiltration of academia by privately funded CAM institutes is old news in the United States. The Science Based Medicine blog has christened the phenomenon “quackademic medicine” and written about it at some length. It seems the Australian CAM industry has no need to hide behind astroturfing organisations like the American group the Bravewell Collaborative to get its agenda attended to. Companies like Blackmores and Swisse can seemingly just offer to fund research institutes, and cash-strapped tertiary institutions can’t resist. Friends of Science in Medicine and others have had a bit to say about the irresponsibility of educational institutions lending credibility to pseudoscience, and about how this practice damages universities’ standing as exemplars of scholarship and intellectual leaders within their communities.

I can say without qualification that none of the much-maligned Big Pharma companies has its own fully-funded research centre at any university, let alone a branded one where the studies are restricted to a single company’s products. It would be utterly unacceptable for the integrity of any university for such an outrageously conflicted institution to be given any support. What would it be like if GSK or Pfizer founded a research institute at a university and forced the researchers to study only their own products? Imagine the outrage. Imagine what a laughing stock such a research centre would be. That’s medical research in clown shoes. That’s academic credibility in a cheap suit trying to sell you steak knives.

Vitamin and supplement companies will always be profitable because their sales pitch is based on psychological flaws that everyone has. Just ask the gaming, alcohol and tobacco companies. All of them are massively profitable. Sometimes their cash can even do good, but there’s always an angle by which they profit. Look at these guys up close, and the warts appear.
All of them seek to improve their image by splashing money on hanging around with the glamorous, the successful, the smart and the credible. They hope that the magic dust of celebrity and academia will disguise the stench of the swamp they crawled out from. La Trobe Uni has just decided to join the ranks of the spivs and hucksters of the vitamin industry, and they will now have to live with having a research centre with the academic and professional credibility of the Ponds Institute. Sadly for La Trobe, they won’t have Ken Harvey to keep things reality-based.

### Follow-up

8 February 2014. Deputy vice-chancellor Keith Nugent tried to defend the university’s decision to take money from the "spivs and hucksters of the vitamin industry" in The Age. I sent the following letter to The Age. Let’s hope they publish it.

Keith Nugent, deputy vice-chancellor of La Trobe University, has offered a defence of the university’s decision to take a large amount of money from vitamin and herb company Swisse. He justifies this by saying that we need to know whether or not the products work. Nugent seems to be unaware that we already know. There have been many good double-blind randomized trials and they have just about all shown that dosing yourself with vitamins and minerals does most people no good at all. Some have shown that high doses actually harm you. Perhaps the university should have checked what’s already known before taking the money.

Perhaps Nugent is also unaware that trials with industry sponsorship tend to come out favourable to the companies’ products. For that reason, the results are treated with scepticism by the scientific community. If the research is worth doing, then it will be funded from the normal sources. There should be no need to take money from a company with a very strong financial interest in the outcome.

D.
Colquhoun FRS, Professor of Pharmacology, University College London

## Why you should ignore altmetrics and other bibliometric nightmares

#### January 16th, 2014 · 15 Comments

This discussion seemed to be of sufficient general interest that we submitted it as a feature to eLife, because this journal is one of the best steps into the future of scientific publishing. Sadly, the features editor thought that “too much of the article is taken up with detailed criticisms of research papers from NEJM and Science that appeared in the altmetrics top 100 for 2013; while many of these criticisms seems valid, the Features section of eLife is not the venue where they should be published”. That’s pretty typical of what most journals would say. It is that sort of attitude that stifles criticism, and that is part of the problem. We should be encouraging post-publication peer review, not suppressing it. Luckily, thanks to the web, we are now much less constrained by journal editors than we used to be. Here it is.

### Scientists don’t count: why you should ignore altmetrics and other bibliometric nightmares

David Colquhoun¹ and Andrew Plested²

¹ University College London, Gower Street, London WC1E 6BT

² Leibniz-Institut für Molekulare Pharmakologie (FMP) & Cluster of Excellence NeuroCure, Charité Universitätsmedizin, Timoféeff-Ressowsky-Haus, Robert-Rössle-Str. 10, 13125 Berlin, Germany.

Jeffrey Beall is librarian at Auraria Library, University of Colorado Denver. Although not a scientist himself, he, more than anyone, has done science a great service by listing the predatory journals that have sprung up in the wake of pressure for open access. In August 2012 he published “Article-Level Metrics: An Ill-Conceived and Meretricious Idea”. At first reading that criticism seemed a bit strong. On mature consideration, it understates the potential that bibliometrics, altmetrics especially, have to undermine both science and scientists.
Altmetrics is the latest buzzword in the vocabulary of bibliometricians. It attempts to measure the “impact” of a piece of research by counting the number of times that it’s mentioned in tweets, Facebook pages, blogs, YouTube and news media. That sounds childish, and it is. Twitter is an excellent tool for journalism. It’s good for debunking bad science, and for spreading links, but too brief for serious discussions. It’s rarely useful for real science. Surveys suggest that the great majority of scientists do not use Twitter (only 7–13% do). Scientific works get tweeted about mostly because they have titles that contain buzzwords, not because they represent great science.

#### What and who is altmetrics for?

The aims of altmetrics are ambiguous to the point of dishonesty; they depend on whether the salesperson is talking to a scientist or to a potential buyer of their wares. At a meeting in London, an employee of altmetric.com said “we measure online attention surrounding journal articles”, “we are not measuring quality …”, “this whole altmetrics data service was born as a service for publishers”, and “it doesn’t matter if you got 1000 tweets … all you need is one blog post that indicates that someone got some value from that paper”.

These ideas sound fairly harmless, but in stark contrast, Jason Priem (an author of the altmetrics manifesto) said one advantage of altmetrics is that it’s fast: “Speed: months or weeks, not years: faster evaluations for tenure/hiring”. Although conceivably useful for disseminating preliminary results, such speed isn’t important for serious science (the kind that ought to be considered for tenure), which operates on the timescale of years. Priem also says “researchers must ask if altmetrics really reflect impact”. Even he doesn’t know, yet altmetrics services are being sold to universities before any evaluation of their usefulness has been done, and universities are buying them.
The idea that altmetrics scores could be used for hiring is nothing short of terrifying.

#### The problem with bibliometrics

The mistake made by all bibliometricians is that they fail to consider the content of papers, because they have no desire to understand research. Bibliometrics are for people who aren’t prepared to take the time (or lack the mental capacity) to evaluate research by reading about it, or in the case of software or databases, by using them. The use of surrogate outcomes in clinical trials is rightly condemned. Bibliometrics are all about surrogate outcomes.

If instead we consider the work described in particular papers that most people agree to be important (or that everyone agrees to be bad), it’s immediately obvious that no publication metrics can measure quality. There are some examples in How to get good science (Colquhoun, 2007). It is shown there that at least one Nobel prize winner failed dismally to fulfil arbitrary bibliometric productivity criteria of the sort imposed in some universities (another example is in Is Queen Mary University of London trying to commit scientific suicide?).

Schekman (2013) has said that science “is disfigured by inappropriate incentives. The prevailing structures of personal reputation and career advancement mean the biggest rewards often follow the flashiest work, not the best.” Bibliometrics reinforce those inappropriate incentives. A few examples will show that altmetrics are one of the silliest metrics so far proposed.

#### The altmetrics top 100 for 2013

The superficiality of altmetrics is demonstrated beautifully by the list of the 100 papers with the highest altmetric scores in 2013. For a start, 58 of the 100 were behind paywalls, and so unlikely to have been read except (perhaps) by academics.

The second most popular paper (with the enormous altmetric score of 2230) was published in the New England Journal of Medicine. The title was Primary Prevention of Cardiovascular Disease with a Mediterranean Diet.
It was promoted (inaccurately) by the journal on Twitter. Many of the 2092 tweets related to this article simply gave the title, but inevitably the theme appealed to diet faddists, who tweeted about it plentifully.

The interpretations of the paper promoted by these tweets were mostly desperately inaccurate. Diet studies are anyway notoriously unreliable. As John Ioannidis has said, "Almost every single nutrient imaginable has peer reviewed publications associating it with almost any outcome." This sad situation comes about partly because most of the data comes from non-randomised cohort studies that tell you nothing about causality, and also because the effects of diet on health seem to be quite small.

The study in question was a randomized controlled trial, so it should be free of the problems of cohort studies. But very few tweeters showed any sign of having read the paper. When you read it you find that the story isn’t so simple. Many of the problems are pointed out in the online comments that follow the paper. Post-publication peer review really can work, but you have to read the paper. The conclusions are pretty thoroughly demolished in the comments, for example:

“I’m surrounded by olive groves here in Australia and love the hand-pressed EVOO [extra virgin olive oil], which I can buy at a local produce market BUT this study shows that I won’t live a minute longer, and it won’t prevent a heart attack.”

We found no tweets that mentioned the finding from the paper that the diets had no detectable effect on myocardial infarction, death from cardiovascular causes, or death from any cause. The only difference was in the number of people who had strokes, and that showed a very unimpressive P = 0.04.

Neither did we see any tweets that mentioned the truly impressive list of conflicts of interest of the authors, which ran to an astonishing 419 words.

“Dr.
Estruch reports serving on the board of and receiving lecture fees from the Research Foundation on Wine and Nutrition (FIVIN); serving on the boards of the Beer and Health Foundation and the European Foundation for Alcohol Research (ERAB); receiving lecture fees from Cerveceros de España and Sanofi-Aventis; and receiving grant support through his institution from Novartis. Dr. Ros reports serving on the board of and receiving travel support, as well as grant support through his institution, from the California Walnut Commission; serving on the board of the Flora Foundation (Unilever) . . .”

And so on, for another 328 words.

The interesting question is how such a paper came to be published in the hugely prestigious New England Journal of Medicine. That it happened is yet another reason to distrust impact factors. It seems to be another sign that glamour journals are more concerned with trendiness than quality.

One sign of that is the fact that the journal’s own tweet misrepresented the work. The irresponsible spin in this initial tweet from the journal started the ball rolling, and after this point, the content of the paper itself became irrelevant. The altmetrics score is utterly disconnected from the science reported in the paper: it more closely reflects wishful thinking and confirmation bias.

The fourth paper in the altmetrics top 100 is an equally instructive example. This work was also published in a glamour journal, Science. The paper claimed that a function of sleep was to “clear metabolic waste from the brain”. It was initially promoted (inaccurately) on Twitter by the publisher of Science. After that, the paper was retweeted many times, presumably because everybody sleeps, and perhaps because the title hinted at the trendy, but fraudulent, idea of “detox”. Many tweets were variants of “The garbage truck that clears metabolic waste from the brain works best when you’re asleep”.

But this paper was hidden behind Science’s paywall.
It’s bordering on irresponsible for journals to promote on social media papers that can’t be read freely. It’s unlikely that anyone outside academia had read it, and therefore few of the tweeters had any idea of the actual content, or the way the research was done. Nevertheless it got “1,479 tweets from 1,355 accounts with an upper bound of 1,110,974 combined followers”. It had the huge altmetric score of 1848, the highest altmetric score in October 2013.

Within a couple of days, the story fell out of the news cycle. It was not a bad paper, but neither was it a huge breakthrough. It didn’t show that naturally-produced metabolites were cleared more quickly, just that injected substances were cleared faster when the mice were asleep or anaesthetised. This finding might or might not have physiological consequences for mice.

Worse, the paper also claimed that “Administration of adrenergic antagonists induced an increase in CSF tracer influx, resulting in rates of CSF tracer influx that were more comparable with influx observed during sleep or anesthesia than in the awake state”. Simply put, giving awake mice a drug could increase the clearance to the levels seen during sleep. But nobody seemed to notice the absurd concentrations of antagonists that were used in these experiments: “adrenergic receptor antagonists (prazosin, atipamezole, and propranolol, each 2 mM) were then slowly infused via the cisterna magna cannula for 15 min”. The binding constant (the concentration needed to occupy half the receptors) for prazosin is less than 1 nM, so infusing 2 mM means working at a concentration more than a million times greater than the one that should be effective. That’s asking for non-specific effects. Most drugs at this sort of concentration have local anaesthetic effects, so perhaps it isn’t surprising that the effects resembled those of ketamine.

The altmetrics editor hadn’t noticed the problems, and none of them featured in the online buzz.
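The size of that discrepancy can be checked with the simplest receptor-occupancy relation, the Hill–Langmuir equation for a single binding site at equilibrium: p = c/(c + K). The sketch is illustrative only. It takes K = 1 nM as an upper bound for prazosin and uses the nominal 2 mM infusion concentration. It ignores the dilution that must occur after infusion into the cisterna magna, so the actual concentration at the receptors is unknown. The function names are hypothetical.

```python
def occupancy(conc: float, kd: float) -> float:
    """Equilibrium fraction of receptors occupied (Hill-Langmuir, one site)."""
    return conc / (conc + kd)


def conc_for_occupancy(p: float, kd: float) -> float:
    """Concentration giving fractional occupancy p, from c = K * p / (1 - p)."""
    return kd * p / (1.0 - p)


KD_PRAZOSIN = 1e-9  # mol/L; assumed upper bound ("less than 1 nM" in the text)
INFUSED = 2e-3      # mol/L; the nominal 2 mM quoted from the paper

ratio = INFUSED / KD_PRAZOSIN
c99 = conc_for_occupancy(0.99, KD_PRAZOSIN)
print(f"infused / K = {ratio:.1e}")
print(f"occupancy at 2 mM = {occupancy(INFUSED, KD_PRAZOSIN):.7f}")
print(f"concentration for 99% occupancy = {c99 * 1e9:.0f} nM")
print(f"2 mM is {INFUSED / c99:,.0f} times that")
```

With K taken as 1 nM, the infused concentration is 2 million times K. Even 99% occupancy needs only about 99 nM, so 2 mM is roughly 20,000 times more than that. Since K is actually less than 1 nM, the true ratios would be larger still.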
These problems went unnoticed partly because to find them you had to read the paper (the antagonist concentrations were hidden in the legend of Figure 4), and partly because you needed to know the binding constant for prazosin to see this warning sign. The lesson, as usual, is that if you want to know about the quality of a paper, you have to read it. Commenting on a paper without knowing anything of its content is liable to make you look like a jackass.

#### A tale of two papers

Another approach that looks at individual papers is to compare some of one’s own papers. Sadly, UCL shows altmetric scores on each of your own papers. Mostly they are question marks, because nothing published before 2011 is scored. But two recent papers make an interesting contrast. One is from DC’s side interest in quackery; the other is real science. The former has an altmetric score of 169, the latter an altmetric score of 2.

The first paper was “Acupuncture is a theatrical placebo”, which was published as an invited editorial in Anesthesia and Analgesia. The paper was scientifically trivial. It took perhaps a week to write. Nevertheless, it got promoted on Twitter, because anything to do with alternative medicine is interesting to the public. It got quite a lot of retweets, and the resulting altmetric score of 169 put it in the top 1% of all articles altmetric have tracked, and the second highest ever for Anesthesia and Analgesia. As well as appearing on the journal’s own website, the article was also posted on the DCScience.net blog (May 30, 2013), where it soon became the most viewed page ever (24,468 views as of 23 November 2013), something that altmetrics does not seem to take into account.

Compare this with the fate of some real, but rather technical, science. My [DC] best scientific papers are too old (i.e. before 2011) to have an altmetrics score, but my best score for any scientific paper is 2. This score was for Colquhoun & Lape (2012) “Allosteric coupling in ligand-gated ion channels”.
It was a commentary with some original material. The altmetric score was based on two tweets and 15 readers on Mendeley. One of the two tweets was from me (“Real science; The meaning of allosteric conformation changes http://t.co/zZeNtLdU”). The only other tweet was an abusive one from a cyberstalker who was upset at having been refused a job years ago. Incredibly, this modest achievement got it rated “Good compared to other articles of the same age (71st percentile)”.

#### Conclusions about bibliometrics

Bibliometricians spend much time correlating one surrogate outcome with another, from which they learn little. What they don’t do is take the time to examine individual papers. Doing that makes it obvious that most metrics, and especially altmetrics, are indeed an ill-conceived and meretricious idea. Universities should know better than to subscribe to them.

Although altmetrics may be the silliest bibliometric idea yet, much of this criticism applies equally to all such metrics. Even the most plausible metric, counting citations, is easily shown to be nonsense by simply considering individual papers. All you have to do is choose some papers that are universally agreed to be good, and some that are bad, and see how metrics fail to distinguish between them. This is something that bibliometricians fail to do (perhaps because they don’t know enough science to tell which is which). Some examples are given by Colquhoun (2007) (more complete version at dcscience.net).

Eugene Garfield, who started the metrics mania with the journal impact factor (JIF), was clear that it was not suitable as a measure of the worth of individuals. He has been ignored, and the JIF has come to dominate the lives of researchers, despite decades of evidence of the harm it does (e.g. Seglen (1997) and Colquhoun (2003)). In the wake of JIF, young, bright people have been encouraged to develop yet more spurious metrics (of which ‘altmetrics’ is the latest).
It doesn’t matter much whether these metrics are based on nonsense (like counting hashtags) or rely on counting links or comments on a journal website. They won’t (and can’t) indicate what is important about a piece of research: its quality.

People say, “I can’t be a polymath.” Well, then don’t try to be. You don’t have to have an opinion on things that you don’t understand. The number of people who really do have to have an overview of the kind that altmetrics might purport to give (those who have to make funding decisions about work that they are not intimately familiar with) is quite small. Chances are, you are not one of them. We review plenty of papers and grants. But it’s not credible to accept assignments outside of your field, and then rely on metrics to assess the quality of the scientific work or the proposal.

It’s perfectly reasonable to give credit for all forms of research outputs, not only papers. That doesn’t need metrics. It’s nonsense to suggest that altmetrics are needed because research outputs are not already valued in grant and job applications. If you write a grant for almost any agency, you can attach your CV. If you have a non-publication based output, you can always include it. Metrics are not needed. If you write software, get the numbers of downloads. Software normally garners citations anyway if it’s of any use to the greater community.

When AP recently wrote a criticism of Heather Piwowar’s altmetrics note in Nature, one correspondent wrote: "I haven’t read the piece [by HP] but I’m sure you are mischaracterising it". This attitude summarizes the too-long-didn’t-read (TLDR) culture that is increasingly becoming accepted amongst scientists, and which the comparisons above show is a central component of altmetrics.

Altmetrics are numbers generated by people who don’t understand research, for people who don’t understand research. People who read papers and understand research just don’t need them and should shun them.
But all bibliometrics give cause for concern, beyond their lack of utility. They do active harm to science. They encourage “gaming” (a euphemism for cheating). They encourage short-term eye-catching research of questionable quality and reproducibility. They encourage guest authorships: that is, they encourage people to claim credit for work which isn’t theirs. At worst, they encourage fraud.

No doubt metrics have played some part in the crisis of irreproducibility that has engulfed some fields, particularly experimental psychology, genomics and cancer research. Underpowered studies with a high false-positive rate may get you promoted, but tend to mislead both other scientists and the public (who in general pay for the work). The waste of public money that must result from following up badly done work that can’t be reproduced, but that was published for the sake of “getting something out”, has not been quantified, but it must be counted to the detriment of bibliometrics, and sadly outweighs any advantages from rapid dissemination. Yet universities continue to pay publishers to provide these measures, which do nothing but harm. And the general public has noticed.

It’s now eight years since the New York Times brought to the attention of the public that some scientists engage in puffery, cheating and even fraud.

Overblown press releases written by journals, with the connivance of university PR wonks and of the authors, sometimes go viral on social media (and so score well on altmetrics). Yet another example, from the Journal of the American Medical Association, involved an overblown press release from the journal about a trial that allegedly showed a benefit of high doses of Vitamin E for Alzheimer’s disease. This sort of puffery harms patients and harms science itself. We can’t go on like this.

#### What should be done?
Post-publication peer review is now happening, in comments on published papers and through sites like PubPeer, where it is already clear that anonymous peer review can work really well. New journals like eLife have open comments after each paper, though authors do not seem to have yet got into the habit of using them constructively. They will.

It’s very obvious that too many papers are being published, and that anything, however bad, can be published in a journal that claims to be peer reviewed. To a large extent this is just another example of the harm done to science by metrics: the publish or perish culture. Attempts to regulate science by setting “productivity targets” are doomed to do as much harm to science as they have in the National Health Service in the UK. This has been known to economists for a long time, under the name of Goodhart’s law.

Here are some ideas about how we could restore the confidence of both scientists and of the public in the integrity of published work.

- Nature, Science, and other vanity journals should become news magazines only. Their glamour value distorts science and encourages dishonesty.
- Print journals are overpriced and outdated. They are no longer needed. Publishing on the web is cheap, and it allows open access and post-publication peer review. Every paper should be followed by an open comments section, with anonymity allowed. The old publishers should go the same way as the handloom weavers. Their time has passed.
- Web publication allows proper explanation of methods, without the page, word and figure limits that distort papers in vanity journals. This would also make it very easy to publish negative work, thus reducing publication bias, a major problem (not least for clinical trials).
- Publish or perish has proved counterproductive. It seems just as likely that better science will result without any performance management at all. All that’s needed is peer review of grant applications.
• Providing more small grants rather than fewer big ones should help to reduce the pressure to publish which distorts the literature. The ‘celebrity scientist’, running a huge group funded by giant grants, has not worked well. It’s led to poor mentoring and, at worst, fraud. Of course huge groups sometimes produce good work, but too often at the price of exploitation of junior scientists.
• There is a good case for limiting the number of original papers that an individual can publish per year, and/or total funding. Fewer but more complete and considered papers would benefit everyone, and counteract the flood of literature that has led to superficiality.
• Everyone should read, learn and inwardly digest Peter Lawrence’s The Mismeasurement of Science.

A focus on speed and brevity (cited as major advantages of altmetrics) will help no-one in the end. And a focus on creating and curating new metrics will simply skew science in yet another unsatisfactory way, and rob scientists of the time they need to do their real job: generate new knowledge. It has been said “Creation is sloppy; discovery is messy; exploration is dangerous. What’s a manager to do? The answer in general is to encourage curiosity and accept failure. Lots of failure.” And, one might add, forget metrics. All of them.

### Follow-up

17 Jan 2014. This piece was noticed by the Economist. Their ‘Writing worth reading’ section said "Why you should ignore altmetrics (David Colquhoun). Altmetrics attempt to rank scientific papers by their popularity on social media. David Colquohoun [sic] argues that they are “for people who aren’t prepared to take the time (or lack the mental capacity) to evaluate research by reading about it.”"

20 January 2014. Jason Priem, of ImpactStory, has responded to this article on his own blog. In Altmetrics: A Bibliographic Nightmare? he seems to back off a lot from his earlier claim (cited above) that altmetrics are useful for making decisions about hiring or tenure.
Our response is on his blog.

23 January 2014. The Scholarly Kitchen blog carried another paean to metrics. A vigorous discussion followed. The general line that I’ve followed in this discussion, and those mentioned below, is that bibliometricians won’t qualify as scientists until they test their methods, i.e. show that they predict something useful. In order to do that, they’ll have to consider individual papers (as we do above). At present, articles by bibliometricians consist largely of hubris, with little emphasis on the potential to cause corruption. They remind me of articles by homeopaths: their aim is to sell a product (sometimes for cash, but mainly to promote the authors’ usefulness). It’s noticeable that all of the pro-metrics articles cited here have been written by bibliometricians. None have been written by scientists.

28 January 2014. Dalmeet Singh Chawla, a bibliometrician from Imperial College London, wrote a blog on the topic. (Imperial, at least in its Medicine department, is notorious for abuse of metrics.)

29 January 2014. Arran Frood wrote a sensible article about the metrics row in Euroscientist.

2 February 2014. Paul Groth (a co-author of the Altmetrics Manifesto) posted more hubristic stuff about altmetrics on Slideshare. A vigorous discussion followed.

5 May 2014. Another vigorous discussion on the ImpactStory blog, this time with Stacy Konkiel. She’s another non-scientist trying to tell scientists what to do. The evidence that she produced for the usefulness of altmetrics seemed pathetic to me.

7 May 2014. A much-shortened version of this post appeared in the British Medical Journal (BMJ blogs).

## Science is harmed by hype. How to live for 969 years.
#### December 31st, 2013 · 4 Comments

[This is an update of a 2006 post on my old blog]

The New York Times (17 January 2006) published a beautiful spoof that illustrates only too clearly some of the bad practices that have developed in real science (as well as in quackery). It shows that competition, when taken to excess, leads to dishonesty. More to the point, it shows that the public is well aware of the dishonesty that has resulted from the publish or perish culture, which has been inflicted on science by numbskull senior administrators (many of them scientists, or at least ex-scientists). Part of the blame must attach to "bibliometricians" who have armed administrators with simple-minded tools whose usefulness is entirely unverified. Bibliometricians are truly the quacks of academia. They care little about evidence as long as they can sell the product.

The spoof also illustrates the folly of allowing the hegemony of a handful of glamour journals to hold scientists in thrall. This self-inflicted wound adds to the pressure to produce trendy novelties rather than solid long-term work.

It also shows the only-too-frequent failure of peer review to detect problems.

The future lies in publication on the web, with post-publication peer review. It has been shown by sites like PubPeer that anonymous post-publication review can work very well indeed. This would be far cheaper, and a good deal better than the present extortion practised on universities by publishers. All it needs is for a few more eminent people like mathematician Tim Gowers to speak out (see Elsevier – my part in its downfall). Recent Nobel prizewinner Randy Schekman has helped with his declaration that "his lab will no longer send papers to Nature, Cell and Science as they distort scientific process".

The spoof is based on the fraudulent papers by the Korean cloner Woo Suk Hwang, which were published in Science in 2005.
As well as the original fraud, this sad episode exposed the practice of ‘guest authorship’: putting your name on a paper when you have done little or no work, and cannot vouch for the results. The last (‘senior’) author on the 2005 paper was Gerald Schatten, Director of the Pittsburgh Development Center. It turns out that Schatten had not seen any of the original data and had contributed very little to the paper, beyond lobbying Science to accept it. A University of Pittsburgh panel declared Schatten guilty of “research misbehavior”, though he was, amazingly, exonerated of “research misconduct”. He still has his job. Click here for an interesting commentary.

The New York Times carried a mock editorial to introduce the spoof.

One Last Question: Who Did the Work?
By NICHOLAS WADE

In the wake of the two fraudulent articles on embryonic stem cells published in Science by the South Korean researcher Hwang Woo Suk, Donald Kennedy, the journal’s editor, said last week that he would consider adding new requirements that authors “detail their specific contributions to the research submitted,” and sign statements that they agree with the conclusions of their article. A statement of authors’ contributions has long been championed by Drummond Rennie, deputy editor of The Journal of the American Medical Association, and is already required by that and other medical journals. But as innocuous as Science’s proposed procedures may seem, they could seriously subvert some traditional scientific practices, such as honorary authorship. Explicit statements about the conclusions could bring to light many reservations that individual authors would not otherwise think worth mentioning. The article shown [below] from a future issue of the Journal of Imaginary Genomics, annotated in the manner required by Science’s proposed reforms, has been released ahead of its embargo date.

The old-fashioned typography makes it obvious that the spoof is intended to mock a paper in Science.
The problem with this spoof is its only too accurate description of what can happen at the worst end of science. Something must be done if we are to justify the money we get and to retain the confidence of the public. My suggestions are as follows.

• Nature, Science and Cell should become news magazines only. Their glamour value distorts science and encourages dishonesty.
• All print journals are outdated. We need cheap publishing on the web, with open access and post-publication peer review. The old publishers would go the same way as the handloom weavers. Their time has passed.
• Publish or perish has proved counterproductive. You’d get better science if you didn’t have any performance management at all. All that’s needed is peer review of grant applications.
• It’s better to have many small grants than fewer big ones. The ‘celebrity scientist’, running a huge group funded by many grants, has not worked well. It’s led to poor mentoring and exploitation of junior scientists.
• There is a good case for limiting the number of original papers that an individual can publish per year, and/or total grant funding. Fewer but more complete papers would benefit everyone.
• Everyone should read, learn and inwardly digest Peter Lawrence’s The Mismeasurement of Science.

### Follow-up

3 January 2014. Yet another good example of hype was in the news: “Effect of Vitamin E and Memantine on Functional Decline in Alzheimer Disease”, published in the Journal of the American Medical Association. The study hit the newspapers on January 1st with headlines like Vitamin E may slow Alzheimer’s Disease (see the excellent analysis by Gary Schwitzer). The supplement industry was ecstatic. But the paper was behind a paywall. It’s unlikely that many of the tweeters (or journalists) had actually read it.

The trial was a well-designed randomised controlled trial that compared four treatments: placebo, vitamin E, memantine and Vitamin E + memantine.
Reading the paper gives a rather different impression from the press release. Look at the pre-specified primary outcome of the trial. The primary outcome measure was

" . . the Alzheimer’s Disease Cooperative Study/Activities of Daily Living (ADCS-ADL) Inventory.[12] The ADCS-ADL Inventory is designed to assess functional abilities to perform activities of daily living in Alzheimer patients with a broad range of dementia severity. The total score ranges from 0 to 78 with lower scores indicating worse function."

It looks as though any difference that might exist between the four treatments is trivial in size. In fact the mean difference between Vitamin E and placebo was only 3.15 (on a 78-point scale), with 95% confidence limits from 0.9 to 5.4. This gave a modest P = 0.03 (when properly corrected for multiple comparisons), a result that will impress only those people who regard P = 0.05 as a sort of magic number. The mean effect is so trivial in size that it doesn’t really matter whether the effect is real anyway.

It is not mentioned in the coverage that none of the four secondary outcomes achieved even a modest P = 0.05. There was no detectable effect of Vitamin E on

• Mean annual rate of cognitive decline (Alzheimer Disease Assessment Scale–Cognitive Subscale)
• Mean annual rate of cognitive decline (Mini-Mental State Examination)
• Mean annual rate of increased symptoms
• Mean annual rate of increased caregiver time

The only graph that appeared to show much effect was The Dependence Scale. This scale “assesses 6 levels of functional dependence. Time to event is the time to loss of 1 dependence level (increase in dependence). We used an interval-censored model assuming a Weibull distribution because the time of the event was known only at the end of a discrete interval of time (every 6 months).” It’s presented as a survival (Kaplan-Meier) plot.
And it is this somewhat obscure secondary outcome that was used by the Journal of the American Medical Association for its publicity. Note also that memantine + Vitamin E was indistinguishable from placebo. There are two ways to explain this: either Vitamin E has no effect, or memantine is an antagonist of Vitamin E. There are no data on the latter, but it’s certainly implausible.

The trial used a high dose of Vitamin E (2000 IU/day). No toxic effects of Vitamin E were reported, though a 2005 meta-analysis concluded that doses greater than 400 IU/d "may increase all-cause mortality and should be avoided".

In my opinion, the outcome of this trial should have been something like “Vitamin E has, at most, trivial effects on the progress of Alzheimer’s disease”. Both the journal and the authors are guilty of disgraceful hype. This continual raising of false hopes does nothing to help patients. But it does damage the reputation of the journal and of the authors.

This paper constitutes yet another failure of altmetrics (see more examples on this blog). Not surprisingly, given the title, it was retweeted widely, but utterly uncritically. Bad science was promoted. And JAMA must take much of the blame for publishing it and promoting it.

## We know little about the effect of diet on health. That’s why so much is written about it

#### November 18th, 2013 · 20 Comments

One of my scientific heroes is Bernard Katz. The closing words of his inaugural lecture, as professor of biophysics at UCL, hang on the wall of my office as a salutary reminder to refrain from talking about ‘how the brain works’. After speaking about his discoveries about synaptic transmission, he ended thus.

"My time is up and very glad I am, because I have been leading myself right up to a domain on which I should not dare to trespass, not even in an Inaugural Lecture.
This domain contains the awkward problems of mind and matter about which so much has been talked and so little can be said, and having told you of my pedestrian disposition, I hope you will give me leave to stop at this point and not to hazard any further guesses."

Drawing ©Jenny Hersson-Ringskog

The question of what to eat for good health is truly a topic about "which so much has been talked and so little can be said". That was emphasized yet again by an editorial in the British Medical Journal written by my favourite epidemiologist, John Ioannidis. He has been at the forefront of debunking hype. Its title is “Implausible results in human nutrition research” (BMJ, 2013;347:f6698). The gist is given by the memorable statement "Almost every single nutrient imaginable has peer reviewed publications associating it with almost any outcome." and the subtitle “Definitive solutions won’t come from another million observational papers or small randomized trials”.

Since I’m a bit obsessive about causality, this paper is music to my ears. It vindicates my own views, as an amateur epidemiologist, on the results of the endless surveys of diet and health.

There is nothing new about the problem. It’s been written about many times. Young & Karr (Significance, 8, 116–120, 2011) said "Any claim coming from an observational study is most likely to be wrong". Out of 52 claims made in 12 observational studies, not a single one was confirmed when tested by randomised controlled trials.

Another article cited by Ioannidis, "Myths, Presumptions, and Facts about Obesity" (Casazza et al., NEJM, 2013), debunks many myths, but the list of conflicts of interest declared by the authors is truly horrendous (and at least one of their conclusions has been challenged, albeit by people with funding from Kellogg’s). The frequent conflicts of interest in nutrition research make a bad situation even worse. The Ioannidis quotation above continues thus.
"On 25 October 2013, PubMed listed 291 papers with the keywords “coffee OR caffeine” and 741 with “soy,” many of which referred to associations. In this literature of epidemic proportions, how many results are correct? Many findings are entirely implausible. Relative risks that suggest we can halve the burden of cancer with just a couple of servings a day of a single nutrient still circulate widely in peer reviewed journals. However, on the basis of dozens of randomized trials, single nutrients are unlikely to have relative risks less than 0.90 for major clinical outcomes when extreme tertiles of population intake are compared—most are greater than 0.95. For overall mortality, relative risks are typically greater than 0.995, if not entirely null. The respective absolute risk differences would be trivial. Observational studies and even randomized trials of single nutrients seem hopeless, with rare exceptions. Even minimal confounding or other biases create noise that exceeds any genuine effect. Big datasets just confer spurious precision status to noise." And, later, "According to the latest burden of disease study, 26% of deaths and 14% of disability adjusted life years in the United States are attributed to dietary risk factors, even without counting the impact of obesity. No other risk factor comes anywhere close to diet in these calculations (not even tobacco and physical inactivity). I suspect this is yet another implausible result. It builds on risk estimates from the same data of largely implausible nutritional studies discussed above. Moreover, socioeconomic factors are not considered at all, although they may be at the root of health problems. Poor diet may partly be a correlate or one of several paths through which social factors operate on health." Another field that is notorious for producing false positives, wirh false attribution of causality, is the detection of biomarkers. 
A critical discussion can be found in the paper by Broadhurst & Kell (2006), "False discoveries in metabolomics and related experiments".

"Since the early days of transcriptome analysis (Golub et al., 1999), many workers have looked to detect different gene expression in cancerous versus normal tissues. Partly because of the expense of transcriptomics (and the inherent noise in such data (Schena, 2000; Tu et al., 2002; Cui and Churchill, 2003; Liang and Kelemen, 2006)), the numbers of samples and their replicates is often small while the number of candidate genes is typically in the thousands. Given the above, there is clearly a great danger that most of these will not in practice withstand scrutiny on deeper analysis (despite the ease with which one can create beautiful heat maps and any number of ‘just-so’ stories to explain the biological relevance of anything that is found in preliminary studies!). This turns out to be the case, and we review a recent analysis (Ein-Dor et al., 2006) of a variety of such studies."

The fields of metabolomics, proteomics and transcriptomics are plagued by statistical problems (as well as being saddled with ghastly pretentious names).

### What’s to be done?

Barker Bausell, in his demolition of research on acupuncture, said [page 39]: “But why should nonscientists care one iota about something as esoteric as causal inference? I believe that the answer to this question is because the making of causal inferences is part of our job description as Homo Sapiens.”

The problem, of course, is that humans are very good at attributing causality when it does not exist. That has led to confusion between correlation and cause on an industrial scale, not least in attempts to work out the effects of diet on health. More than in any other field, it is hard to do the RCTs that could, in principle, sort out the problem.
It’s hard to allocate people at random to different diets, and even harder to make people stick to those diets for the many years that are needed. We can probably say by now that no individual food carries a large risk, or affords very much protection. The fact that we are looking for quite small effects means that even when RCTs are possible, huge samples will be needed to get clear answers. Most RCTs are too short and too small (under-powered), and that leads to overestimation of the size of effects. That’s a problem that plagues experimental psychology too, and has led to a much-discussed crisis in reproducibility.

"Supplements" of one sort and another are ubiquitous in sports. Nobody knows whether they work, and the margin between winning and losing is so tiny that it’s very doubtful whether we ever will know. We can expect irresponsible claims to continue unabated.

The best thing that can be done in the short term is to stop doing large observational studies altogether. It’s now clear that inferences made from them are likely to be wrong. And, sad to say, we need to view with great skepticism anything that is funded by the food industry. And we should make a start on large RCTs whenever that is possible. Perhaps the hardest goal of all is to end the "publish or perish" culture which does so much to prevent the sort of long-term experiments which would give the information we want.

Ioannidis’ article ends with the statement "I am co-investigator in a randomized trial of a low carbohydrate versus low fat diet that is funded by the US National Institutes of Health and the non-profit Nutrition Science Initiative." It seems he is putting his money where his mouth is.

Until we have the results, we shall continue to be bombarded with conflicting claims made by people who are doing their best with flawed methods, as well as by those trying to sell fad diets. Don’t believe them.
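Two consequences of looking for small effects can be put into numbers. What follows is a minimal sketch, not a trial-design tool.

The first function uses the usual normal-approximation formula for the number of people per arm needed to compare two proportions. The 10% baseline event rate is a hypothetical figure chosen for illustration. The relative risk of 0.95 is taken from the Ioannidis quotation above, which says most single-nutrient relative risks are greater than 0.95.

The second function simulates many small trials of a real but small effect. It assumes 0.2 standard deviations, 20 people per group, and a standard deviation known to be 1, all hypothetical values. It then asks what the trials that happen to reach P < 0.05 report.

```python
import math
import random
from statistics import NormalDist


def n_per_group_two_proportions(p_control, relative_risk, alpha=0.05, power=0.8):
    """Approximate number per arm needed to detect a change in event rate from
    p_control to p_control * relative_risk (two-sided test, normal approximation)."""
    p_treated = p_control * relative_risk
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


def winners_curse(true_effect=0.2, n_per_group=20, n_trials=10000, seed=2):
    """Simulate many small two-arm trials of a real but small effect (in SD units,
    SD assumed known to be 1). Return the fraction of trials reaching P < 0.05 and
    the mean estimated effect among those 'significant' trials."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # standard error of a difference between two means
    critical = NormalDist().inv_cdf(0.975) * se  # |difference| needed for P < 0.05
    significant = []
    for _ in range(n_trials):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        estimate = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(estimate) > critical:
            significant.append(estimate)
    if not significant:
        return 0.0, float("nan")
    return len(significant) / n_trials, sum(significant) / len(significant)


if __name__ == "__main__":
    baseline = 0.10  # hypothetical event rate in the control group
    rr = 0.95        # single-nutrient relative risk, from the Ioannidis quotation
    print(f"absolute risk difference: {baseline * (1 - rr):.4f}")
    print(f"people needed per arm: {n_per_group_two_proportions(baseline, rr)}")
    power, mean_sig = winners_curse()
    print(f"fraction of small trials with P < 0.05: {power:.2f}")
    print(f"mean estimate in those trials: {mean_sig:.2f} (true effect 0.20)")
```

With these assumptions, the first function asks for roughly 55,000 people per arm to detect an absolute difference of half a percentage point. In the simulation only about one small trial in ten reaches P < 0.05. The trials that do reach it report, on average, more than three times the true effect.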
The famous "5-a-day" advice that we are constantly bombarded with does no harm, but it has no sound basis. As far as I can guess, the only sound advice about healthy eating for most people is

• don’t eat too much
• don’t eat all the same thing

You can’t make much money out of that advice. No doubt that is why you don’t hear it very often.

### Follow-up

Two relevant papers that show the unreliability of observational studies:

"Nearly 80,000 observational studies were published in the decade 1990–2000 (Naik 2012). In the following decade, the number of studies grew to more than 260,000" (Madigan et al., 2014).

“. . . the majority of observational studies would declare statistical significance when no effect is present” (Schuemie et al., 2012).

20 March 2014. On 20 March 2014, I gave a talk on this topic at the Cambridge Science Festival (more here). After the event my host, Yvonne Noblis, sent me some (doubtless cherry-picked) feedback she’d had about the talk.

## Yet another incompetent regulator. The General Pharmaceutical Council is criminally negligent

#### November 4th, 2013 · 4 Comments

The General Pharmaceutical Council (GPhC) has been the statutory body responsible for the regulation of pharmacy since 2010. Its status is similar to that of the GMC and, heaven help us, the GCC. Before that the regulator was the same as the professional body, the Royal Pharmaceutical Society of Great Britain (RPS). The RPS proved to be as useless as most other regulators, as documented in detail in my 2008 post. At around the time it stopped being a regulator, the RPS started to condemn quackery more effectively, but by then it had lost the power to do much about it (I hope the latter wasn’t the cause of the former). The body that could do something, the GPhC, has done essentially nothing, as described in this post.

I did a 2-year apprenticeship in Timothy White’s and Taylor’s Homeopathic (yes, really) Chemists in the 1950s. My first degree was in pharmacy.
I got my interest in pharmacology from reading Martindale’s Extra Pharmacopoeia in the shop. I soon decided that I didn’t really want to spend the rest of my life selling lipstick and Durex. The latter was quite a big seller because the Boots across the road didn’t sell contraceptives (they changed their minds in the 1960s). In those days, we spent quite a lot of time making up (almost entirely ineffective) ‘tonics’ and ‘cough mixtures’. Now the job consists largely of counting pills. This has exacerbated the ‘chip on the shoulder’ attitude that was present even in the 1950s. For a long time now, pharmacists have wanted to become a ‘third tier’ in the NHS, alongside GP practices and hospitals. Here are a few comments on this proposition.

First let me say that I’ve met some very good and ethical pharmacists. I did a vacation job in a hospital pharmacy where the boss had an encyclopaedic knowledge of the effects and side effects of drugs, and of their dosage. His advice was often sought by doctors, and rightly so. He had no way of knowing at the time that his advice to replace barbiturates with thalidomide would lead to such a tragedy, because the evidence had been concealed by the manufacturer.

Some of the problems alluded to here have already been highlighted by two excellent pharmacists, Anthony Cox and @SparkleWildfire, neither of whom works in a pharmacist’s shop. They are absolutely spot on, but they seem to be in a minority among pharmacists.

The problems seem to lie mostly in retail shops. Their shelves are laden with ineffective pills and potions. And the pharmacist has every incentive to sell them. His/her income depends on it directly if it’s a privately owned pharmacy. And his/her standing with head office depends on it in chain store pharmacies. This conflict of financial interest is the prime reason why pharmacists are not qualified to form a third tier of healthcare.
The avoidance of conflicts of interest among doctors was one of the great accomplishments of the NHS. In the USA there are huge scandals when, as happens repeatedly, doctors order expensive and unnecessary treatments from which they profit. It’s no consolation that such problems are creeping back in the UK as a result of the government’s vigorous efforts to sell the NHS off.

Here are a few examples of things that have gone wrong, and who is to blame. Then I’ll consider what can be done.

Ineffective medicines

In any pharmacy you can see ineffective ‘tonics’ and ‘cough medicines’, unnecessary supplements with dishonest claims and even, heaven help us, the ultimate scam, homeopathic pills. What’s worse, if you ask a pharmacist for advice, it’s quite likely that they’ll recommend you to buy them. I was amazed to discover that a number of old-fashioned ‘tonics’ and ‘cough medicines’ still have full marketing authorisation. That’s the fault of the Medicines and Healthcare products Regulatory Agency (MHRA), which is supposed to assess efficacy and totally failed to do so. Read about that in “Some medicines that don’t work. Why doesn’t the MHRA tell us honestly?”. It’s hard to blame a pharmacist for the bad advice given by the MHRA, but a good one would tell patients to save their money.

Big corporate pharmacies

Companies like Boots seem to have no interest whatsoever in ethical behaviour. All that matters is sales. They provide “(mis)educational” materials that promote nonsense. They advertise ridiculous made-up claims in the newspapers, which get shot down regularly by the Advertising Standards Authority, but by that time the promotion is over so they don’t give a damn. See, for example, CoQ10 scam and the ASA verdict on it. And "Lactium: more rubbish from Boots the Chemists. And a more serious problem". And "The Vitamin B scam. Don’t trust Boots".

Recently the consumer magazine Which? checked 122 High Street pharmacies.
They got unsatisfactory advice from 43% of them, a disastrously bad performance for people who want to be the third tier of healthcare. Even that’s probably better than my own experience. Recently, when I asked a Sainsbury’s pharmacist about a herbal treatment for prostate problems, he pointed to the MHRA’s kite mark and said it must work because the MHRA approved it. He was quite unaware that you get the THR kite mark without having to present any evidence at all about efficacy. Of course that is partly the fault of the MHRA for allowing misleading labels, but nevertheless, he should have known. See “Why does the MHRA refuse to label herbal products honestly? Kent Woods and Richard Woodfield tell me” for more on how the MHRA has betrayed its own standards.

When I’ve asked Boots pharmacists for advice about persistent diarrhoea in an infant, saying I wanted a natural remedy, I’ve usually been guided to the homeopathic display. Only once was I told firmly that I should use rehydration, not homeopathy (something every good parent knows). When I asked that good pharmacist where she’d been educated, she said in Germany (mildly surprising given the amount of junk available in German pharmacies).

### Regulators

Anthony Cox, a pharmacist who has been consistently on the side of right, says "This is something that needs to be dealt with at a regulatory and professional body by the whole profession, and I am certain we have the majority of the UK pharmacy profession on side." But the regulator has done nothing, and it isn’t even clear that there is a majority on his side.
At a 2009 meeting of Branch Representatives of the RPS a motion was proposed:

“…registration as a pharmacist and practice as a homeopath are not compatible, and that premises registered with the Society should not be used for the promotion of homeopathy”

Although that is obviously sensible to most people, the proposal was followed by a speaker from Leicester who thought it right to keep an open mind about Avogadro’s number, and the motion was defeated. So much for the "scientists on the High Street" aspiration.

There have been two major scandals surrounding homeopathy recently. Both were revealed first by bloggers, and both came to wide notice through television programmes. Neither was noticed by the regulators, and when they were brought to the attention of the regulator, nothing effective was done.

The malaria scandal

A lot has been written about this here and on other blogs (e.g. here and here). The idea that sugar pills can prevent or cure malaria is so mind-bogglingly dangerous that it was condemned by the Queen’s Homeopathic Physician, Peter Fisher. It was exposed on a BBC Newsnight programme in 2006. Watch the video.

The Gentle Art of Homeopathic Killing was an article that originally appeared on the excellent Quackometer blog produced by Andy Lewis. "The Society of Homeopaths were so outraged about one of their members flouting the code of ethics so blatantly that they took immediate action. That action was, as expected, not to do anything about the ethics breach but to threaten Andy and his hosting ISP with legal action for defamation. The article is reproduced here as a public service".

Some of the people involved in this bad advice were pharmacists. Very properly, they were referred to the RPS, the regulator at that time, in 2006 and 2009. The RPS sat on the complaint so long that eventually it was replaced by the GPhC as regulator. Nothing much has happened since. The GPhC did precisely nothing. Read their pathetic response.
Homeopathy for meningitis

An equally murderous fraud, "homeopathic vaccines" by Ainsworth’s, has long been targeted by bloggers. In January 2013, Samantha Smith made an excellent BBC South West programme about it. Watch it and get angry.

Anthony Pinkus, pharmacist at Ainsworths, was referred to the then regulator, the RPS, in 2006 and 2009. It’s said that he took "remedial action", though there is little obvious change judged by the video above. No doubt some of the most incriminating stuff has been removed from his web site to hide it from the ASA. It’s safer to mislead people by word of mouth. Since the last video more complaints have been made to the GPhC. So far, nothing but silence.

### Why doesn’t the regulator regulate?

This pamphlet is reproduced from the July 2011 Quackometer post, “Ainsworths Pharmacy: Casual Disregard for the Law”.

It’s almost as though those royal warrants, enlarged on right, acted as a talisman that puts this dangerous company outside the grasp of regulators. I hope that the GPhC Council, and Duncan Rudkin (its chief executive and registrar), are not so worried about their knighthoods that they won’t risk upsetting the royal family, just to save patients from malaria and meningitis. Their chair, Robert Nicholls, is only a CBE so far.

Another reason for their inaction might be that the GPhC Council members, and Duncan Rudkin, lack critical faculties. Perhaps they have not been very well educated? Many of them aren’t even pharmacists, but that curious breed of professional administrators who inhabit the vast number of quangos, tick their boxes and do harm. Or perhaps they are just more interested in protecting the income of pharmacists than in protecting their customers?

Education

The solution to most problems is education. But there is no real knowledge of how many pharmacists in the UK are educated in the critical assessment of evidence. A recent paper from the USA did not give cause for optimism.
It’s discussed by the excellent pharmacist Scott Gavura at Science-Based Medicine. The results are truly horrifying.

“Few students disagreed with any CAM therapy. There was the greatest support for vitamins and minerals (94%, mean 4.29) which could include the science-based use of these products. But there was strong support for demonstrably ineffective treatments like acupuncture, with 64% agreeing it was acceptable. Even homeopathy, which any pharmacy student with basic medicinal chemistry skills ought to know is absurd, was supported by over 40% of students.”

If the numbers are similar in the UK, the results of the Which? magazine survey are not so surprising. And if such views are held by members of the GPhC Council, their inaction is to be expected. We just don’t know, and perhaps someone should find out.

I suspect that sympathy for quackery may sometimes creep in through that old-fashioned discipline known as pharmacognosy. It is about the botany of medicinal plants, and it’s still taught, despite the fact that very few drugs are now extracted from plants. At times, it gets dangerously close to herbalism. For example, at the School of Pharmacy (now part of UCL) a book is used, Fundamentals of Pharmacognosy and Phytotherapy, by Michael Heinrich, Joanne Barnes, Simon Gibbons and Elizabeth M. Williamson, of the Centre for Pharmacognosy and Phytotherapy at the School of Pharmacy. The introductory chapter says:

“TRADITIONAL CHINESE MEDICINE (TCM) The study of TCM is a mixture of myth and fact, stretching back well over 5000 years. At the time, none of the knowledge was written down, apart from primitive inscriptions of prayers for the sick on pieces of tortoise carapace and animal bones, so a mixture of superstition, symbolism and fact was passed down by word of mouth for centuries.
TCM still contains very many remedies, which were selected by their symbolic significance rather than proven effects; however, this does not necessarily mean that they are all ‘quack’ remedies!”

Well, not necessarily. But as in most such books, there are good descriptions of the botany and more or less good accounts of the chemical constituents, followed by uncritical lists of things that the herb might (or might not) do. The fact that even the US National Institutes of Health quackery branch, NCCAM, doesn’t claim that a single herbal treatment is useful tells you all you need to know.

Joanne Barnes is Associate Professor in Herbal Medicines, School of Pharmacy, University of Auckland, New Zealand. She has written a book, Herbal Medicines (“A useful book for learning holistic medicine”), that is desperately uncritical about the alleged therapeutic effectiveness of plants. Simon Gibbons is on the editorial board of The Chinese Journal of Natural Medicine. Elizabeth Williamson is editor of the Journal of Phytotherapy Research, a journal that has a strong flavour of herbalism (take the infamous snoring remedy). These people aren’t quacks, but they are dangerously tolerant of quacks.

The warning is in the title. "Phytotherapy" is the current euphemism for herbalism. It’s one of those red-light words that tells you that what follows is unlikely to be critical. Exeter’s fantasy herbalist, Simon Mills, now describes himself as a phytotherapist. What more warning could you need?

Perhaps this explains why so many pharmacists are unworried by selling things that don’t work. Pharmacy education seems not to include much about the critical assessment of evidence. It should do.

Chemist and Druggist magazine certainly doesn’t help. It continually reinforces the idea that there is a debate about homeopathy. There isn’t.
And in one of its CPD modules, Katherine Gascoigne says "Homeopathic remedies are available, but are best prescribed by a homeopath". Ms Gascoigne must be living on another planet.

### Conclusions

The main conclusion from all of this is that the General Pharmaceutical Council is almost criminally negligent. It continues to allow pharmacists, Anthony Pinkus among them, to endanger lives. It fails to apply its own declared principles. The members of its Council, and Duncan Rudkin (its chief executive and registrar), are not doing their job.

Individual pharmacists vary a lot, from the superb to those who believe in quackery. Some, perhaps many, are embarrassed by the fact that their employer compels them to sell rubbish. It’s too much to expect that they’ll endanger their mortgage payments by speaking out about it, but the best ones will take you aside and explain that they can’t recommend it.

The GPhC itself is regulated by the Professional Standards Authority, the subject of my last post. We can’t expect anything sensible from them.

In the USA there is a shocking number of pharmacists who seem to believe in quackery. In the UK, nobody knows, though judging by the failure of pharmacists’ representatives to vote against the daftest of all scams, homeopathy, there is no cause for complacency here. It seems that there will have to be big improvements in pharmacy education before you can have much confidence in the quality of the advice that you get in a pharmacy.

### Follow-up

Yesterday a talk was given at the School of Pharmacy, organised by “The Centre for Homeopathic Education” (an oxymoron if there ever was one). The flyer had all the usual nonsense. Its mention of “Remedies & Tonics for Cancer Recovery” might well have breached the Cancer Act (1939).
When I asked whether the amount received in room rental was sufficient to offset the damage to the reputation of the School of Pharmacy resulting from hosting a nutty (and possibly illegal) event, I had the greatest difficulty in extracting any sort of response from the school’s director, Duncan Craig. I’m told that he considers “the policy on space rental to be a UCL management issue, rather than a matter of discussion on scientific ethics with a colleague”. Oh dear.

## One incompetent regulator, the Professional Standards Authority, approves another, the CNHC

#### October 13th, 2013 · 10 Comments

The repeated failure of ‘regulators’ to do their job has been a constant theme on this blog. There is a synopsis of dozens of them at Regulation of alternative medicine: why it doesn’t work, and never can.

And it isn’t only quackery where this happens. The ineptitude (and extravagance) of the Quality Assurance Agency (QAA) was exposed starkly when the University of Wales’ accreditation of external degrees was shown (by me and by BBC TV Wales, not by the QAA) to be so bad that the University had to shut down.

Here is another example that you couldn’t make up. Yes, the Professional Standards Authority (PSA) has agreed to accredit that bad-joke pseudo-regulator, the Complementary & Natural Healthcare Council (CNHC, more commonly known as Ofquack).

Ofquack was created at the instigation of HRH the Prince of Wales, at public expense, as a means of protecting the delusional beliefs of quacks from criticism. I worked for them for a while, and know from the inside that their regulation is a bad joke. When complaints were made about untrue claims made by ‘reflexologists’, the complaints were upheld but they didn’t even reach the Conduct and Competence committee, on the grounds that the reflexologists really believed the falsehoods that they’d been taught.
Therefore, by the Humpty Dumpty logic of the CNHC, their fitness to practise was not affected by their untrue claims. You can read the account of this bizarre incident by the person who submitted the complaints, Simon Perry.

In the whole history of the CNHC, it has received a large number of complaints, but only one has ever been considered by its Conduct and Competence Committee. The rest have been dismissed before they were considered properly. That alone makes its claim to be a regulator seem ridiculous.

The CNHC did tell its registrants to stop making unjustified claims, but it has been utterly ineffective in enforcing that ruling. In May 2013, another 100 complaints were submitted, and no doubt they will be brushed aside too: see Endemic problems with CNHC registrants. As I said at the time:

“It will be fascinating to see how the CNHC tries to escape from the grave that it has dug for itself. If the CNHC implements properly its own code of conduct, few people will sign up and the CNHC will die. If it fails to implement its own code of conduct it would be shown to be a dishonest sham.”

In February of this year (2013), I visited the PSA with colleagues from the Nightingale Collaboration. We were received cordially enough, but they seemed to be bureaucrats with no real understanding of science. We tried to explain to them the fundamental dilemma of the regulation of quacks, namely that no amount of training will help when the training teaches things that aren’t true. They were made aware of all of the problems described above. But despite that, they ended up endorsing the CNHC.

### How on earth did the PSA manage to approve an obviously ineffective ‘regulator’?

The job of the PSA is said to be “. . . protecting users of health and social care services and the public”. They (or at least their predecessor, the CHRE) certainly didn’t do that during the saga of the General Chiropractic Council.
The betrayal of reason is catalogued in a PSA document [get local copy].

Here is some nerdy detail. It is too tedious to go through the whole document, so I’ll deal with only two of its many obvious flaws: the sections that deal with the evidence base, and with training.

The criteria for accreditation state:

“Standard 6: the organisation demonstrates that there is a defined knowledge base underpinning the health and social care occupations covered by its register or, alternatively, how it is actively developing one. The organisation makes the defined knowledge base or its development explicit to the public.

The Professional Standards Authority recognises that not all disciplines are underpinned by evidence of proven therapeutic value. Some disciplines are subject to controlled randomized trials, others are based on qualitative evidence. Some rely on anecdotes. Nevertheless, these disciplines are legal and the public choose to use them. The Authority requires organisations to make the knowledge base/its development clear to the public so that they may make informed decisions.”

Since all 15 occupations that are “regulated” by the CNHC fall into the last category (they “rely on anecdotes”), you would imagine that the statement “The Authority requires organisations to make the knowledge base/its development clear to the public” would mean that the CNHC was required to make a clear statement that reiki, reflexology etc. are based solely on anecdote. Of course the CNHC does no such thing. For example, the CNHC’s official definition of reflexology says:

“Reflexology is a complementary therapy based on the belief that there are reflex areas in the feet and hands which are believed to correspond to all organs and parts of the body”

There is, of course, not the slightest reason to think such connections exist, but the CNHC gives no hint whatsoever of that inconvenient fact. The word “anecdote” is used by the PSA but occurs nowhere on the CNHC’s web site.
It is very clear that the CNHC fails Standard 6. But the PSA managed to summon up the following weasel words to get around this glaring failure:

“The professional associations (that verify eligibility for CNHC registration) were actively involved in defining the knowledge base for each of the 15 professions. The Panel further noted that Skills for Health has lead responsibility for writing and reviewing the National Occupational Standards (NOS) for the occupations CNHC registers and that all NOS have to meet the quality criteria set by the UK Commission for Employment and Skills (UKCES), who are responsible for the approval of all NOS across all industry sectors. The Panel considered evidence provided and noted that the applicant demonstrated that there is a defined knowledge base underpinning the occupations covered by its registers. The knowledge base was explicit to the public”.

The PSA, rather than engaging their own brains, simply defer to two other joke organisations, Skills for Health and National Occupational Standards. But it is quite obvious that for things like reiki, reflexology and craniosacral therapy, the “knowledge base” consists entirely of made-up nonsense. Any fool can see that (but not, it seems, the PSA).

Skills for Health lists made-up, HR-style “competencies” for everything under the sun. When I got them to admit that their efforts on distance-healing etc. had been drafted by the Prince of Wales’ Foundation, the conversation with Skills for Health became surreal (recorded in January 2008):

DC. Well yes the Prince of Wales would like that. His views on medicine are well known, and they are nothing if not bizarre. Haha, are you going to have competencies in talking to trees perhaps?

“You’d have to talk to LANTRA, the land-based organisation for that.”

DC. I’m sorry, I have to talk to whom?

“LANTRA which is the sector council for the land-based industries uh, sector, not with us sorry . . . areas such as horticulture etc.”

DC.
We are talking about medicine, aren’t we? Not horticulture.

“You just gave me an example of talking to trees, that’s outside our remit.”

You couldn’t make it up, but it’s true. And the Professional Standards Authority rely on what these jokers say.

The current Skills for Health entry for reflexology says:

“Reflexology is the study and practice of treating reflex points and areas in the feet and hands that relate to corresponding parts of the body. Using precise hand and finger techniques a reflexologist can improve circulation, induce relaxation and enable homeostasis. These three outcomes can activate the body’s own healing systems to heal and prevent ill health.”

This is crass, made-up nonsense. Of course there are no connections between “areas in the feet and hands that relate to corresponding parts of the body”, and no reason to think that reflexology is anything more than foot massage. That a very expensive body, paid for by you and me, can propagate such preposterous nonsense is worrying. That the PSA should rely on them is even more worrying.

National Occupational Standards is yet another organisation that is utterly dimwitted about medical matters, but if you look up reflexology you are simply referred to Skills for Health, as above.

UK Commission for Employment and Skills (UKCES) is a new one on me. The PSA refers to “the UK Commission for Employment and Skills (UKCES), who are responsible for the approval of all NOS across all industry sectors”. It is only too obvious that the UKCES leadership team have failed utterly to do their job when it comes to made-up medicine. None of them know much about medicine. It’s true that their chairman did once work for SmithKline Beecham, but as a marketer of Lucozade, a job which anyone with much knowledge of science would not find comfortable. You don’t need to know much medicine to spot junk. I see no excuse for their failure.

### The training problem
The PSA’s criteria for accreditation say:

Standard 9: education and training

The organisation:

9a) Sets appropriate educational standards that enable its registrants to practise competently the occupation(s) covered by its register. In setting its standards the organisation takes account of the following factors:

- The nature and extent of risk to service users and the public
- The nature and extent of knowledge, skill and experience required to provide service users and the public with good quality care

and later

9b) Ensures that registrants who assess the health needs of service users and provide any form of care and treatment are equipped to:

- Recognise and interpret clinical signs of impairment
- Recognise where a presenting problem may mask underlying pathologies
- Have sufficient knowledge of human disease and social determinants of health to identify where service users may require referral to another health or social care professional.

Anyone who imagines for a moment that a reflexologist or a craniosacral therapist is competent to diagnose a subarachnoid haemorrhage or malaria must need their head examining. In any case, the CNHC has already admitted that its registrants are taught things that aren’t true, so more training presumably means more inculcation of myths.

So how does the PSA wriggle out of this? Their response started:

“The Panel noted that practitioners must meet, as a minimum, the National Occupational Standards for safe and competent practice. This is verified by the professional associations, who have in turn provided written undertakings to CNHC affirming that there are processes in place to verify the training and skills outcomes of their members to the NOS”.

Just two problems there. The NOS standards themselves are utterly delusional. And checking them is left to the quacks themselves.
To be fair, the PSA weren’t quite happy with this, but after an exchange of letters, minor changes enabled the boxes to be ticked and the PSA said “The Panel was now satisfied from the evidence provided, that this Standard had been met”.

### What’s wrong with regulators?

This saga is typical of many other cases of regulators doing more harm than good. Regulators are sometimes quacks themselves, in which case one isn’t surprised at their failure to regulate. But organisations like the Professional Standards Authority and Skills for Health are not (mostly) quacks themselves. So how do they end up giving credence to nonsense? I find that very hard to comprehend, but here are a few ideas.

(1) They have little scientific education and are not really capable of critical thought.

(2) Perhaps even more important, they lack all curiosity. It isn’t very hard to dig under the carapace of quack organisations, but rather than finding out for themselves, the bureaucrats of the PSA are satisfied by reassuring letters that allow them to tick their boxes and get home.

(3) A third intriguing possibility is that people like the PSA yield to political pressure. The Department of Health is deeply unscientific and clearly has no idea what to do about alternative medicine. They have still done nothing at all about herbal medicine, traditional Chinese medicine or homeopathy, after many years of wavering. My guess is that they see the CNHC as an organisation that gives the appearance that they’ve done something about reiki etc. I wonder whether they applied pressure to the PSA to accredit the CNHC, despite the CNHC clearly breaking the PSA’s own rules. I have sent a request under the Freedom of Information Act in an attempt to discover whether the Department of Health has misbehaved in the way it did when it attempted to override NHS Choices.

The responsibility for this cock-up has to rest squarely on the shoulders of the PSA’s director, Harry Cayton.
He was director of the CHRE, from which the PSA evolved, and is the person who so signally failed to do anything about the General Chiropractic Council fiasco.

### What can be done?

This is just the latest of many examples of regulators who not only fail to help but actually do harm by giving their stamp of approval to Mickey Mouse organisations like the CNHC. Most of the worst quangos survived the “bonfire of the quangos”. The bonfire should have started with the PSA, CNHC and Skills for Health. They cost a lot and do harm.

There is a much simpler answer. There is a good legal case that much of alternative medicine is illegal. All one has to do is to enforce the existing law. Nobody would object to quacks if they stopped making false claims (though whether they could stay in business if they stopped exaggerating is debatable).

There is only one organisation that has done a good job when it comes to truthfulness. That is the Advertising Standards Authority. But the ASA can do nothing apart from telling people to change the wording of their advertisements, and even that is often ignored. Responsibility for enforcing consumer protection law lies with Trading Standards. They have consistently failed to do their job (see Rose et al., 2012, “Spurious Claims for Health-care Products”, Medico-Legal Journal). If they did their job of prosecuting people who defraud the public with false claims, the problem would be solved. But they don’t, and it isn’t.

### Follow-up

The indefatigable Quackometer has written an excellent account of the PSA fiasco.

## A review of Do You Believe in Magic, by Paul Offit. And a fine piece of timidity from Nature Medicine

#### August 27th, 2013 · 5 Comments

Despite the First Amendment in the US and a new Defamation Act in the UK, fear of legal threats continues to suppress the expression of honest scientific opinion.

I was asked by Nature Medicine (which is published in the USA) to write a review of Paul Offit’s new book.
He’s something of a hero, so of course I agreed. The editor asked me to make some changes to the first draft, which I did. Then the editor concerned sent me this letter:

“Thank you for the revised version of the book review. The chief editor of the journal took a look at your piece, and he thought that it would be a good idea to run it past our legal counsel owing to the strong opinions expressed in the piece in relation to specific individuals. I regret to say that the lawyers have advised us against publishing the review.”

After that I tried The Conversation UK. They had done a pretty good job with my post on the baleful influence of royals on medicine. They were more helpful than Nature Medicine, but for some reason that I can’t begin to understand, they insisted that I should not name Nature Medicine, but should refer only to "a leading journal". And they wanted me not to name Harvard in the last paragraph. I’m still baffled about why. But it seemed to me that editorial interference had gone too far, so rather than have an editor rewrite my review, I withdrew it.

It is precisely this sort of timidity that allows purveyors of quackery such success with their bait-and-switch tactics. The fact that people seem so terrified to be frank must be part of the reason why Harvard, Yale and the rest have shrugged their shoulders and allowed nonsense medicine to penetrate so deeply into their medical schools. It’s also why blogs are now often better sources of information than established journals.

Here is the review. I see nothing defamatory in it.

### Do You Believe in Magic? The Sense and Nonsense of Alternative Medicine

Paul A. Offit. Harper, 2013. 336 pp., hardcover \$26.99. ISBN: 0062222961

Reviewed by David Colquhoun, Research Professor of Pharmacology, UCL.

Here’s an odd thing. There is a group of people who advocate the silly idea that you can cure all ills by stuffing yourself with expensive pills, made by large and unscrupulous Pharma companies. No, I’m not talking about pharmacologists or doctors or dietitians. They mostly say that stuffing yourself with pills is often useless and sometimes harmful, because that’s what the evidence says.

Rather, the pill pushers are the true believers in the alternative realities of the “supplement” industry. They seem blithely unaware that the manufacturers are mostly the same big pharma companies that they blame for trying to suppress “natural remedies”.  Far from trying to suppress them, pharma companies love the supplement industry because little research is needed and there are few restrictions on the claims that can be made.

Paul Offit’s excellent book concentrates on alternative medicine in the USA, with little mention of the rest of the world. He describes how American pork barrel politics have given supplement hucksters an almost unrestricted right to make stuff up.

Following the thalidomide tragedy, which led to birth defects in babies in the 1950s and 60s, many countries passed laws that required evidence that a drug was both effective and safe before it could be sold. This was mandated by the Kefauver-Harris Amendment (1962) in the USA and the Medicines Act (1968) in the UK. Laws like that upset the quacks, and in the UK the quacks got a free pass, a ‘licence of right’, largely still in existence.

In order to sell a herbal concoction in the UK you need to present no evidence at all that it works, just evidence of safety, in return for which you get a reassuring certification mark and freedom to use misleading brand names and labels.

In the USA the restrictions didn’t last long. Offit describes how a lobby group for vitamin sellers, the National Health Federation, had a board made up of quacks, some of whom, according to Offit (page 73), had convictions. They found an ally in Senator William Proxmire, who introduced in 1975 an amendment that banned the Food and Drug Administration (FDA) from regulating the safety of megavitamins. Tragically, this bill was even supported by the previously respected scientist Linus Pauling. Offit tells us that “to Proxmire” became a verb meaning to obstruct science for political gain.

The author then relates how the situation got worse with the passage of the Dietary Supplement Health and Education Act (DSHEA) in 1994. It was passed with the help of ex-vitamin salesman Senator Orrin Hatch and lots of money from the supplement industry.

This act iniquitously defined a “supplement” as “a product intended to supplement the diet that bears or contains one or more of the following ingredients: a vitamin, a mineral, an herb or other botanical, or an amino acid”. At a stroke, herbs were redefined as foods. There was no need to submit any evidence of efficacy, or even of safety, before marketing anything. All a manufacturer had to do to sell almost any herbal drug or megadose vitamin was to describe it as a “dietary supplement”. The lobbying to get this law through was based on appealing to the Tea Party tendency: “get the government’s hands off our vitamins”. And it was helped by ‘celebrities’ such as Sissy Spacek and Mel Gibson (it’s impossible to tell whether they really believed in the magic of vitamins, or whether they were paid, or had Tea Party sympathies).

Offit’s discussion of vaccination is a heartbreaking story of venom and misinformation. As co-inventor of the first rotavirus vaccine he’s responsible for saving many lives around the world.  But he, perhaps more than anyone, suffered from the autism myth started by the falsified work of Andrew Wakefield.

The scientific community took the question seriously and soon many studies showed absolutely no link between vaccination and autism.  But evidence did not seem to interest the alternative world.  Rather than Offit being lauded as a saver of children’s lives, he describes how he was subjected to death threats and resorted to having armed guards at meetings.

Again, Offit tells us how celebrities were able to sway public opinion. For example (chapter 6), the actress Jenny McCarthy and talk-show hostess Oprah Winfrey promoted, only too successfully, the vaccine-autism link despite abundant evidence that it didn’t exist, and pushed a number of theories that were not supported by any evidence, such as the idea that autism can be “cured” by mega-doses of vitamins and supplements.

Of course vaccines like the one for rotavirus can’t be developed without pharmaceutical companies because, as Offit says, only they "have the resources and expertise to make a vaccine. We can’t make it in our garage". When the Children’s Hospital of Philadelphia sold its royalty stake in the rotavirus vaccine for \$182 million, Offit received an undisclosed share of the intellectual property, “in the millions”.

That’s exactly what universities love. We are encouraged constantly to collaborate with industry, and, in the process, make money for the university. It’s also what Wakefield, and the Royal Free Hospital where he worked, hoped to do. But sadly, these events led to Offit being called names such as “Dr Proffit” and “Biostitute” (to rhyme with “prostitute”) by people like Jenny McCarthy and Robert F. Kennedy Jr. The conspiracist public lapped up this abuse, but appeared not to notice that many quacks have become far richer by peddling cures that do not work.

One lesson from this sad story is that we need to think more about the potential for money to lead to good science being disbelieved, and sometimes to corrupt science.

Everyone should buy this book, and weep for the gullibility and corruption that it describes.

I recommend it especially to the deans of US medical schools, from Harvard downwards, who have embraced “integrative medicine” departments. In doing so they have betrayed both science and their patients.

Abraham Flexner, whose 1910 report first put US medicine on a sound scientific footing, must be turning in his grave.

### Follow-up

30 August 2013

Quack lobby groups got a clause inserted into Obamacare that means any attempt to evaluate whether a treatment actually works will leave insurance companies open to legal action for "discrimination".

"Discrimination? Yes! We must not allow the government to exclude health care providers just because those providers don’t cure anything."

The latest piece of well-organised corporate corruption by well-funded lobbyists is revealed by Steven Salzberg, in Forbes Magazine. The chaos in the US health system makes one even more grateful for the NHS and for the evaluation of effectiveness of treatments by NICE.