A talk by Julie Rehmeyer at the Joint Statistics Meetings, 2016.
In 2011, headlines worldwide announced that an effective treatment had been found for a debilitating illness that affects 17 million people around the world. The study was published in The Lancet. Great news!
Except that it wasn't. Patients with the illness, known as chronic fatigue syndrome or myalgic encephalomyelitis, quickly decried the study as having severe scientific problems. Furthermore, it didn't fit with their experience: patients reported that the two treatments -- psychotherapy and gradually increasing exercise -- had little impact and could indeed be dangerous. They analyzed the study and spelled out its serious flaws, particularly statistical ones, in scientific journals, but the authors dismissed the concerns as prejudice against psychiatry.
In late 2015, journalist David Tuller wrote a 14,000-word exposé of the flaws in the trial, citing the grave concerns of researchers. Dr. Ronald Davis of Stanford University, for example, said, "I'm shocked that the Lancet published it. I don't understand how it got through any kind of peer review."
I'll describe the problems with the science, with the functioning of the scientific institutions, and with the journalism.
One of the most damaging cases of bad statistical practice that I have personally encountered in my career as a journalist.
Largest treatment trial in the history of chronic fatigue syndrome, known as PACE.
Highly influential:
-- Public health recs around the world
-- only science most docs know about the disease
But so badly flawed that the data are uninterpretable.
In the last year, the trial has received serious public criticism, including an open letter to the Lancet from 42 scientists demanding an independent investigation into it, but the public health recommendations remain.
Two reasons to study it:
1. Patients are being hurt by it to this day. Critiques have come out, but there has been no retraction, and the public health recs remain.
2. It's an object lesson in how our systems can break down. In this case, there were serious breakdowns statistically, scientifically, journalistically, and in public health.
Personally motivated because I have suffered from the disease.
What is the PACE trial?
Published Feb. 2011 in the Lancet.
Tested two treatments: graded exercise therapy and cognitive behavioral therapy (CBT). It claimed that they were effective treatments, and a follow-up paper later claimed that they led to “recovery” in 22 percent of cases.
Big trial:
-- 641 patients
-- 8 million dollars
Smells like a really good study.
Received widespread media coverage.
For example,
The New York Times gave the most nuanced coverage, though it still pointed out few of the serious problems with the trial.
Worldwide coverage. ME is an alternate name, short for myalgic encephalomyelitis; the illness is commonly abbreviated ME/CFS. I’ll call it ME/CFS.
This article claimed, “About 30 per cent of patients given cognitive behavioural therapy or graded exercise made a full recovery to normal levels of activity..."
The study actually made no claims about recovery. A follow-up study later claimed, using some highly suspect logic, that 22 percent recovered. But at the press conference, one of the researchers said that 30 percent got “back to normal,” and many articles repeated the claim. The true statement was that 30 percent of patients were “within the normal range,” statistically speaking, which, as the statisticians here know, isn’t related to “a full recovery to normal levels of activity.”
Of particular interest: “…scientists have found encouraging people with ME to push themselves *to their limits* gives the best hope of recovery.”
This is a particularly dangerous thing to say, because the hallmark symptom of the disease is an inability to tolerate exercise.
That was the state of this patient. Exceeding one's limits on exercise, which can vary from day to day, causes a resurgence of all symptoms:
-- fatigue, but also
-- unrefreshing sleep
-- cognitive problems (can be unable to speak or do simple arithmetic)
-- inability to regulate blood pressure
-- immune problems
-- neurological problems, including an inability to tolerate light or noise. Hence the earphones on this woman.
Crashes can last days to weeks to months to forever.
Almost all experienced patients try to avoid crashes through “pacing,” judging from day to day how much energy they have and not exceeding it.
PACE researchers argue against this: They say by starting at a low level of activity and increasing gradually, patients can push past those limits and cure themselves.
My biggest problem was semi-paralysis:
It felt like my brain signals were barely reaching my legs, while my feet were tied to 100-pound weights.
It commonly happened after too much activity, but it would also happen for no reason that I knew of.
Now, I’ve mostly recovered.
This is me dancing at my wedding three years ago.
But among patients who are seriously affected, as I was, that’s
very rare. I’m incredibly fortunate just to be well enough to be here speaking to you today.
If I’d gotten unlucky, I could have ended up like
this young man, Whitney Dafoe, with a severe case of the disease.
Common severe symptoms:
Paralysis
Seizures
Inability to eat or digest food
On and on and on
Occasionally lethal
So when the study first came out, it struck me as highly improbable:
-- Cognitive behavioral therapy helpful for most chronic illnesses, but not by curing them.
-- Exercise seemed like exactly the wrong idea.
But still, I was impressed by the study’s credentials, and what mattered wasn’t what I found plausible but what the data were actually telling us. What was going on? I’m a science writer, so over the following years, I dug into the science.
And I found problems. Big problems. The most striking, perhaps, was in their definition
of recovery. One of the main claims of PACE was that 22 percent of patients recovered. Given that recovery isn’t common, this was a remarkable claim, one that, if true, would be very exciting.
The PACE researchers said that patients had “recovered” if they met four
Criteria:
Patients had to rate their own physical function as above a given threshold.
Similarly, they had to assess their fatigue as below a given threshold.
They had to say they’d improved sufficiently overall.
And they had to no longer meet the definition of having chronic fatigue syndrome. Sounds very impressive!
The researchers published these criteria in a protocol before they started—a very good thing—but then they weakened all four criteria partway through the trial—a very bad thing.
So here’s how they changed their criterion for
Physical function:
To enter the trial, patients had to have a physical function score of 65 or below (on a scale of 0 to 100). Almost all reasonably healthy adults would score 95 or 100, so these people were pretty impaired.
According to the protocol, participants had to score 85 or higher (along with meeting other criteria) at the end of the trial to count as “recovered.”
But after the study began, they lowered that threshold.
Not to 80.
Not to 75.
Not to 70.
Not to 65, the entry threshold.
But to 60.
That’s lower than the entry threshold. So that means you could enter the trial with a score of 65, get worse over the course of the trial and end up at 60, and be said to have “recovered,” as long as you meet the other criteria as well.
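The overlap between the entry threshold and the revised recovery threshold can be checked mechanically. The sketch below looks only at the physical function criterion, using the thresholds stated in this talk (entry at 65 or below, protocol "recovery" at 85 or above, revised "recovery" at 60 or above) and ignoring the other three criteria. The helper names are hypothetical, and scores are assumed to move in steps of 5.

```python
# Physical function thresholds described in the talk (0 to 100 scale).
ENTRY_MAX = 65           # must score 65 or below to enter the trial
PROTOCOL_RECOVERED = 85  # original protocol: 85 or higher counts toward "recovered"
REVISED_RECOVERED = 60   # revised analysis: 60 or higher counts toward "recovered"


def eligible_at_entry(score: int) -> bool:
    """Hypothetical helper: True if a baseline score is low enough to enter."""
    return score <= ENTRY_MAX


def meets_pf_criterion(score: int, threshold: int) -> bool:
    """Hypothetical helper: True if an end-of-trial score meets a recovery threshold."""
    return score >= threshold


def overlap_scores(step: int = 5) -> list[int]:
    """Scores (assumed steps of 5) that qualify for entry AND count as recovered
    under the revised threshold, i.e. 'sick enough to enter, well enough to recover'."""
    return [s for s in range(0, 101, step)
            if eligible_at_entry(s) and meets_pf_criterion(s, REVISED_RECOVERED)]


# The scenario from the talk: enter at 65, get worse, finish at 60.
baseline, final = 65, 60
print(eligible_at_entry(baseline))                    # True
print(final < baseline)                               # True: the patient got worse
print(meets_pf_criterion(final, PROTOCOL_RECOVERED))  # False under the protocol
print(meets_pf_criterion(final, REVISED_RECOVERED))   # True under the revised rule
print(overlap_scores())                               # [60, 65]
```

Any score in the overlap list is simultaneously "too sick to be excluded" and "well enough to be recovered" on this criterion, which is the contradiction described above.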
So let’s look at the other
criteria.
The second one is a self-assessment of
Fatigue.
More complicated, because they also changed the scoring system.
Same upshot: a patient could deteriorate and still meet the criterion.
The third criterion was
Overall improvement. Originally, participants had to say that they overall felt “very much better,” but the researchers changed it to allow answers of “much better” as well.
The final criterion is that patients had to
No longer meet the definition of having CFS. This sounds quite impressive, but in fact, as the researchers implemented this, very few patients could meet the first three criteria and not meet this last one, so it added very little.
So that means that “recovered” patients didn’t necessarily look like
this. They may look like
this.
For example, here’s a realistic scenario for a “recovered” patient: A participant in the trial gets sleep medication for the first time, since all the participants got specialized medical care. That’s wonderful, and it makes her feel overall much better. It also means that she has a bit less trouble finding words than she used to, so her fatigue score improves slightly. Her physical function, however, goes down: Before, she didn’t have trouble with a flight of stairs, but now she struggles with that. She certainly can’t exercise vigorously, or walk a mile, and even vacuuming her house is quite hard. Nevertheless, according to the researchers, she would no longer qualify as having CFS and would be considered “recovered.”
So their claim that 22 percent of patients recovered is highly suspect, but what about their claim that the treatments were effective, that patients
Improved?
Also highly problematic.
Also changed from protocol.
New version: comparison within the treatments of the trial only, no objective thresholds.
Concluded that cognitive behavioral therapy and exercise were more effective than specialist medical care only, or their version of patients’ “pacing” approach (though very problematic implementation).
But they didn’t show that the improvement was clinically significant. The data suggest it was tiny, and only on self-assessments.
All objective measures failed. There was a tiny improvement in walking speed for exercise patients (one would hope so!), but their walking was still worse than that of patients waiting for a lung transplant.
The problems go on and on, but let me briefly mention one more, one that affected the entire study.
-- There are competing definitions of the disease.
They used one of the most general, the Oxford definition, which requires only six months of disabling fatigue as the primary symptom. No other symptoms are required:
-- no unrefreshing sleep
-- no cognitive problems
-- no blood pressure regulation problems
-- no neurological problems
-- no immune problems
It excluded some fatiguing illnesses but included depression. That's a problem, because depression:
-- causes fatigue
-- responds to CBT and exercise
The US NIH has called for this definition to be scrapped.
Subgroup analysis:
-- CDC definition
-- London ME definition
Still found benefits from CBT and exercise.
But all the patients first had to meet the Oxford definition. That’s a serious problem, and this
Venn diagram shows why.
For many patients, especially the ones who are sicker, fatigue is not their primary problem. Cognitive problems, weakness, and other neurological problems often predominate. For me, it was paralysis.
So wrong patients included, right patients excluded.
They’re arguing you can generalize to all these patients, but you can’t.
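The logic of the Venn diagram can be sketched with sets. The patients and their group memberships below are entirely hypothetical, invented only to show the structure of the argument: because every participant first had to meet the Oxford definition, a "CDC subgroup" drawn from the trial is really the intersection of the Oxford and CDC groups, and it says nothing about CDC-defined patients who were never eligible.

```python
# Entirely hypothetical patients, labeled A-H, used only to illustrate the set logic.
oxford = {"A", "B", "C", "D", "E"}  # meet the Oxford definition (fatigue as primary symptom)
cdc = {"D", "E", "F", "G"}          # meet the CDC definition
london_me = {"E", "G", "H"}         # meet the London ME definition

# Everyone in the trial had to meet Oxford first, so each subgroup is an intersection.
cdc_subgroup = oxford & cdc           # {"D", "E"}
london_subgroup = oxford & london_me  # {"E"}

# Patients the subgroup analyses could never see: they meet CDC or London ME
# but were excluded because fatigue was not their primary problem.
never_eligible = (cdc | london_me) - oxford  # {"F", "G", "H"}

# Oxford-only patients (e.g., fatigue from other causes such as depression)
# were included even though they meet neither stricter definition.
oxford_only = oxford - cdc - london_me  # {"A", "B", "C"}

print(sorted(cdc_subgroup), sorted(london_subgroup))
print(sorted(never_eligible), sorted(oxford_only))
```

However the real groups are sized, the subgroup analyses can only ever sample from inside the Oxford circle, so the patients in `never_eligible` are exactly the ones the results cannot be generalized to.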
There are many more problems I don’t have time to detail.
Some highly informed patients pointed out these problems, but the researchers and Richard Horton, the editor of the Lancet, dismissed their concerns as prejudice against psychiatry. And for nearly five years, journalists were entirely credulous. As a patient myself, I found it difficult to take them on.
I’m pleased to say, though, that another journalist finally did
an exposé of the problems. I highly recommend that you check out Tuller’s work: