Statistics don't lie. People do.
I have the greatest respect for statisticians, who methodically sift through messy data to determine what can confidently and honestly be said about them. But even the most sophisticated analysis depends on how the data were obtained. The minuscule false-positive rate for DNA tests, for example, is not going to protect you if the police swap the tissue samples.
One of the core principles in clinical trials is that researchers specify what they're looking for before they see the data. Another is that they don't get to keep trying until they get it right.
But that's just the sort of behavior that some drug companies have engaged in.
The blog In the Pipeline informs us this week of a disturbing article in the New England Journal of Medicine. The authors analyzed twenty different trials conducted by Pfizer and Parke-Davis evaluating possible off-label (non-FDA-approved) uses for their epilepsy drug Neurontin (gabapentin).
If that name sounds familiar, it may be because Pfizer paid a $430 million fine in 2004 for illegally promoting just these off-label uses. As Melody Petersen reported for The New York Times and in her chilling book, "Our Daily Meds," company reps methodically "informed" doctors of unapproved uses, for example by giving them journal articles on company-funded studies. The law then allows the doctors to prescribe the drug for whatever they wish.
But the distortion doesn't stop with the marketing division.
The NEJM article draws on internal company documents disclosed during litigation. Of the 20 clinical trials, only 12 were published. Of these, eight reported a statistically significant outcome that was not the one described in the original experimental design. The authors say "…trials with findings that were not statistically significant (P≥0.05) for the protocol-defined primary outcome, according to the internal documents, either were not published in full or were published with a changed primary outcome."
A critical reason to specify the goals, or primary outcome, ahead of time is that the likelihood of getting a statistically significant result by chance increases as more possible outcomes are considered. In genome studies, for example, the threshold for significance is typically divided by the number of genes tested (the Bonferroni correction), since each gene is, in effect, another possible outcome and another chance for a false positive.
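To make that concrete, here is a minimal Python sketch (my own illustration, with hypothetical numbers; nothing here comes from the NEJM study): with m independent outcomes each tested at level alpha, the chance of at least one false positive is 1 − (1 − alpha)^m, and dividing alpha by m pulls that family-wise rate back down to roughly alpha.

```python
# Illustrative sketch (hypothetical numbers): the family-wise
# false-positive rate with and without a Bonferroni correction.

def prob_any_false_positive(m, alpha):
    """Chance that at least one of m independent null tests comes out
    'significant' at level alpha: 1 - (1 - alpha)**m."""
    return 1 - (1 - alpha) ** m

m = 20_000      # say, genes tested in a genome-wide study (hypothetical)
alpha = 0.05

print(prob_any_false_positive(m, alpha))       # ~1.0: a false hit is all but certain
print(prob_any_false_positive(m, alpha / m))   # ~0.049: corrected back to roughly 5%
```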
None of this would surprise Petersen. She described a related practice in which drug companies keep running trials until they get the two positive outcomes that the FDA requires for approval.
By arbitrary tradition, the threshold for statistical significance is taken as a 5% or smaller probability (the P-value) that a result at least as strong as the one observed would arise by chance alone. This means that if you do 20 trials of a drug with no real effect, you have nearly a two-in-three chance (1 − 0.95^20 ≈ 64%) of getting one or more that are "significant."
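That claim is easy to check by simulation. The Python sketch below is purely hypothetical — it has nothing to do with the gabapentin data — and simply counts how often a batch of 20 null trials produces at least one "significant" result.

```python
# Hypothetical simulation (not the gabapentin data): run batches of
# 20 'trials' of a drug with no real effect and count how often at
# least one clears the P < 0.05 bar purely by luck.
import random

def batch_has_false_positive(n_trials=20, alpha=0.05):
    # Under the null hypothesis a P-value is uniform on [0, 1],
    # so each trial is 'significant' with probability alpha.
    return any(random.random() < alpha for _ in range(n_trials))

n_batches = 100_000
hits = sum(batch_has_false_positive() for _ in range(n_batches))
print(hits / n_batches)   # ~0.64, matching 1 - 0.95**20
```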
A related issue arose with the recent, highly publicized results of an HIV/AIDS vaccine trial in Thailand. Among three different analysis methods, one came up with a P-value of 4%, making the result barely significant.
This means that only one in twenty-five trials like this would get such a result by chance. That makes the trial a success, by the usual measures.
But this trial is just one of many trials of potential HIV vaccines, most of which have shown no effect. The chance that at least one trial in such a family would give a positive result by luck alone is much larger than 5%.
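The same arithmetic from the simulation above applies to a whole research field. The trial counts below are hypothetical, chosen only to show how quickly the family-wise false-positive rate grows:

```python
# Hypothetical trial counts: if n independent vaccine trials are each
# judged at the 5% level, the chance that at least one looks
# 'significant' despite no real effect grows quickly with n.
for n in (1, 5, 10, 20):
    print(f"{n:2d} trials: {1 - 0.95 ** n:.0%} chance of a spurious 'success'")
```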
In addition, the Thai vaccine was expected to work by slowing down existing infection. Instead, the data show reduced rates of initial infection. Measured in terms of final outcome (death), it was a success. But in some sense the researchers moved the goalposts.
Sometimes, of course, a large trial can uncover a real but unanticipated effect. It makes sense to follow up on these cases, recognizing that a single result is only a hint.
Because of the subtleties in defining the outcome of a complex study, there seems to be no substitute for repeating a trial with a clearly defined outcome stated in advance. Good science writers understand this. It would be nice to think that the FDA did, too, and had established procedures to ensure reliable conclusions.