The Man Who Did the Math

John Ioannidis has found fault with most currently published medical research, but he’s optimistic about the future
John Ioannidis has found that small sample sizes, poorly designed protocols and lax statistical standards have produced a plethora of unsubstantiated and misleading research. Photo: Kelvin Ma
April 4, 2011


In this modern, secular world, we place great faith in the scientific method. That might be why when we hear about a study that says we should take more of a particular vitamin, eat this fruit or take that drug, we pay close attention. “Studies have shown…” is a common refrain in news reports, lending a certain seal of approval.

But then another study comes out, and it turns out that the vitamin we’re taking is causing more problems than it’s solving, the fruit we’re consuming in the quest for a long life isn’t doing anything of the sort and that prescription we’ve been dutifully downing every day is not solving a health problem, but causing another.

To cite two outsize examples: a decade ago post-menopausal women were told, based on existing studies, they should be on hormone replacement therapy—until it was shown years later that HRT can increase risk of breast cancer, heart disease and stroke. Vioxx was widely prescribed as a safe pain medication, with many studies giving it the nod—until it proved to cause an increased rate of heart attacks and was swiftly removed from the market.

Why is it that so many research studies don’t survive later efforts to replicate the initial results—a standard practice for validating research—and what can be done about it? Those questions have concerned John Ioannidis for the better part of two decades, and now dominate his academic life.

So what is ailing medical research?

In dozens of papers, Ioannidis and his collaborators have found that small sample sizes, poorly designed protocols and lax statistical standards have produced a plethora of unsubstantiated and misleading research. And while research studies undergo peer review before being accepted into professional journals, the standards that the profession uses to judge research have not been stringent enough, Ioannidis says, leading to systemic problems.

A longtime adjunct professor at Tufts University School of Medicine, Ioannidis made the point sharply in the title of his widely cited 2005 paper, “Why Most Published Research Findings Are False,” which appeared in the journal PLoS Medicine.

Not everyone agrees, of course. PLoS published a response to the article by two researchers who questioned the mathematical models Ioannidis used to support his claims. But even they agreed “that many medical research findings are less definitive than readers suspect … [and] that bias of various forms is widespread.”

There’s obviously plenty of downside to any flawed study. Bad research on medical interventions has the potential to expose large numbers of people to ineffective, harmful and expensive treatments. Basic scientists run the risk of wasting millions of federal research dollars in their pursuit of science based on misleading findings. For example, there are still numerous researchers who continue work “based entirely on the premise that beta-carotene is an effective chemo-preventive agent for cancer, despite repeated refutations by large trials,” Ioannidis says. “That money could have been invested elsewhere.”

What Ioannidis really wants, though, is not to castigate scientists. Instead, he wants them to develop ways to make their research better. He’s making his case in respected journals, such as the Lancet and the Journal of the American Medical Association, and regularly speaks at scientific conferences, where attendees want to learn how to get their work right. The Atlantic magazine recently carried a long article about Ioannidis, calling him “one of the world’s foremost experts on the credibility of medical research.” And he was quoted at length by the science writer Jonah Lehrer in a January New Yorker article that explored problems with the scientific method.

“I think he has come across something that has not been recognized before. It is potentially extremely important,” says Jerome Kassirer, a Distinguished Professor at Tufts School of Medicine and the former editor of the New England Journal of Medicine. “I think that it is something that needs to be followed up. If it’s true, then every study is going to have to be evaluated based on what is already published and what the new observations are,” he says.

Staying in Motion

As a high school student, Ioannidis’ first passion was mathematics, but he eventually decided to become a physician, graduating first in his class at the University of Athens Medical School. He says he was intrigued by the “evidence-based medicine” movement that was gaining traction in the early 1990s: it sought to base treatments on proven techniques. But Ioannidis thought that many studies on which the evidence relied seemed sloppy, based on small sample sizes and poor science. He started to apply meticulous statistical analyses to the studies—and found them wanting.

He first came to Tufts School of Medicine in 1993 on a fellowship and started working on just these questions. He has been an adjunct faculty member since 1996, and now holds the rank of professor. In 2008 he also was named director of the genetics/genomics component of the Tufts Clinical and Translational Science Institute and of the Center for Genetic Epidemiology and Modeling in the Institute for Clinical Research and Health Policy Studies at Tufts Medical Center. He works closely with his medical school colleague Joseph Lau, whose research focuses on evidence-based medicine and meta-analysis, in particular on developing tools to conduct systematic reviews of scientific studies.

Ioannidis (pronounced yo-NEE-dees) is nothing if not peripatetic. For the last dozen years or so, he has also held an appointment as professor and chair of the Department of Hygiene and Epidemiology at the University of Ioannina School of Medicine in Greece. Late last year he was named the C.F. Rehnborg Chair in Disease Prevention at the Stanford University School of Medicine, where he currently holds appointments as professor of medicine and of health research and policy and as director of the Stanford Prevention Research Center. He also holds adjunct professor appointments at the Harvard School of Public Health and at Imperial College London.

Earlier this year he was back at Tufts Medical School, working with, among others, Johanna Seddon, a professor of ophthalmology, on a large-scale meta-analysis of genome-wide association studies for age-related macular degeneration, and Peter Castaldi, an assistant professor of medicine, on an evaluation of validation practices. “We have video conferences between Greece and Tufts every couple of weeks with presentations on our research,” adds Ioannidis. “It’s a small world.”

Problems in the Mix

The crux is this, according to Ioannidis, who has analyzed thousands of research studies: those that garner media attention, and thus public notice, often have tiny sample sizes, and the smaller the sample, the less likely the findings are to be true. Take vitamin D. Early and enthusiastic small studies touting its many health benefits, from preventing cancer to protecting children from allergies, are now being tempered by larger and better-designed studies that haven’t found nearly the results the earlier investigations claimed.

In late November, the Institute of Medicine issued a report urging caution before loading up on vitamin D. Ioannidis was not surprised. “We’ve had tons of observational data on various vitamins, such as vitamin E and vitamin C and beta-carotene, suggesting that you can see huge health benefits,” he says, from improved cardiovascular health to lower incidence of cancer. “Then others do further research, running very large randomized studies, and they show no benefit most of the time, and sometimes even show harm.”

And the more that money or politics is involved in the research, the less likely the results are to be valid, he says. Think of big pharmaceutical companies reporting on drug tests: there’s a huge incentive to play up positive results, however weak, and downplay negative ones.

In discussing ways to counter loose methodology, Ioannidis talks often about the “gold standard” for research: large randomized, controlled trials. Yet even that supposedly ideal metric isn’t perfect. Ioannidis and his colleagues found that approximately a quarter of the most influential randomized studies in the medical literature were refuted within a few years, and even the large ones, which offer more protection from error, can be wrong 10 percent of the time.

For example, he says, “claims that vitamin E can decrease cardiovascular events and death by half were supported not just by smaller, observational studies, but also by at least one large randomized trial of 2,002 patients.” But those results were later disproved in subsequent studies. Likewise, the claim that magnesium can markedly decrease mortality in patients with acute myocardial infarction was supported by a large randomized trial with more than 2,000 patients, but was later not substantiated when tested in two trials with over 9,000 and 58,000 patients, respectively.

There’s also the issue of selective reporting. While drug companies have to register clinical trials with the FDA in advance, there is no obligation to report the results. “They run hundreds of trials and report only those that get the most promising results,” Ioannidis says. The public never learns about the others.

He cites trials of antidepressant medications. Only about half of the hundreds of studies done by drug companies were positive, but “when you look at the literature, almost everything that has been published has positive results.”

Improving Standards

It’s not all bad news, though. “It would be tragic,” Ioannidis says, if things had not improved since he and other scientists started working on improving the statistical validity of biomedical research. “I think that there has been quite a lot of progress in the way that research is designed and reported,” he says, but adds a quick qualifier: it depends on the field. One discipline that has made clear progress, he says, is genetic epidemiology, the study of how genetic factors influence health. “I think it is one where within a short period of time we have seen dramatic improvement in the standards of research,” he says.

Even relatively recently there were plenty of papers published in genetic epidemiology that used only a handful of cases and controls and employed very lenient statistical significance rules—“this gene causes that disease, at least in the 24 people studied” was a standard approach. “Pretty much anything generated from that type of study was not replicated in large-scale follow-up studies up to 95 to 98 percent of the time,” Ioannidis says.
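
One way to see why small studies with lenient significance rules fare so poorly is to compute the share of "significant" findings that reflect a real effect. The sketch below uses the positive predictive value formula from Ioannidis's 2005 paper in its simplest form, with no bias term. The function name and all input numbers (pre-study odds of 1 in 100, 20 percent power for a small study, 80 percent for a large one, a stricter cutoff of 0.001) are hypothetical values chosen for illustration, not figures from his work.

```python
def positive_predictive_value(prior_odds: float, power: float, alpha: float) -> float:
    """Share of statistically significant findings that are true.

    prior_odds: ratio of true to false relationships among those tested (R).
    power:      chance a study detects a real effect (1 - beta).
    alpha:      significance cutoff, i.e. chance of a false positive per test.

    Simplest form of the formula, assuming no bias:
        PPV = power * R / (power * R + alpha)
    """
    if prior_odds <= 0 or not (0 < power <= 1) or not (0 < alpha < 1):
        raise ValueError("prior_odds must be > 0; power in (0, 1]; alpha in (0, 1)")
    true_positives = power * prior_odds   # real effects that reach significance
    false_positives = alpha               # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios, for illustration only.
small_study = positive_predictive_value(prior_odds=0.01, power=0.2, alpha=0.05)
large_study = positive_predictive_value(prior_odds=0.01, power=0.8, alpha=0.05)
strict_large = positive_predictive_value(prior_odds=0.01, power=0.8, alpha=0.001)

if __name__ == "__main__":
    print(f"small study, p < 0.05:  {small_study:.1%}")
    print(f"large study, p < 0.05:  {large_study:.1%}")
    print(f"large study, p < 0.001: {strict_large:.1%}")
```

Under these assumed inputs, roughly 4 percent of significant findings from the small study would be true, against about 14 percent for the large study and about 89 percent once the cutoff is tightened as well. The exact figures depend entirely on the assumed odds and power; the point is only the direction of the effect.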

But in the last five years, in response to criticism from Ioannidis and others, genetic epidemiologists have raised the bar. To determine the validity of findings, they use a measurement called a probability value, or p value. In the past, and still today in many fields, a finding would be considered statistically significant with a p value below 0.05.

“This means that if a single researcher runs 20 analyses, he or she will find one spurious, or false-positive, discovery by chance,” Ioannidis says. “This was quite tolerable in the past when there were fewer scientists running fewer analyses. However, now there are millions of scientists, some of whom run millions of analyses in each study they conduct. To avoid false-positive results in genetics, the current goal for a p value should be less than 0.00000005.” Many other fields should also become more stringent in their statistical criteria for claiming discoveries, he adds.
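
The arithmetic behind these thresholds can be sketched in a few lines of Python. This is an illustration only, not Ioannidis's own calculation: it assumes every test is independent and that no real effect exists, and the function names are hypothetical. The last function applies a Bonferroni correction, which is one common way to arrive at a threshold of this size; the figure of one million tests is an assumed round number, not something stated in the article.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Average number of chance 'discoveries' when no real effect exists."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests: int, alpha: float) -> float:
    """Chance of one or more false positives across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(family_alpha: float, num_tests: int) -> float:
    """Per-test cutoff that keeps the overall false-positive rate near family_alpha."""
    return family_alpha / num_tests


if __name__ == "__main__":
    # 20 analyses at the traditional 0.05 cutoff.
    print(expected_false_positives(20, 0.05))
    print(prob_at_least_one_false_positive(20, 0.05))
    # Assumed: about one million independent tests in a genome-wide study.
    print(bonferroni_threshold(0.05, 1_000_000))
```

With 20 analyses at a cutoff of 0.05, the expected number of false positives is one, and the chance of seeing at least one is about 64 percent. Dividing 0.05 by an assumed one million tests gives 0.00000005, the same order as the threshold Ioannidis quotes.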

Along with more stringent statistical criteria, large consortia of up to 30 or more teams of investigators often are “joining forces with common protocols, common definitions, stringent quality control, sharing of all the data and a common analysis plan,” he says. Genetics research is a good example, as international groups of researchers sift through vast amounts of data to identify the genes that cause different diseases.

The new standards have led to a realization that many genetic factors don’t increase the risk of, say, having diabetes or a heart attack or Parkinson’s disease two-fold or three-fold, as previously thought. Instead, those risks work out to 1.1-fold or 1.15-fold. “These are far more tempered and conservative discoveries, but they are genuine,” he says.

With all the talk of flawed research, could Ioannidis be wrong about being wrong? “Absolutely,” he says. “I think that this is just part of the scientific process: everything needs to be replicated and reexamined by other investigators, and needs to be transparent enough so that it can be replicated or refuted.”

 

Taylor McNeil can be reached at taylor.mcneil@tufts.edu.

 

 
