3 June 2011. Unlike the classic children’s tale where slow and steady wins the race, in biomedical research it’s the hare that often gets the glory. This seems especially true in the biomarker field, where potential new indicators of disease risk and prognosis leap onto the scene with record speed, yet vanishingly few finish the journey to routine clinical use. A study in this week’s Journal of the American Medical Association reports the magnitude of the problem. In this analysis of high-profile biomarker papers, “about 85 percent of the time, the highly cited study reported a much stronger effect compared to what was suggested by the meta-analysis and largest study of the same association,” lead author John Ioannidis of Stanford University School of Medicine, Palo Alto, California, told ARF. The caution against overinterpreting individual studies applies to biomarker research in Alzheimer’s and other neurodegenerative diseases, scientists in the field noted, and large studies to address this concern are underway. Inflated effect sizes may not reflect faulty research, but rather ill-fated publication practices and funding mechanisms that are just starting to recognize the need for systems-based approaches in biomarker research.
To gauge the validity of initially reported biomarkers, Ioannidis and coauthor Orestis Panagiotou of the University of Ioannina School of Medicine, Greece, looked at biomarker articles cited more than 400 times in ISI Web of Science through 2010. The 35 eligible studies, all published between 1991 and 2006 in 10 high-impact journals, represent roughly the top 3 percent of biomarker papers, the authors note. The effects reported in these frequently cited papers were compared against results of meta-analyses and larger studies, with the idea that the latter investigations are “closer to the truth, statistically speaking,” Ioannidis said. Among the well-cited biomarkers, most were genetic variants or blood-based factors, and more than two-thirds were proposed for cancer or cardiovascular applications. The list did not include any imaging measures or indicators aimed at Alzheimer’s or other neurodegenerative diseases. Ioannidis advises AlzGene on meta-analysis methodology.
Many of the highly cited associations were exaggerated, the authors found. For 30 of 35 frequently cited biomarkers, the largest study of each marker reported weaker effects, often more than twofold lower than those touted in the highly cited study. And in 29 cases, the meta-analysis yielded a smaller effect size. At times, the large study or meta-analysis reported an association in the opposite direction from that of the oft-cited paper.
AD scientists found the results compelling, though not particularly new or surprising. “I think it’s an interesting paper. It will make people think,” said Chengjie Xiong, who heads the AD biostatistics core at Washington University School of Medicine in St. Louis, Missouri. However, “if you want to get to the bottom of the issue, it’s not the research per se. It’s how the research is published. Many negative findings do not end up in the literature.”
How does this play out in biomarker research? Initial studies tend to be small, which subjects them to considerable statistical variability, Xiong noted. “If the effect is variable in the right direction, journal editors get excited and will publish the findings,” he said. “But if they’re variable in the other direction, nobody cares and nobody publishes. That’s how the system works.” Making matters worse, Ioannidis said, spectacular results not only have an edge in getting published, but they are also more likely to get cited.
To curb these biases, “all studies should be reported—weak associations, strong associations, no associations,” Ioannidis said. “Otherwise, we’re left with a literature that is replete with seeming successes but does not reflect reality.”
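The selection effect Xiong and Ioannidis describe can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis from the JAMA paper: the true effect (0.2), the standard error of each small study (0.25), and the rule that a study is "published" only when its estimate is significantly positive are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_selective_publication(true_effect=0.2, std_error=0.25,
                                   n_studies=10_000, z_threshold=1.96, seed=0):
    """Compare the mean effect across all studies with the mean across
    only those studies that clear a significance threshold.

    All parameter values are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Each small study estimates the same true effect with random noise.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Assumed publication rule: only significantly positive results appear.
    published = [e for e in estimates if e / std_error > z_threshold]
    return {
        "mean_all": statistics.fmean(estimates),
        "mean_published": statistics.fmean(published),
        "share_published": len(published) / n_studies,
    }


result = simulate_selective_publication()
print(result)
```

Under these assumptions, every published estimate must exceed 1.96 × 0.25 = 0.49, more than double the true effect of 0.2, so the published literature overstates the association even though no single study was done badly. Averaging over all studies, published or not, recovers a value close to the true effect.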
Recognizing this problem, the journal Neurobiology of Aging started accepting negative data in 2004 (see ARF interview). This applies to all relevant research studies, not just on biomarkers. To date, the journal has received 165 such submissions, 89 of which were accepted for publication, editor-in-chief Paul Coleman noted in an e-mail to ARF. “Just as one wants evidence for the validity of a positive result, we ask for evidence of the validity of a negative result—adequate sample size, appropriate procedures, appropriate data analyses, etc. We feel the standards for a negative result should be as rigorous as they would be for a positive result,” added Coleman, who heads an AD lab at the Banner Sun Health Research Institute in Arizona.
Given how research is funded, that may be easier said than done, suggested John Trojanowski of the University of Pennsylvania School of Medicine in Philadelphia. “Study sections will often fund discovery science, but it’s hard to get an R01 to confirm other people’s results,” he told ARF. “A kiss of death in an R01 grant application is ‘oh well, this is just confirmatory.’ There should be other funding mechanisms that say, ‘Yes, validation and confirmation are what we want,’” Trojanowski said. “It’s hard for single labs to do all the work needed to not only discover, but also validate and qualify, biomarkers to bring them forward to clear standards so they can be used in the clinic.”
Over time, this is becoming clear to the U.S. National Institutes of Health, which committed nearly two-thirds of the $120 million needed for both phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI). This is a highly coordinated study engaging more than 1,000 research volunteers across North America (see ARF ADNI series and ARF related news story) for the purpose of validating and standardizing AD biomarkers. Similar efforts have sprung forth in the Parkinson’s field with the Parkinson’s Progression Markers Initiative (see ARF related news story), and for frontotemporal dementia research with the FTLD Neuroimaging Initiative. A recent Nature editorial argues that larger, collaborative studies like these are required to move claimed biomarkers into clinical practice (Poste, 2011).
Meta-analyses can also alleviate the problem of exaggerated effect sizes, as the JAMA paper points out. However, pooling data from studies with different cohorts and analysis techniques may cause problems, too, Xiong said. Ideally, meta-analyses would re-crunch the numbers on individual patient data collected from the various studies. However, such raw data are often unavailable. In a recent paper, Xiong and colleagues describe a method for doing meta-analysis that combines individual-level data from certain studies with summary statistics from others (Xiong et al., 2011).
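To make the pooling step concrete, here is a minimal sketch of a standard fixed-effect, inverse-variance meta-analysis of summary statistics. It is not the method of Xiong et al., 2011, which additionally incorporates individual-level data, and the three study estimates and standard errors below are invented for illustration. It also assumes the studies measure the same underlying effect, which is the assumption Xiong cautions may fail when cohorts and techniques differ.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted average of study-level effect estimates.

    Returns the pooled estimate and its standard error. Assumes all
    studies estimate one common effect (fixed-effect model).
    """
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical data: one small early study with a large effect,
# followed by two larger studies with more modest effects.
estimates = [0.9, 0.3, 0.2]
std_errors = [0.4, 0.1, 0.1]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

Because the small early study carries little weight, the pooled estimate sits close to the two larger studies rather than to the striking initial result, and its standard error is smaller than that of any single study.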
How different labs collect the same data can complicate biomarker analysis even further—a problem not discussed in the JAMA report, noted Douglas Galasko of the University of California at San Diego (see full comment below). “The AD field has been highly aware of these issues, and is addressing quality control for imaging and biofluid biomarkers appropriately,” he wrote in an e-mail to ARF (e.g., ARF related news story; see also Shaw et al., 2011 and Fagan et al., 2011).
Ultimately, “it would be premature to doubt all scientific efforts at marker discovery and unwise to discount all future biomarker evaluation studies,” wrote Patrick Bossuyt of the University of Amsterdam, The Netherlands, in a commentary accompanying the paper. The study “should convince clinicians and researchers to be careful to match personal hope with professional skepticism, to apply critical appraisal of study design and close scrutiny of findings where indicated, and to be aware of the findings of well-conducted systematic reviews and meta-analyses when evaluating the evidence on biomarkers.”—Esther Landhuis.
Ioannidis JP, Panagiotou OA. Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses. JAMA. 1 June 2011;305(21):2200-2210.
Bossuyt PM. Editorial: The thin line between hope and hype in biomarker research. JAMA. 1 June 2011;305(21):2229-2230.