It is a complaint heard again and again—an experiment that works in one researcher’s hands flops when others attempt to repeat it. In the January 27 Nature online, the National Institutes of Health in Bethesda, Maryland, acknowledges disquiet over the reproducibility of published data and proposes several avenues to address it. “The NIH is deeply concerned about this problem,” write NIH Director Francis Collins and Principal Deputy Director Lawrence Tabak.
The issue has spread to the highest levels of science policy. This Friday, President Obama’s Council of Advisors on Science and Technology will hold a meeting on scientific reproducibility (see PCAST meeting agenda).
In their Nature essay, Collins and Tabak note that preclinical treatment studies, for example those using mouse models, are most likely to fail the reproducibility test. They blame this on a variety of factors, including poor training in experimental design, overemphasis on publication in a few high-status journals, and a system that relegates negative data to file drawers instead of public venues. “Some scientists reputedly use a ‘secret sauce’ to make their experiments work—and withhold details from publication or describe them only vaguely to retain a competitive edge. What hope is there that other scientists will be able to build on such work to further biomedical progress?” Collins and Tabak write.
The NIH is developing a training program on experimental design. By the end of 2014, it will become part of mandatory training for postdocs in intramural NIH labs, and will be made available for other institutions to use.
NIH grant reviewers should keep an eye out for less-than-sound experimental design in grant proposals, Collins and Tabak write. Some of the Institutes are trying out a checklist to remind reviewers to look for important design features, such as blinding, in submitted proposals. Randomization and blinding of preclinical studies have become standard practice in industry, but they are not uniformly applied in academic labs. The same holds true for calculating sample sizes and drawing up a statistical analysis plan before an experiment begins.
The agency is also piloting a program that assigns one or more reviewers to assess the published data used to support the research proposal. For example, these reviewers might determine that preclinical studies require validation before a clinical trial moves forward. By the end of 2014, the NIH will determine which of these approaches to apply in future grant evaluations.
In addition, the NIH wants to make scientific information, including negative data and critical comments, accessible outside of the standard publications. In December, it launched the online discussion forum PubMed Commons, which already has accumulated more than 700 comments on published work. The agency also has invited applications from researchers interested in developing a “Data Discovery Index,” which would provide unpublished primary data in a citable format. Applications are due March 6, 2014, and grants will be awarded next September.
Tabak and Collins emphasize that “reproducibility is not a problem that the NIH can tackle alone.” They lament that academia encourages researchers to focus on publishing in only a few top-notch journals. Other leaders have also blamed those journals for favoring manuscripts that maximize excitement and impact factor over papers with the most rigorous methodology. The 2013 Nobel laureate Randy Schekman of the University of California, Berkeley, recently pledged to eschew what he called “luxury journals,” such as Science, Nature, and Cell (see Times Higher Education). To mitigate the effect of publication records on grant decisions, the NIH will consider asking applicants to specify their contributions to cited papers.
Journals are taking heed of this criticism. Nature recently rescinded space limits on its methods sections, developed a checklist to ensure that authors provide the experimental details that others would need to replicate the work, and planned to consult statisticians more regularly (see May 2013 news story). Earlier this month, Science announced that it would invite more statisticians to its Board of Reviewing Editors and expect authors to provide experimental details on randomization and blinding (McNutt, 2014). In 2012, the open-access publisher Public Library of Science and collaborators launched an initiative to repeat experiments and publish the results.
The Alzheimer’s field has seen its share of irreproducible results. Most of the time, an initially high-profile paper fades away quietly if others fail to reproduce the findings but never publish those data. Occasionally, negative replication attempts do make it into the published record. Eight laboratories reported being unable to reproduce a 2010 Nature study that suggested the cancer medication imatinib slowed Aβ production (see Jan 2014 news story and commentary). Several labs have struggled to replicate a 2003 finding that Aβ accumulates in the eye’s lens (see May 2013 news story). In both cases, authors of the original findings attributed the diverging results to differences in experimental technique. Some see standardization of research methods as a solution for translational research, and this has become a key issue for the Alzheimer’s Disease Neuroimaging Initiative and for efforts to develop fluid biomarkers for use in the clinic (see Oct 2008 news story; Nov 2009 conference story).
Ultimately, no one player alone can fix the irreproducibility problem. “Success will come only with the full engagement of the entire biomedical-research enterprise,” write Collins and Tabak.—Amber Dance and Gabrielle Strobel
- Guidelines at Nature Aim to Stem Tide of Irreproducibility
- GSAP Revisited: Does It Really Play a Role in Processing Aβ?
- Not Seeing Eye to Eye: Do Lenses Accumulate Aβ?
- ADNI Results: A Story of Standardization and Science
- Worldwide Quality Control Set to Tame Biomarker Variation