Echoing the past phase of irreproducible genetic association studies, an inability to replicate relationships between brain structure, function, and behavior besets brain imaging researchers today. How can they up their game to be sure brain-wide associations are true? By building massive samples of many thousands of participants, just as geneticists did, say Nico Dosenbach, Washington University School of Medicine, St. Louis, and Damien Fair, University of Minnesota Medical School, Minneapolis. In the March 16 Nature, they reported that as they analyzed 48,500 MRI scans from three large datasets of healthy people, bigger sample sizes shrank effect-size estimates, lowered false-negative rates, and boosted reproducibility.

  • Thousands of healthy participants are needed to find reliable brain-wide associations.
  • AD researchers say smaller neuroimaging studies of dementia can be reliable.
  • A large brain scan dataset is being compiled for meta-analysis.

“This confirms, supports, and emphasizes the suspicion that many investigators have, that the results of small studies might lead to chance findings,” Michael Weiner, University of California, San Francisco, wrote (full comment below). Nick Fox, University College London, agreed. “This important paper warns about the dangers of underpowered cross-sectional brain-behavioral association studies, with small, often over-ambitious and over-interpreted studies yielding inflated associations that cannot be replicated,” he wrote to Alzforum.

That said, some of the work needed to achieve larger sample sizes is already underway for Alzheimer’s and other brain diseases. Since 2009, researchers led by Paul Thompson, University of Southern California, Los Angeles, have been compiling neuroimaging and genomic datasets for meta-analysis through the Enhancing NeuroImaging Genetics through Meta-Analysis Consortium. “ENIGMA has put brain disease imaging on a firmer footing by enabling researchers who are collecting smaller cohort studies to team up with others to identify how robust their findings are,” Thompson wrote (full comment below).

Bigger is better when it comes to sample size, especially when trying to tease out subtle relationships between the brain and some aspect of cognition. However, the high cost of MRI limits most current brain-wide association studies (BWAS) to fewer than 100 participants. How many people must researchers scan to detect reproducible brain-behavior relationships?
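
The question can be made concrete with a standard power calculation. The sketch below is an illustrative textbook computation using the Fisher z approximation, not the method of the Nature paper, and the correlation values are hypothetical; it estimates how many participants a two-sided test needs to detect a single, prespecified brain-behavior correlation with 80 percent power.

```python
# Back-of-envelope sample-size estimate for detecting a single, prespecified
# brain-behavior correlation r with a two-sided test. Illustrative textbook
# calculation (Fisher z approximation), not the analysis of Marek et al.;
# the r values below are hypothetical.
import math
from statistics import NormalDist

def n_required(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true Pearson correlation r."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)           # two-sided critical value
    z_beta = nd.inv_cdf(power)
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform of r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

for r in (0.3, 0.1, 0.05):
    print(f"r = {r}: about {n_required(r)} participants")
```

Under these assumptions, a correlation of 0.3 needs roughly 85 participants, but one of 0.05 needs more than 3,000, and that is before any correction for the thousands of brain features a BWAS tests simultaneously.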

To find out, co-first authors Scott Marek of WashU and Brenden Tervo-Clemmens, now at Harvard Medical School, Boston, who had worked with Marek in graduate school, turned to three large neuroimaging datasets. They accessed MRI scans and cognitive data from 35,735 healthy adults ages 40 to 69 in the U.K. Biobank, 11,874 children aged 9 or 10 from the U.S. Adolescent Brain Cognitive Development (ABCD) study, and 1,200 22- to 35-year-olds from the U.S. Human Connectome Project (HCP, see Oct 2012 conference news). They correlated brain structure and function measurements, such as resting-state functional connectivity, to cognitive variables, such as composite score on the NIH Toolbox, to evaluate how effect size and reproducibility changed as the sample grew in size.

Effect sizes of brain-behavior correlations were similar across same-sized samples from each databank. This was so despite the participants’ 60-year age span, different MRI scanners, and scan durations. Correlations weakened as the number of participants grew, hinting that brain-behavior relationships are likely exaggerated in small samples. Applying a stringent p-value threshold worsened the problem, as it led scientists to focus on the most inflated and often incorrect effects. False-negative rates were staggeringly high, causing researchers to miss nearly all true associations in groups of fewer than 1,000 people, versus a third in a 4,000-person sample.
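
This winner's-curse dynamic is easy to see in a toy simulation (a sketch with invented numbers, not the paper's analysis): draw many small samples from a population with a weak true correlation, keep only the results that pass p < 0.05, and compare the surviving effect sizes with the truth.

```python
# Toy simulation of effect-size inflation under a significance filter.
# The true correlation (0.06) and sample size (50) are invented for
# illustration; they are not values from Marek et al.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_r, n, trials = 0.06, 50, 5000
cov = [[1.0, true_r], [true_r, 1.0]]

surviving = []
for _ in range(trials):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    r, p = stats.pearsonr(x, y)
    if p < 0.05:                      # keep only "significant" estimates
        surviving.append(abs(r))

# At n = 50 only |r| > ~0.28 can reach p < 0.05, so every estimate that
# survives the filter overstates the true effect several-fold.
print(f"true |r| = {true_r:.2f}; mean surviving |r| = {np.mean(surviving):.2f}")
```

Selecting on significance guarantees that only the most inflated estimates get reported, which is exactly why a stricter threshold makes the bias worse, not better.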

Attempts at replicating correlations in independent samples—a standard practice in modern GWAS research—became gradually more successful as sample sizes grew. For example, a weak correlation between resting-state functional connectivity and cognitive ability was reproduced just 5 percent of the time between groups of 500 people or fewer, versus up to 25 percent of the time between groups of 2,000 participants.

Huge statistical error bars in small cohorts can easily flip conclusions. For example, in a sample of 25 people, the 99 percent confidence interval of the effect size was ±0.52, meaning that two independent samples could give opposite conclusions about the same brain-behavior association.
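
That half-width is close to what the standard Fisher z approximation predicts (a back-of-envelope check assuming a true correlation near zero, not the authors' exact calculation):

```python
# 99 percent confidence half-width of a Pearson r estimated from n = 25,
# via the Fisher z approximation (true correlation assumed near zero).
import math
from statistics import NormalDist

n = 25
z_crit = NormalDist().inv_cdf(0.995)   # two-sided 99 percent -> ~2.576
se_z = 1 / math.sqrt(n - 3)            # standard error of Fisher-transformed r
half_width = math.tanh(z_crit * se_z)  # back-transform to correlation units
print(f"99% CI half-width at n = {n}: ±{half_width:.2f}")
```

The approximation gives about ±0.50, in line with the reported ±0.52: at n = 25, an observed correlation anywhere between roughly -0.5 and +0.5 is statistically consistent with no effect at all.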

Fox noted that the brain’s tremendous heterogeneity between one person and the next weakens reproducibility. Betty Tijms, VU University, Amsterdam, concurred, saying that correlations and reproducibility are especially weak when studying relationships between brain function and cognition in healthy people (full comment below).

More People, Firmer Data. In larger studies, correlations become more reproducible (red), while errors such as false-negative rates drop (yellow), as does variability among samples within the dataset such as confidence interval (orange). [Courtesy of Marek et al., Nature, 2022.]

What does this mean for Alzheimer's disease neuroimaging studies? The problem is real but not quite as serious as it is for basic research on intelligence, for example.

“Care should be taken to generalize these findings to research on AD, where effects might be more pronounced and do not require large sample sizes,” Tijms wrote. Andre Altmann, University College London, agreed. He noted that past neuroimaging studies on fewer than 100 participants have reliably mapped regional atrophy in AD (full comments below). Likewise, Marcus Raichle of WashU thinks the present work does not diminish the integrity of previous AD research using double-digit numbers of participants, because researchers were looking at disease progression rather than the subtle psychological correlations between people that this paper assessed. The authors themselves note that intraindividual change or intervention studies tend to show larger effects than cross-sectional studies. People’s brains change, whereas their genomes are immutable; that, too, is a reason why GWAS have to be huge.

Dementia studies do require large samples when effects are subtle or when there are few affected individuals, wrote Thompson. This, too, is analogous to GWAS, which readily pick out large effects or common variants but need big samples for small or rare ones. In GWAS, samples have grown to more than one million (Lee et al., 2018). 

Through the ENIGMA Consortium, Thompson and colleagues at 700 institutions worldwide have collected more than 120,000 MRI scans, as well as genetic data, from tens of thousands of people (reviewed by Thompson et al., 2022; Apr 2012 news). Using standardized processing of imaging and genetic data, scientists are studying more than 30 brain disorders, including AD, Parkinson’s disease, and frontotemporal dementia.

Tomas Paus, University of Montreal, Canada, said brain imaging samples can be aggregated for meta-analyses with methods that preserve their respective studies' ways of assessing the brain and behavior (full comment below).

ENIGMA has enabled scientists to determine how few participants are needed to find robust patterns in the aggregate data. For example, in an analysis of MRI scans from 2,350 people with Parkinson’s disease, a robust and distinctive cortical thinning signature emerged from scans of only a few hundred people.

This is remarkable because, for many years, individually analyzed, smaller MRI studies of PD had given scientists the discouraging message of “move on, nothing to see here.” This finding supports the notion that pronounced effects of neurodegenerative disease can be seen with achievable sample sizes, Thompson said (Laansma et al., 2021).—Chelsea Weidman Burke

Comments

  1. This is an interesting paper. One overall goal of our field is to determine how we can use brain MRI information, together with other information about the participants, to gain knowledge concerning the brain regions which are involved in specific functions, and how they are affected by disease, and by treatments. Many of the reports made over the years have not been replicated. This very careful study demonstrates that the more data that is available, the more robust the findings are. This confirms, supports, and emphasizes the suspicion that many investigators have had that the results of small studies might lead to chance findings, or findings limited to a population subset, which would not stand the test of time. Therefore more large-scale studies, and data-sharing which facilitates amalgamation of data from separate studies, are to be encouraged.

    An important issue, not directly addressed by this paper, is the “generalizability” of such findings. The vast majority of brain-imaging studies come from projects in which the participants volunteer to participate. Such studies tend to over-select from populations who have the interest and time to participate in research, that is, people with higher socioeconomic status, who are more affluent. To obtain truly generalizable findings, we need more large-scale brain imaging studies of participants from epidemiologically sampled populations such as the Health and Retirement Study, the Framingham Study, or the Mayo Clinic Study of Aging.

  2. Marek et al. are to be commended on their timely reminder of the large sample sizes often required to obtain reliable and reproducible findings in brain imaging studies. With this in mind, we formed the ENIGMA Consortium over a decade ago, and a recent special issue of Human Brain Mapping (Thompson et al., 2022) showcases more than 30 papers pooling brain-imaging data from across the world, in diverse populations from over 45 countries, often yielding the largest neuroimaging studies of over 30 major brain diseases and conditions. What we learned regarding reproducibility may offer some relief from the reproducibility crisis, and the concerns expressed by Marek et al. ENIGMA has benefited from the power of global collaboration to put the neuroimaging of disease on a firmer footing. It offers a solution for those collecting smaller cohort studies to team up with others worldwide to identify how robust their findings are, but also to discover crucial modulating factors that influence disease risk and progression in the brain. Some examples of what we learned by pooling tens of thousands of brain scans worldwide are below.

    1. There are robust brain correlates of major brain diseases. By pooling brain disease data from more than 45 countries and more than 30 disorders, ENIGMA published the largest neuroimaging studies of Parkinson’s disease, epilepsy, schizophrenia, bipolar disorder, major depressive disorder, and PTSD (reviewed in Thompson et al., 2020). The general message is that the characteristic signatures of brain disease are highly robust and reproducible worldwide. Independent consortia in Japan are finding almost the exact same patterns of disease effects on the brain, and even the same rankings for brain biomarkers from MRI and DTI (Koshiyama et al., 2022). To first order, this agreement is remarkable and has brought robustness and rigor to fields where small effects are the rule, and where modulators of disease are important to identify.
    2. Do we always need thousands of subjects to find anything reliable? No. ENIGMA’s studies allowed us to go back and see which of the findings in the largest studies of each disease would have been found if groups had just published on their own, and what factors affect the discoverability and consistency of disease biomarkers worldwide. In neurological disorders such as Parkinson’s disease, epilepsy, and ataxia, extremely robust effects are found, replicating and extending effects found in smaller studies of a few hundred people, but often revealing stage-specific patterns and unsuspected subtypes. Similarly, ENIGMA’s studies show huge effects on the brain for many rare genetic variants—cohorts were able to discover these effects in a few dozen cases, but larger studies were able to verify and confirm them. In the psychiatric neuroimaging field, ENIGMA’s largest neuroimaging studies to date of anorexia, schizophrenia, bipolar disorder, and PTSD all show robust effects on brain MRI, DTI, and resting state fMRI. Pooling tens of thousands of MRIs across sites and countries helped to both discover and verify patterns that were once hotly debated or controversial. By pooling data from hundreds of labs, we found that the patterns of structural brain asymmetry are consistent worldwide (Kong et al., 2022), but are subtly altered in schizophrenia (Schijven et al., 2022) and autism (Postema et al., 2019)—supporting a hypothesis that had been controversial since the 1970s. Other hypotheses that have recently been resolved with very large datasets include the presence of subtle brain differences in schizotypy (Kirschner et al., 2021), a mysterious protective pattern seen in unaffected siblings of people with bipolar disorder (de Zwarte et al., 2022), and the association of robust brain differences in hippocampal volume with epigenetic variations such as methylation (Jia et al., 2021). 
    3. Modulators of disease and ancestral diversity. Sometimes disease effects do differ in important ways across populations—depending on prevailing risk factors and a person’s ancestry, effects found in one population may not always be found in another. In ENIGMA’s studies of OCD, large-scale data pooling (Piras et al., 2021) identified medication, duration of illness, and age of onset as largely accounting for the differences seen in the reported brain effects across cohorts worldwide—a field that had once been regarded as rife with inconsistencies. In ADHD, we found a subtle and shifting pattern of effects on the brain that varies with age—in a remarkable study pooling data from over 150 cohorts worldwide and comparing effects across OCD, ASD, and ADHD (Boedhoe et al., 2020).

    As noted by Marek et al., special design considerations are required—including very large samples (thousands or tens of thousands) when very high dimensional datasets are screened for associations. To avoid the risk of spurious associations when large sets of biomarkers are screened, a split sample—or an independent replication sample—is often used. Before anyone complains about the cost, great success has been achieved by pooling existing datasets across the world, after careful coordination and harmonization. This process is eased in ENIGMA by using standardized workflows that process datasets consistently, avoiding the methodological heterogeneity that often leads to discrepancies, known and unknown, in the published literature.

    When thousands or even millions of biomarkers are screened for associations at the same time, the risk of false positive associations increases; as Smith and Nichols (2018) note, this case is widely recognized in brain mapping, and there are already many widely used methods to mitigate it. Methods to meta-analyze brain maps and aggregate evidence across multiple brain mapping studies have been highly successful (BrainMap, Neurosynth, SDM, and meta-TBM). In genetics, the same multiple testing issues have led to the coordinated analysis of large samples to detect associations between genomic variation and key brain biomarkers. In our own GWAS studies, we screened over a million genomic markers to identify common genetic variants associated with brain measures derived from MRI (Grasby et al., 2020), and even with the rates of brain atrophy over time (Brouwer et al., 2022), offering key new genomic targets for drug discovery. These genetic risk effects are individually small but highly robust—much like the genetic risk loci for Alzheimer’s disease beyond APOE, which required thousands of subjects to discover, but much smaller samples to verify (when the penalty of multiple testing is reduced in a follow-up, independent replication phase). One great success of Alzheimer’s research and degenerative disease research in general is the quest to identify disease-related markers in the genome, and in brain images. Such efforts have often disentangled the heterogeneity of disease, and discovered disease subtypes that can be dealt with in a more personalized and targeted way.

    We propose the following remedies for the need for large samples in neuroimaging research. First of all, large samples are not always required: They are only required when the effects are small, or if a large number of associations are being tested at the same time, or if key modulators of the disease effect are suspected. All these situations will typically lead to different findings across cohorts, when cohorts are small (under 100 subjects). If the effects are small, or multiple tests are performed, or modulators are important (e.g., sex differences, or ancestry effects), then clubbing together as consortia will offer the power to discover and verify effects, and find key modulators. We have shown this repeatedly, in over 100 papers by ENIGMA over the last decade.

    The neurology and psychiatry fields have been immensely successful in this regard, with vast international efforts in genomics (ADSP, ADGC, PGC), and neuroimaging (ENIGMA). We thank Marek et al. for their call to action for better reproducibility in brain imaging, and we encourage teams worldwide to unite to discover and verify factors that resist the devastating diseases that affect us all today.

    References:

    . The Enhancing NeuroImaging Genetics through Meta-Analysis Consortium: 10 Years of Global Collaborations in Human Brain Mapping. Hum Brain Mapp. 2022 Jan;43(1):15-22. Epub 2021 Oct 6 PubMed.

    . ENIGMA and global neuroscience: A decade of large-scale studies of the brain in health and disease across more than 40 countries. Transl Psychiatry. 2020 Mar 20;10(1):100. PubMed.

    . Neuroimaging studies within Cognitive Genetics Collaborative Research Organization aiming to replicate and extend works of ENIGMA. Hum Brain Mapp. 2022 Jan;43(1):182-193. Epub 2020 Jun 5 PubMed.

    . Mapping brain asymmetry in health and disease through the ENIGMA consortium. Hum Brain Mapp. 2022 Jan;43(1):167-181. Epub 2020 May 18 PubMed.

    . Large-scale analysis of structural brain asymmetries in schizophrenia via the ENIGMA consortium. medRxiv, March 11, 2022. medRxiv

    . Author Correction: Altered structural brain asymmetry in autism spectrum disorder in a study of 54 datasets. Nat Commun. 2021 Dec 8;12(1):7260. PubMed.

    . Cortical and subcortical neuroanatomical signatures of schizotypy in 3004 individuals assessed in a worldwide ENIGMA study. Mol Psychiatry. 2021 Oct 27; PubMed.

    . Intelligence, educational attainment, and brain structure in those at familial high-risk for schizophrenia or bipolar disorder. Hum Brain Mapp. 2022 Jan;43(1):414-430. Epub 2020 Oct 7 PubMed.

    . Epigenome-wide meta-analysis of blood DNA methylation and its association with subcortical volumes: findings from the ENIGMA Epigenetics Working Group. Mol Psychiatry. 2021 Aug;26(8):3884-3895. Epub 2019 Dec 6 PubMed.

    . White matter microstructure and its relation to clinical features of obsessive-compulsive disorder: findings from the ENIGMA OCD Working Group. Transl Psychiatry. 2021 Mar 17;11(1):173. PubMed.

    . Subcortical Brain Volume, Regional Cortical Thickness, and Cortical Surface Area Across Disorders: Findings From the ENIGMA ADHD, ASD, and OCD Working Groups. Am J Psychiatry. 2020 Sep 1;177(9):834-843. Epub 2020 Jun 16 PubMed.

    . Statistical Challenges in "Big Data" Human Neuroimaging. Neuron. 2018 Jan 17;97(2):263-268. PubMed.

    . The genetic architecture of the human cerebral cortex. Science. 2020 Mar 20;367(6484) PubMed.

    . Age-dependent genetic variants associated with longitudinal changes in brain structure across the lifespan. bioRxiv, November 8, 2021 bioRxiv

  3. This study neatly and systematically evaluates the boundaries of effect sizes that can be expected for varying sample sizes in brain-wide association studies (BWAS) with cognitive or mental health measures by making use of three large, publicly available datasets (ABCD, HCP, UKB) on mostly normal individuals.

    They find that when sample sizes are small, effect sizes are probably an exaggeration of the real effect, since effect sizes decreased when replicated in larger samples. Importantly, applying a stringent p-value threshold will not solve this problem, and will lead to bias for inflated effects. This indicates that when sample sizes are small, all results should be reported, and researchers should not fixate on p-values only. 

    The authors further compared effect sizes for small subsets of the data with effects observed in the full dataset. Strikingly, the false-negative rate remained very high even for large samples of more than 2,000 individuals. This puts us between Scylla and Charybdis: Small and robust effect sizes may help us understand brain-phenotype relationships at a group level, but may not apply to single participants; whereas high effect sizes in small samples may exaggerate potential clinical meaningfulness.

    In general, effect sizes in this study were small, which may reflect that investigating brain-phenotype relationships in normal individuals is a hard thing to do. It can be questioned whether performance on different cognitive tests would map onto the same brain areas, which is here assumed by the use of a composite score. Individuals show complex differences in brain-phenotype relationships, which can change with older age, and it is conceivable that small effect sizes echo this inter-individual heterogeneity. Here the ground truth was assumed to be the effect observed in the largest group, but in fact this is unknown.

    Care should be taken to generalize these findings to research on Alzheimer’s disease, where effects might be more pronounced and do not require large sample sizes. Hippocampal atrophy can be seen with the naked eye and has been associated with memory decline in many studies. It would be interesting if this exercise could be repeated using a known, possibly artificial, ground truth to find ideal sample sizes for a given effect size.

  4. This is a very interesting article, a long-needed study, and a great reference for the future. It demonstrates that the laws of statistics governing sample size and effect size are universal, regardless of the type of data. Thus, BWAS suffers from the same problems as GWAS, and likely the remedies are similar: more data and replication in out-of-sample cohorts.

    We knew that small sample sizes are more likely, due to sampling bias, to produce more extreme effect sizes, i.e., significant results. These typically fail to replicate. However, what surprised me was that our common strategy to limit false positives in the first place, that is, to apply a more rigorous statistical threshold, makes things even worse!

    For studies of AD, the takeaway is twofold. Firstly, we can assume that effect sizes of dementia on the brain are stronger than the cognitive phenotype examined here, therefore the sample-size requirements may be in the hundreds instead of thousands. Indeed, studies in the past have reliably mapped hallmark features of AD such as regional atrophy. However, subtle, or secondary, cognitive effects of the disease that are more varied between patients may indeed require thousands of participants to reliably map in order to better understand the disease’s heterogeneity. Secondly, the fundamental takeaway is that our study designs should include replication cohorts, besides aiming to maximize the expected effect sizes by focusing on longitudinal designs and the most promising imaging modalities.

  5. This important paper highlights both the benefits and challenges of open science. The benefits are clear—many samples that are widely available provide statistical power and opportunities for replication. However, there are challenges, mostly associated with using many exploratory correlations across multiple metrics both within and between the brain and behavior. The many aspects to measure, and ways to do so, make it difficult for the field to rely on a handful of large studies, each making its own choices about what to measure. As an alternative, researchers can aggregate smaller samples through meta-analyses, keeping the variety of brain and behavior phenotypes.

    As a side note, I argue against imposing committee-driven harmonization when studying brain and behavior, because the complexity of both demands a wide variety of approaches. Different forms of meta-analysis, such as the Activation Likelihood Estimates method and the ENIGMA and CHARGE consortia, have been very helpful in this regard.

    It is noteworthy that, although important, statistical power alone does not replace the need for asking the right question. Researchers must thoughtfully use large datasets to test specific hypotheses grounded in biology (brain) and psychology (behavior), taking advantage of existing knowledge gathered at multiple levels in various species, including humans, over the past 100+ years.

    Conceptually, brain-wide association studies are quite different from genome-wide association studies. The GWAS approach has a solid foundation: It is easily assessed in a highly standardized manner (e.g., SNP arrays), and inter-individual variations in just four nucleotides across the genome are correlated with various phenotypes. On the other hand, one can argue that BWAS is a correlation between a series of open-ended constructs on both sides (e.g., various metrics of “functional connectivity” against various, often latent, metrics of cognition or mental health). Furthermore, unlike the reasonably noise-free genomic variations of genotyping, phenotyping of the brain and behavior is noisy.

  6. This study is important because it shows that the concept of BWAS for complex phenotypes, such as psychopathology and cognitive screening test composites, may be fundamentally flawed. In my view, increasing funding to obtain GWAS-like MRI sample sizes for hypothesis-free research on complex phenotypes in unstratified populations does not seem to be the best investment of resources. An important value of large cohorts is the ability to stratify for specific genetic, clinical, socio-demographic, or other phenotypes in hypothesis-driven brain imaging studies. This is, in my view, the predominant approach in the Alzheimer’s brain imaging field.

    In fact, the authors have been quite balanced in making a distinction between hypothesis-driven brain imaging research in stratified samples and the problems associated with BWAS discovery studies. We should make sure that this message does not get lost while digesting the important learnings from this study.

  7. This is a timely and very important study. It confirmed several of our previous findings obtained in searching for associations between brain structure and behavior in healthy populations, but the current study further shows that the concerns we previously raised similarly apply to associations between brain connectivity and behavior. In particular, effect sizes are inflated in small samples (Kharabian Masouleh et al., 2019), and examining the same question in two different samples can lead to opposite conclusions, e.g., the sign of the associations can be flipped (Genon et al., 2017). Altogether, these works highlight important challenges in neuroimaging research, especially when aiming to relate brain to behavior in healthy populations.

    First, neuroimaging data, as indirect measurements of either brain structure or brain connectivity, are particularly noisy. We can reasonably wonder to what extent the interindividual variability that we measure in brain structure is valid in healthy populations. Similar concerns apply to psychometric data in healthy populations. For instance, we may wonder whether slight differences in the score on a list-learning task between two participants, or differences in personality scores, truly reflect differences in cognitive abilities or stable differences in behavior between these participants, respectively, allowing inferences on their neural correlates.

    Data and research in clinical populations, such as patients with Alzheimer’s disease, are affected by these concerns to a lesser extent. Indeed, we have previously shown that brain structure-behavior associations replicate better in these populations (Kharabian Masouleh et al., 2019). This is likely because current methods in the field capture both atrophy and behavioral deficits well in these populations, leading to greater effect sizes than those observed for associations in healthy populations and hence more robust findings. Thus, improving the validity of neuroimaging markers of brain structure and function would be an important avenue for progress in understanding brain-behavior associations.

    Second, sampling variability is a crucial issue in neuroimaging research, as in any research field. The present study clearly demonstrated that extremely large cohorts are needed to identify robust brain-behavior associations. With cohorts of thousands of participants, a strict cross-validation scheme can be implemented to evaluate the generalizability of the findings. However, even in a within-cohort cross-validation setting, when the data have been acquired at a single site using a unique research protocol, the question of generalizability to a completely independent dataset remains open.

    Researchers have to remain aware of the characteristics of their samples and, thus, of potential bias. For instance, most datasets in our field are dominated by one type of population, and models derived from these samples may not generalize to some minorities in the general population (Li et al., 2022). So, there is still a lot of work to be done to derive brain-behavior models that will be useful and accurate for single individuals.

    References:

    . Searching for behavior relating to grey matter volume in a-priori defined right dorsal premotor regions: Lessons learned. Neuroimage. 2017 Aug 15;157:144-156. Epub 2017 May 25 PubMed.

    . Empirical examination of the replicability of associations between brain structure and psychological variables. Elife. 2019 Mar 13;8 PubMed.

    . Cross-ethnicity/race generalization failure of behavioral prediction from resting-state functional connectivity. Sci Adv. 2022 Mar 18;8(11):eabj1812. Epub 2022 Mar 16 PubMed.

  8. The work by Marek and colleagues is an outstanding demonstration of a discussion that has been ongoing for the better part of a decade in the fields of psychology and cognitive neuroscience. Often deemed the "replication crisis," there is recognition that many behavioral and neuroimaging studies do not replicate (Open Science Collaboration, 2015; Camerer et al., 2018; Elliott et al., 2020). This lack of replication is often tied to small sample sizes leading to inadequate power (Button et al., 2013), and the ultimate result is poor science. Given how widespread and systematic such issues are across scientific specializations, it is unlikely that studies of neurodegenerative diseases such as Alzheimer's are free of such concerns. In fact, the prevalence of case studies or case series in more clinically oriented fields may exacerbate such issues.

    There are a number of factors, though, that work in our favor. The first is that the overall power of a given study depends not simply on sample size, but also on the size of the effect being studied. Many disease effects are orders of magnitude larger than the subtle behavioral associations investigated by Marek and colleagues. For example, atrophy in individuals with an AD diagnosis, or abnormal amyloid values in APOE4 carriers, are robust, strong effects that can be seen in a handful of individuals. Statistical concerns become more prominent when analyses focus on individual differences, which are likely to be subtle, or on increasingly complex statistical models (e.g., three-way interactions).

    Another strength of our field is that the multiple large cohorts being studied in the U.S., Europe, and Asia can inherently lead to a high degree of replication and convergence. While there is often emphasis placed on being the first to show a finding, as a field we should appreciate the utility that replication and reaching a consensus bring. There is also a growing push to make data from these large cohorts more accessible so that studies do have increasingly large Ns. Still, the most important thing we can do is to ask reasonable, well-thought-out scientific questions. Larger and larger samples are not helpful if the base science being pursued is inherently flawed or ill planned.

    References:

    . PSYCHOLOGY. Estimating the reproducibility of psychological science. Science. 2015 Aug 28;349(6251):aac4716. PubMed.

    . Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav. 2018 Sep;2(9):637-644. Epub 2018 Aug 27 PubMed.

    . What Is the Test-Retest Reliability of Common Task-Functional MRI Measures? New Empirical Evidence and a Meta-Analysis. Psychol Sci. 2020 Jul;31(7):792-806. Epub 2020 Jun 3 PubMed.

    . Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013 May;14(5):365-76. Epub 2013 Apr 10 PubMed.

References

News Citations

  1. The Human Connectome Project: Deciphering the Brain’s Wiring
  2. GWAS and Anatomy—Pooled Data Finger Genes Driving Brain Size, IQ

Paper Citations

  1. . Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet. 2018 Jul 23;50(8):1112-1121. PubMed.
  2. . The Enhancing NeuroImaging Genetics through Meta-Analysis Consortium: 10 Years of Global Collaborations in Human Brain Mapping. Hum Brain Mapp. 2022 Jan;43(1):15-22. Epub 2021 Oct 6 PubMed.
  3. . International Multicenter Analysis of Brain Structure Across Clinical Stages of Parkinson's Disease. Mov Disord. 2021 Nov;36(11):2583-2594. Epub 2021 Jul 20 PubMed.

External Citations

  1. Enhancing NeuroImaging Genetics through Meta-Analysis Consortium
  2. NIH Toolbox

Primary Papers

  1. . Reproducible brain-wide association studies require thousands of individuals. Nature. 2022 Mar 16; PubMed.