Introduction

As researchers push into the frontier of early-stage and preventative Alzheimer's disease clinical trials, they will have to measure whether a drug works in people who do not yet have dementia. Traditional outcome measures were designed for moderate AD and do not detect cognitive change at early stages of the disease. Therefore, leaders of the Dominantly Inherited Alzheimer Network (DIAN), the Alzheimer’s Prevention Initiative (API), and the Anti-Amyloid Treatment in Asymptomatic AD Trial (A4) have each analyzed separate sets of longitudinal data to find the test items that are most sensitive to change in presymptomatic disease. They combined these into new composites.

Pharmaceutical companies focusing on prodromal AD follow a similar strategy. These novel composites are being used in trials now underway, but the FDA has not yet formally validated them. It has issued a brand-new guidance on the topic, though. Does it jibe with what the trials are doing now? Will the agency accept change as measured by these batteries as sufficient for drug approval? With so many groups designing their own composites, will the field arrive at a common standard, or will it be all researchers for themselves for some time to come?

On 28 February 2013, leaders from these initiatives addressed these questions. Suzanne Hendrix, Pentara Corporation, Salt Lake City, Utah, presented API’s approach, while Michael Donohue, University of California, San Diego, talked about the A4 trial, and Jason Hassenstab, Washington University in St. Louis, Missouri, covered DIAN’s method. Veronika Logovinsky, Eisai Inc., discussed her company’s work in the MCI arena, and Keith Wesnes, Bracket (formerly United BioSource Corporation), Goring, U.K., provided an independent perspective. Nick Kozauer from the Food and Drug Administration shared his viewpoint and answered questions about the new guidance.
  

  • Listen to the Webinar.
  • Michael Donohue's Presentation.
  • Jason Hassenstab's Presentation.
  • Suzanne Hendrix's Presentation.
  • Veronika Logovinsky's Presentation.
  • Keith Wesnes's Presentation.

Background

By Madolyn Bowman Rogers

The Alzheimer’s field has undergone a sea change. Most researchers now agree that interventions need to start earlier. Pharmaceutical companies are increasingly turning their attention to people with mild cognitive impairment (MCI) in hopes that potential disease-modifying drugs will be effective in that population. Meanwhile, three large prevention initiatives—DIAN, API, and A4—aim even earlier, looking to see if therapies can delay the onset of clinical impairment in presymptomatic populations.

One conundrum facing these efforts is how to measure a drug effect. The FDA is unlikely to accept biomarker stabilization alone as grounds for drug approval, even less so since recent trials have suggested that biomarkers and cognitive endpoints do not necessarily move together (see ARF related news story).

Researchers will need to provide evidence of clinical improvement. However, established test batteries such as the ADAS-cog were designed for moderate AD. On many of its individual tests, nearly everyone in the early stages of disease performs at the ceiling. “Once you move into mild AD, the ADAS-cog starts to lose its sensitivity,” Veronika Logovinsky at Eisai Inc. in Woodcliff Lake, New Jersey, told Alzforum. Sensitivity flattens out even more in MCI. With existing tests, MCI trials would need to enroll thousands of people per arm to see efficacy, and would be prohibitively expensive, Logovinsky said.

Eisai has taken a different tack in a Phase 2 trial of the BAN2401 antibody, which targets Aβ protofibrils (see ARF related news story). The trial will enroll 800 people with MCI or mild AD. To develop a better outcome measure for this group, Logovinsky collaborated with Suzanne Hendrix, an independent biostatistician in Salt Lake City, Utah, who also works with API. The researchers analyzed data from previous MCI trials run by Eisai and by the Alzheimer’s Disease Cooperative Study, as well as from ADNI, to see how the individual components of each test performed. They looked for items that were most sensitive to a person’s cognitive decline over time while showing the least variation from one person to another. Using this approach, they picked several items from the ADAS-cog, the MMSE, and the CDR and combined them into a new composite outcome measure that they believe will be useful in this mildly symptomatic population (see ARF related news story).

Will early intervention trials become a Wild West, with every pharmaceutical company chasing its own composite? Many companies hope to avoid that. At the end of 2012, the Coalition Against Major Diseases (CAMD) started a project to work with regulatory groups on qualifying a single tool for use in longitudinal studies and trials. CAMD members include numerous pharmaceutical companies such as Eli Lilly, Genentech, and Novartis (see full list). Logovinsky will present Eisai’s clinical composite to the coalition to serve as a starting point for the process, she said.

Presymptomatic populations present an even greater problem, as people are still largely cognitively normal at this stage. To tackle it, API researchers Jessica Langbaum and Yui Ayutyanont of the Banner Alzheimer’s Institute in Phoenix, Arizona, teamed up with Hendrix to comb through longitudinal data from the Colombian API cohort as well as from the Rush Alzheimer’s Disease Center in Chicago, Illinois (see ARF related news story). They were the first to push into this frontier. They looked for test items that showed the most cognitive decline across two, five, or 10 years prior to a diagnosis of MCI or AD, Hendrix said. With the Colombian data, API researchers developed a composite battery that will be used in the upcoming trial of crenezumab in this population (see ARF related news story; ARF news story). The Rush data provided a slightly different composite that will be used in a planned prevention trial in people with the ApoE4 risk allele, Hendrix said.

For the presymptomatic stage, too, the field is moving toward collaboration. Hendrix sits on a DIAN committee and shares methodologies with Peter Snyder at Brown University, Providence, Rhode Island, who heads the DIAN cognitive core. They also talk with Michael Donohue at the University of California, San Diego, who is developing outcome measures for the A4 secondary prevention trial of asymptomatic people with biomarker evidence of Alzheimer’s disease (see ARF related news story). The A4 effort primarily draws upon data from ADNI and the Australian Imaging, Biomarkers, and Lifestyle (AIBL) Flagship Study of Ageing to create its preliminary composite.

Donohue encouraged DIAN leaders to include test items identified in the API cohort. The DIAN composite battery will be large and inclusive, containing items from the network’s own longitudinal dataset, and from AIBL as well as from API (see ARF related news story; ARF related news story). This will allow the researchers at each of the three initiatives to compare how different test items perform and see where they may be redundant and where they contribute unique information, Hendrix said. Ultimately, this may allow them to develop a single composite for use in presymptomatic populations.

Will the FDA accept any of these new measures as sufficient evidence of a drug’s efficacy? On 7 February 2013, the agency issued a statement on the matter. Called Guidance for Industry, Alzheimer’s Disease: Developing Drugs for the Treatment of Early Stage Disease, the document formalizes what regulatory scientists had been saying for some time. That is, because functional and global impairments are difficult to measure before the onset of overt dementia, the agency is dropping its longstanding requirement of a co-primary outcome and will be content with “clear evidence of an effect on delaying cognitive impairment.” This guidance considers the use of a validated composite scale to be appropriate for the prodromal/MCI stage. It particularly notes that an effect on a valid and reliable cognitive assessment can be sufficient for marketing approval. To expedite trials, AD researchers are diving in and using the new batteries. What will it take to formally validate them?

What role, if any, will computerized tests play in those trials? It’s an open question. DIAN is exploring computer tests, but most current composites draw solely from paper-and-pencil tests, as few historical data are available by which to judge computerized batteries. Some current trials, such as this Phase 2 Affiris trial, are using both types of test, which should help researchers compare how the two perform, Hendrix said. Another looming issue is how to create continuity across the disease spectrum. With different tests now in use or under development for presymptomatic, prodromal, and full-blown AD, researchers may have trouble tracing the cognitive decline of individual patients across the disease. At least some test items may have to overlap to provide continuity, Hendrix suggested.

Questions answered by Suzanne Hendrix, Pentara Corporation

Q: What's not clear in the presentations that compare a composite with other endpoints is, What data are used for the comparisons? For example, are the comparisons from the same dataset upon which the composites were developed, or are fully independent datasets used?

A: In my presentation, the composites were applied to the same dataset upon which the composite was developed. With more time, I would have shown results from a bootstrap sampling process in which we used half the sample to create a composite and half to test it, giving us bias estimates with which we could correct our results. We are now in the process of applying the composites to fully independent datasets. Specifically, for the Eisai analyses in MCI, the data from different studies differed sufficiently from each other (due to entry criteria, patient population differences, and study site differences) that the data from one study couldn't be used to create a composite that performed well in another study. When we pooled across all of the studies (or even more than one study), the results were much more generalizable to new datasets. The combination based on data from all four studies was tested on individual studies, on specific populations within or across studies, and by split-half resampling. It showed minimal bias (less than 10 percent), although it was clear that relying on a single dataset (such as ADNI) could produce very different results. If companies have placebo groups from MCI studies on which they'd like to test the composite, we'd welcome those additional data.
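The split-half procedure Hendrix describes can be sketched in code. This is an illustrative outline, not the actual Pentara pipeline: all data here are simulated, half the participants are used to weight a composite, the held-out half to evaluate it, and the gap between in-sample and held-out performance estimates the optimism (bias) of fitting and testing on the same data.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_composite(changes):
    """Weight each item by its in-sample mean-to-SD ratio of change
    (one simple weighting rule; not necessarily the one used in the trials)."""
    w = changes.mean(axis=0) / changes.std(axis=0, ddof=1)
    return w / np.abs(w).sum()

def composite_msdr(changes, w):
    """Mean-to-SD ratio of the weighted composite change score."""
    score = changes @ w
    return score.mean() / score.std(ddof=1)

# Simulated change-score matrix: 200 participants x 5 test items,
# each item with a different true rate of decline.
changes = rng.normal(loc=[0.5, 0.3, 0.2, 0.4, 0.1], scale=1.0, size=(200, 5))

biases = []
for _ in range(100):  # repeated split-half resampling
    idx = rng.permutation(len(changes))
    train, test = changes[idx[:100]], changes[idx[100:]]
    w = fit_composite(train)
    # Optimism: in-sample performance minus held-out performance.
    biases.append(composite_msdr(train, w) - composite_msdr(test, w))

print(f"estimated bias: {np.mean(biases):.3f}")
```

Averaging the train-minus-test gap over many random splits gives a bias estimate that can be subtracted from an in-sample result, which is the kind of correction the answer alludes to.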

Q: If the need is to look for measurement of Alzheimer's disease—a progressive disease—it seems that the standard should be rate of change over time, with specific reference to the location in the disease course, and focusing on change as the central issue. Have the developers considered using this central metric to select test items?

A: That is the standard we use, but since the items are all on different scales, the rate of change has to be standardized to remove the units and give us something we can compare across different items. For instance, a one-point change on the CDR is not the same as a one-point change on the ADAS-cog wordlist recall. The ADAS-cog wordlist would show a faster rate of change than a CDR item, just because its points represent smaller increments of decline. We have chosen to divide by the standard deviation as the method of standardizing the scores (mean-to-standard-deviation ratio, or MSDR: the mean change over time divided by the standard deviation of the change over time; its reciprocal is the coefficient of variation, or CV). After this standardization, we do focus on the standardized rate of change (MSDR) with specific reference to the location in the disease course.
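The standardization Hendrix describes can be sketched numerically. In this Python snippet the change scores are invented for illustration: each item's MSDR is its mean change over time divided by the standard deviation of that change, which removes the units and makes items on different scales comparable.

```python
import numpy as np

def msdr(change_scores):
    """Mean-to-standard-deviation ratio: mean change over time
    divided by the standard deviation of that change."""
    change_scores = np.asarray(change_scores, dtype=float)
    return change_scores.mean() / change_scores.std(ddof=1)

# Hypothetical per-participant one-year change scores for two items
# measured on very different scales.
adas_wordlist_change = np.array([1.5, 2.0, 0.5, 3.0, 1.0])  # points lost
cdr_item_change = np.array([0.0, 0.5, 0.0, 0.5, 0.5])       # points lost

# Raw mean changes are not comparable (different units), but MSDRs are.
print(msdr(adas_wordlist_change))
print(msdr(cdr_item_change))

# The coefficient of variation is simply the reciprocal of the MSDR.
cv = 1.0 / msdr(adas_wordlist_change)
```

A higher MSDR means a larger signal (mean decline) relative to noise (between-person variability in decline), which is why it is a natural ranking criterion for candidate composite items.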

Q: So is a six-month delay in cognitive decline enough, or is one year enough? What's the minimal amount of time that is relevant? And have you been able to convert [cognitive test results] into time yet?

A: Common lengths for clinical trials of potential disease-modifying therapies in MCI would be one year, 18 months, two years, or three years, and studies are often powered to detect a 50 percent effect, i.e., 50 percent slowing of the rate of decline from placebo, in Phase 2, or a 25 percent effect in Phase 3. (A Phase 2 study is rarely powered for the smallest relevant effect size.) A 25 percent effect is the smallest I've heard discussed as one that could be considered clinically meaningful in the context of a disease-modifying treatment. This effect size translates into a three-month savings in a one-year study, a 4.5-month savings in an 18-month study, a six-month savings in a two-year study, and a nine-month savings in a three-year study.
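The arithmetic behind these figures follows from a proportional-slowing assumption: a treatment that slows decline by fraction f over a trial of length T saves f × T of disease time. A quick check of the numbers quoted above:

```python
def time_saved(slowing_fraction, trial_months):
    """Months of decline 'saved' if the treated group declines
    (1 - slowing_fraction) times as fast as the placebo group."""
    return slowing_fraction * trial_months

# The 25 percent (Phase 3) effect size quoted above:
for months in (12, 18, 24, 36):
    print(months, "->", time_saved(0.25, months), "months saved")
# 12 -> 3.0, 18 -> 4.5, 24 -> 6.0, 36 -> 9.0 months saved
```

The same function reproduces the Phase 2 case: a 50 percent slowing over a two-year trial saves twelve months.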

Q: Can we consider the possibility that there are specific domain functions in the different stages of the disease development? And how we can classify this specificity?

A: In the analyses of the Rush data, we looked specifically at the different items that change an average of five years prior to diagnosis and two years prior to diagnosis. Additional domains begin changing as patients get closer to diagnosis and, of course, after an MCI diagnosis. As some domains become more important, other, earlier-changing domains become relatively less important. How these differences are assessed depends on the type of study. If the goal is to enroll high-risk individuals prior to an MCI diagnosis and then follow them for change over time, it would be important to have an idea of the mixture of individuals that would be recruited and then find a composite that shows decline across that group of individuals. That composite would likely be different if we were able to target individuals just prior to diagnosis, compared to enrolling individuals with a wide range of times prior to diagnosis.

Q: I note that you did not include specific fluency or language [tests] such as categories or Boston Naming. Were these regarded as less informative or specific?

A: There were several of these included in the Rush data and in the suggested combinations. They all performed quite well, but we used a competitive model, and Category Fluency (Fruits) and Category Fluency (Animals) outperformed Boston Naming.

Q: My family carries a gene for autosomal-dominant AD. My emotional stress is intense when I participate in research studies. Are you able to separate this stress from actual cognitive decline? Emotional, psychological, and genetic counseling is an important component to retention of research participants.

A: Thank you so much for your participation in these often burdensome and personally stressful studies. We are not necessarily able to separate out the effects of stress, but statistically, we commonly assume that these types of issues affect the placebo group and active group similarly, so that the group comparisons are not affected. The bigger problem statistically is the risk of dropouts from the studies due to this stress. Dropouts in studies are problematic, especially in long studies, and can seriously affect the results.

Comments


  1. This is a timely discussion, as there is currently a lot of excitement around this topic. Pharma, academia, and regulatory agencies are all trying to work together to make progress in this area. The idea of deconstructing specific instruments like the ADAS-cog and MMSE has some history. However, the idea of combining certain subscales from different instruments to create more sensitive composite endpoints for clinical trials in early disease stages of AD, such as prodromal and preclinical AD, is quite recent, to my knowledge.

    We at Janssen presented this concept at the FDA-Industry Workshop in September 2011, where Drs. Rusty Katz and Sue-Jane Wang from the FDA were invited panelists at our session. We also presented it that same year at CTAD 2011. We have since published this work in Alzheimer's & Dementia (Raghavan et al., 2013).

    Several pharma companies have now come together to work precompetitively in the Clinical Endpoints Working Group within the ADNI PPSB. I co-lead this Working Group along with Dr. Veronika Logovinsky at Eisai. Among other things, we plan to evaluate, on a variety of datasets, a number of endpoints that several companies have since proposed.

    References:

    The ADAS-Cog revisited: Novel composite scales based on ADAS-Cog to improve efficiency in MCI and early AD trials. Alzheimers Dement. 2013 Feb;9(1 Suppl):S21-31. PubMed.

  2. Neuropsychiatric drug discovery and development are undergoing a severe crisis, as many high-profile potential drugs with novel mechanisms of action fail to show clinical efficacy at Phase 3. These failures are prompting large pharmaceutical companies to deprioritize neuropsychiatry as a research area, and resulting in skepticism and further disinvestment. While drug discovery and development for psychiatric disorders have been at the forefront of this disillusionment, novel pharmacological treatments for neurological disorders, especially those with a strong behavioral component, have also borne more than their fair share of failures. Novel treatments for Alzheimer's disease have not been spared. There is no shortage of targets for potential treatment of AD. However, what symptomatic treatments there are have limited use, and treatments aimed at preventing or reversing neurodegenerative processes thought to underlie the cognitive and behavioral consequences of this disorder have not been able to demonstrate clinical efficacy.

    Essentially, preclinical and clinical investigators speak completely different languages during the discovery and development life of a potential drug. Preclinical studies establish potential clinical efficacy through precise and accurate changes in biochemistry, physiology, and objective measures of changes in behavior in model systems. Neuropsychiatric clinical trials, on the other hand, are required to show human clinical (behavioral) improvement in a heterogeneous patient population, usually through clinical rating scales. These scales are associated with a large placebo component influenced, for example, by the level of training of individual raters, as well as language and cultural differences when scales are administered across global, multiple-site trials.

    Closer interaction between preclinical and clinical investigators to develop common endpoints for assessing the effects of a drug on physical and behavioral aspects of a disorder is essential to close the translational gap. Biomarkers are helping to provide these common endpoints, but too few biomarkers are actually validated. To be useful, biomarkers of a disorder need to be validated using the same criteria of face, construct, and predictive validity as applied to other models of the disorder, especially when trying to relate physical amelioration of the disorder by the drug with changes in behavior or cognition. Indeed, while it may be possible for a compound to reverse or alter some physiological correlates of a disorder such as Alzheimer's, if the patient's cognitive and behavioral impairments are no better, that compound will never be a drug. Integrating more objective cross-species measures of behavioral correlates, such as species- and culture-free cognitive testing with related physical markers such as neuroimaging, throughout the discovery and development process will help provide both the construct and predictive validity of biomarkers and animal models of neuropsychiatric disorders such as Alzheimer's. Hopefully, this integration will reduce the clinical trial attrition of potential neuropsychiatric drugs.

    The take-home messages are the following:

    1. More objective endpoints must be integrated into neuropsychiatric clinical trial design.

    2. Biomarkers may provide a way of assessing disease/disorder diagnosis, progress, and, to an extent, the effects of a drug on the disease/disorder.

    3. Changes in biomarkers must be consistent with changes in patient well-being and cognitive functioning (clinical efficacy); otherwise, their construct and predictive validity is limited.

    4. If animal models for neuropsychiatric disorders are to have greater predictive validity and be useful as tools for CNS drug discovery and development, then the physiological and behavioral changes observed in these models in response to drug treatment should be consistent across species, and ultimately in the clinic.

    References:

    Many are called, yet few are chosen. Are neuropsychiatric clinical trials letting us down? Drug Discov Today. 2011 Mar;16(5-6):173-5. PubMed.

  3. A very interesting discussion in an area of key unmet need. I'm often asked by clients whether they should use traditional paper-and-pencil measures or computerized tests. I don't think it is a case of one or the other, but should instead be about selecting the best tests for the job. My own inclination would be to focus on the domains suggested for investigation by the European Task Force for AD, specifically episodic memory, working memory, and executive function (Vellas et al., 2008). When picking tests, I'm heavily influenced by the guidance notes that Keith Wesnes mentioned (Ferris et al., 1997). A final comment: Administration of the Clinical Dementia Rating scale requires well-trained, experienced raters. Clinical trial sponsors will need to be sure that their CDR raters are as good as those at the specialist academic sites in order to reproduce the same high-quality data.

    References:

    Endpoints for trials in Alzheimer's disease: a European task force consensus. Lancet Neurol. 2008 May;7(5):436-50. PubMed.

    Objective psychometric tests in clinical trials of dementia drugs. Position paper from the International Working Group on Harmonization of Dementia Drug Guidelines. Alzheimer Dis Assoc Disord. 1997;11 Suppl 3:34-8. PubMed.

References

News Citations

  1. CTAD: New Data on Sola, Bapi, Spark Theragnostics Debate
  2. Barcelona: Antibody to Sweep Up Aβ Protofibrils in Human Brain
  3. CTAD: Adaptive Antibody Trial to Try Bayesian Statistics
  4. Detecting Familial AD Ever Earlier: Subtle Memory Signs 15 Years Before
  5. NIH Director Announces $100M Prevention Trial of Genentech Antibody
  6. A Close Look at Passive Immunotherapy Newbie, Crenezumab
  7. Solanezumab Selected for Alzheimer’s A4 Prevention Trial
  8. DIAN Grows, Gets Ready for Therapeutic Trials
  9. DIAN Trial Picks Gantenerumab, Solanezumab, Maybe BACE Inhibitor


External Citations

  1. Phase 2 trial
  2. Coalition Against Major Diseases
  3. full list
  4. Australian Imaging, Biomarkers, and Lifestyle
  5. Guidance for Industry, Alzheimer’s Disease: Developing Drugs for the Treatment of Early Stage Disease
  6. Phase 2 Affiris trial
