Biomarkers and Efficacy: Not (Yet?) a Perfect Union

Series - Alzheimer's Association International Conference (AAIC) - 2023:

22 Aug 2023

In amyloid immunotherapy trials to date, biomarker change correlates with clinical efficacy, but the relationship is far from perfect. Plaque can fall and fluid biomarkers normalize without a statistically significant clinical benefit, as in the Phase 3 gantenerumab trials. Or, cognitive decline can slow without a change in tau PET signal, as in the Phase 3 donanemab trial. At last month’s AAIC in Amsterdam, researchers wrestled to understand the link between biomarkers and efficacy, emphasizing the need to define these relationships better. Efforts are underway to model the associations between biomarkers and cognition, and to develop a standardized scale for comparing tau PET scans done with different tracers.

Biomarkers do not always track with efficacy, for example in gantenerumab Phase 3.
In updated model of biomarker relationships, plaque removal weakly affects tangles.
Researchers are constructing a unified CenTauR scale for tau PET.

“We have to put parameters on biomarker change,” Rachelle Doody of Roche said in Amsterdam.

First, evidence for correlation. Katherine Dawson of Biogen reported that in the positive Emerge trial of aducanumab, a drop in p-tau181 in both cerebrospinal fluid and blood correlated with less decline on cognitive and functional tests. In plasma, the correlations were weak, ranging from 0.11 to 0.21. In CSF, they were stronger, with a high of 0.44 for slowed decline on the ADCS-ADL-MCI functional test. Dawson noted that p-tau181 fell more in people who became amyloid-negative than in those who stayed above this threshold. “Biomarker changes are linked to efficacy,” she concluded.

Inconsistent Sex Difference. In the Phase 3 gantenerumab trials, CSF biomarkers tended to normalize more in women than men, belying the larger clinical benefit in men. [Courtesy of Roche.]

Yet this does not always hold true. In the Graduate Phase 3 trials of gantenerumab, fluid biomarkers normalized in the absence of a statistically significant cognitive effect. More puzzling still, a subgroup analysis by sex found opposing effects for biomarkers and cognition. While only men benefited on cognitive tests, women had more biomarker change in CSF, Tobias Bittner of Roche reported in Amsterdam. The CSF differences were typically small, and biomarkers moved toward normal in both sexes in the treatment group.

For example, CSF p-tau181 fell by about 20 percent in men, and 25 percent in women. Total tau, neurogranin, and GFAP posted similar numbers. Some secondary biomarkers of inflammation and synaptic damage, such as sTREM2, S100B, and α-synuclein, displayed more dramatic differences, improving only in women. A couple of biomarkers bucked the trend—Aβ42 rose by the same amount in both sexes, and NfL fell more in men than women. Overall, however, women seemed to have a stronger biomarker response than men (see image above). It is unclear if the biomarkers/cognition discrepancy might be due to higher drug exposure in women, who weigh less than men on average.

How much do biomarkers need to change for a measurable treatment benefit? This is still unknown. John Sims of Eli Lilly stressed that researchers should examine a panel of biomarkers to gauge efficacy, because individual markers, such as NfL, have at times given puzzling results.

Tau Tangles Are Harder to Budge Than Previously Thought
Tau PET has confounded scientists, producing inconsistent results from trial to trial. Still, hopes are high for tau PET as an outcome measure, and in some cases, this marker does seem to track with efficacy. For example, for aducanumab, which reported a treatment benefit in one of its two Phase 3 trials, an admittedly tiny tau PET substudy found a slight drop in tangles in a medial temporal composite region for people on the drug, contrasting with a slight increase for those on placebo (Dec 2019 conference news). Meanwhile, in the Graduate trials, the tau PET signal was unaffected by gantenerumab, Bittner said in Amsterdam.

Alas, the correspondence between degree of tangle cleanup and efficacy is far from perfect. In the Phase 3 Clarity trial, lecanemab slowed, but did not stop, tangle accumulation in the medial temporal lobe, and showed a trend toward slowing in other regions (Dec 2022 conference news). This was a lesser effect than reported for aducanumab, which cleared tangles in the MTL, despite a similar reported clinical efficacy in Clarity and Emerge.

In the Phase 2 donanemab trial, Lilly researchers had reported a slight drop in tangles in the medial temporal lobe, and slowing of accumulation in other regions (Mar 2021 conference news). However, in the Phase 3 Trailblazer-Alz2 trial, donanemab had no effect on tangles, despite delivering the same clinical efficacy as in Phase 2 (Jul 2023 conference news).

Stubborn Tangles. The original Q-ATN model (pink) overestimated how much tangle accumulation would slow compared with placebo (green) in the lecanemab (left) and aducanumab (right) Phase 3 trials. The updated model (blue) offers a better fit with actual data (dots). [Courtesy of Norman Mazer and Roche.]

Why this discrepancy between trials? The answer is unclear. One factor could be that amyloid removal touches tangles only indirectly. Norman Mazer, who recently retired from Roche, had previously developed a Quantitative ATN (Q-ATN) model of AD. It used observational and trial data to predict how amyloid removal would affect downstream biomarkers and cognition (Aug 2022 conference news; Mazer et al., 2022). In Amsterdam, Mazer and Frank Boess of Roche presented an update that incorporates data from the Graduate and Clarity trials, but not yet from the Trailblazer-Alz2 results.

The researchers found that the original model, based on sparse tau PET data, had overestimated how strongly plaque clearance would affect tangles. In the updated version, tangle accumulation slows about half as much as previously estimated, better matching trial data (see image above). This correspondingly shrinks the predicted effect on the CDR-SB. For example, using baseline data from Graduate 1, the original model predicted that decline on the CDR-SB would slow by 0.86 points. The updated model predicted a slowing of 0.39 points, close to the observed value of 0.31. For other antibodies, too, the updated model predictions are more accurate, falling within the 95 percent confidence interval reported for sensitivity analyses of each trial.
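To make the Graduate 1 comparison concrete, the short sketch below computes how far each model's prediction sits from the observed slowing. The three numbers come from the text; the function and variable names are illustrative only, and this is simple arithmetic rather than any part of the Q-ATN model itself.

```python
# Figures quoted in the text for Graduate 1 (CDR-SB points of slowed decline).
OBSERVED_SLOWING = 0.31
PREDICTED_SLOWING = {"original": 0.86, "updated": 0.39}


def prediction_errors(predictions, observed):
    """Return predicted-minus-observed slowing for each model, rounded to 2 places."""
    return {name: round(value - observed, 2) for name, value in predictions.items()}


errors = prediction_errors(PREDICTED_SLOWING, OBSERVED_SLOWING)
for model, error in errors.items():
    print(f"{model} model overshoots the observed slowing by {error} points")
```

On these figures, the original model overshoots by roughly half a CDR-SB point, while the updated model lands within about a tenth of a point.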

The model predicts that tangles in the brain have a half-life of 3.4 years. This implies they barely budge over a typical trial's duration. The model also suggests that, for cognitive decline to slow, antibodies must hit the brakes on MTL tangles. Except for the Trailblazer-Alz2 data, all effective antibodies thus far have reportedly done this. Scientists at AAIC were nonplussed that Trailblazer-Alz2 posted no effect on tangles at all while reporting arguably the biggest clinical benefit thus far.
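As a rough illustration of what a 3.4-year half-life means over trial-length intervals, the sketch below applies the standard half-life formula. It assumes simple first-order clearance with no new tangle formation at all, which the article does not state and which is unlikely to hold in practice, where tangles keep accumulating. The result is therefore a best-case ceiling on clearance, not a Q-ATN prediction. The function name is hypothetical.

```python
def fraction_remaining(years: float, half_life_years: float = 3.4) -> float:
    """Fraction of an initial quantity left after `years` of first-order decay.

    Assumption: pure exponential clearance with zero new formation.
    """
    return 0.5 ** (years / half_life_years)


# Typical trial-length intervals, in months.
for months in (12, 18, 24):
    frac = fraction_remaining(months / 12)
    print(f"{months} months: {frac:.0%} of baseline tangle load would remain")
```

Even under this idealized assumption, most of the baseline tangle load would still be present at 18 months. Any ongoing tangle formation would shrink the net change further.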

CenTauR Rides to the Rescue?
One factor in these discrepancies between trials could be differences between tau PET tracers. The aducanumab and lecanemab trials used MK6240; donanemab, flortaucipir; gantenerumab, GTP1. Flortaucipir is generally considered the least sensitive of the bunch, but at the moment, there is no way to directly compare PET data from different tau tracers. Thus, it is unclear if these data are equivalent.

Researchers at the Critical Path Institute are trying to change this. At last year’s AAIC, they convened a public-private partnership to determine how to scale tau PET scans against a common benchmark, with units called CenTauRs, similar to the centiloid scale for amyloid PET. Led by C-Path’s Sudhir Sivakumaran, the Critical Path for Alzheimer’s Disease (CPAD) consortium includes representatives from the companies that make tau PET tracers, as well as data scientists at the University of Southern California’s Laboratory of Neuro Imaging and Australia’s CSIRO. It receives input from regulatory agencies such as the FDA and EMA.

Unified Yardstick. The proposed CenTauR scale achieves a tight correlation between tangles measured with flortaucipir (x axis) and Roche's RO948 tracer (y axis). [Courtesy of Critical Path for Alzheimer’s Disease.]

In Amsterdam, Antoine Leuzy of Lund University, Sweden, presented the consortium’s work to date. First, the researchers tried the same method that had been used to construct the centiloid scale. It requires choosing a reference tracer to measure other tracers against. However, because tau PET does not have a gold-standard reference tracer, as PiB is for amyloid tracers, this method produced more noise and was less accurate than is the centiloid scale.
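For orientation, the final step of a centiloid-style scale is a linear rescaling between two anchor values. The sketch below is a minimal illustration, not the consortium's code. It assumes the scale is anchored at 0 for a low-signal reference group and at 100 for a high-signal group, as the centiloid scale is for amyloid PET. The function name `to_anchor_scale` and the example anchor values are hypothetical.

```python
def to_anchor_scale(suvr: float, anchor_low: float, anchor_high: float) -> float:
    """Linearly rescale a tracer-specific SUVR so that
    anchor_low maps to 0 and anchor_high maps to 100."""
    if anchor_high == anchor_low:
        raise ValueError("anchor values must differ")
    return 100.0 * (suvr - anchor_low) / (anchor_high - anchor_low)


# Illustrative anchors only; these are not values from any real tracer.
example = to_anchor_scale(1.5, anchor_low=1.0, anchor_high=2.0)
print(example)
```

In the centiloid approach, a second tracer's values are first mapped onto the reference tracer's values and then rescaled this way. This is why the lack of a gold-standard tau tracer adds noise.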

Instead, the scientists tried a “joint propagation model,” which obviates the need for a reference standard. It compares all tracers head-to-head while scaling each from 0 to 100. To develop the standard, the researchers mined data from five different studies that included multiple tracers. The BioFinder2 cohort uses both flortaucipir and RO948; a University of Pittsburgh study uses flortaucipir and MK6240. Roche has a study with GTP1 and MK6240, another with GTP1 and PI2620. Finally, the ACE Alzheimer Centre in Barcelona, Spain, uses both RO948 and PI2620. Together, the data encompass the five most commonly used tau tracers, each being directly compared against two others.
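The pairing structure described above can be checked with a small graph sketch. The tracer pairs are taken from the text; the labels "Roche study A" and "Roche study B" are placeholders, since the article does not name those studies. The code only verifies the head-to-head layout and does not implement the joint propagation model.

```python
from collections import defaultdict

# Head-to-head tracer pairs as listed in the text.
# "Roche study A/B" are placeholder labels, not real study names.
STUDIES = {
    "BioFinder2": ("flortaucipir", "RO948"),
    "University of Pittsburgh": ("flortaucipir", "MK6240"),
    "Roche study A": ("GTP1", "MK6240"),
    "Roche study B": ("GTP1", "PI2620"),
    "ACE Alzheimer Centre": ("RO948", "PI2620"),
}


def build_partner_graph(studies):
    """Map each tracer to the set of tracers it is directly compared with."""
    graph = defaultdict(set)
    for tracer_a, tracer_b in studies.values():
        graph[tracer_a].add(tracer_b)
        graph[tracer_b].add(tracer_a)
    return dict(graph)


def is_connected(graph):
    """Return True if every tracer is reachable from every other one."""
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in graph[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(graph)


graph = build_partner_graph(STUDIES)
for tracer, others in sorted(graph.items()):
    print(f"{tracer}: compared with {sorted(others)}")
print("All tracers linked:", is_connected(graph))
```

The five pairs form a closed ring, so every tracer can be linked to every other through a chain of direct comparisons.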

The joint propagation model was indeed able to closely align findings from these different tracers, Leuzy showed in Amsterdam. For example, the correlation between flortaucipir and RO948 data as measured in CenTauRs was around 0.98 (see image above). The scale was able to capture longitudinal change in the BioFinder2 cohort, as well. The method appears more precise than the centiloid-like approach, with root-mean-square errors of 6 versus 8, respectively.

In addition, the CenTauR scale may reduce the number of PET scans needed to detect an effect by anywhere from 13 to 60 percent, depending on the trial population, Leuzy calculated. A unified tau PET scale therefore is achievable and useful, he concluded. However, Leuzy noted that more data are needed from scans with different levels of tangle severity, to ensure the measure performs equally well at very high and low ends of the scale.

Once the CenTauR scale is ready to be deployed, will it better align tangle measurements with clinical efficacy in trials? The answer may help determine how well tau PET would work as an outcome measure.—Madolyn Bowman Rogers
