Why Bother With Round Robins on Blood Tests? Q&A with Kaj Blennow
At the AAIC conference held July 14–18 in Los Angeles, Kaj Blennow of the University of Gothenburg had his hands full trying to get researchers in the field engaged in the nitpicky drudgery of validation, standardization, and commutability studies when all they wanted to do was grab one of those fancy new blood tests and run with it. Blennow, a world leader in clinical chemistry in Alzheimer’s research, has been guiding the work of the Global Biomarkers Standardization Committee (GBSC) for the past 10 years, starting with a quality-control program for CSF Aβ and tau assays. Convening academic leaders in AD biomarker development and scientists from companies that make tests for biological fluids such as CSF and blood, this group has been meeting in person and over the phone regularly, under the auspices of the Alzheimer’s Association.
Their work is not the stuff of Nature, Science, and Cell journals. It’s not glamorous. It’s complicated in a tedious sort of way. But it’s essential if the field is to move beyond tests that only work in the inventor’s lab, or are sold “research use only,” toward having a choice of automated, affordable, and clinically certified tests that are routinely used for millions of people in doctors’ offices, hospitals, and central labs—tests that perform identically, year after year, be it in London, Sydney, or anywhere else. The GBSC has had some success standardizing tests of CSF Aβ42 and tau, but given that the field is generating promising new AD tests, its work is far from done. In 2018, the group took up quality control of CSF NfL tests. In 2019, it started—with a first, much-debated round robin—to try to bring some order to a suddenly burgeoning plasma test scene. The round robin showed generally weak correlation among the tests. Even so, a handful of contenders are rushing to market (see Part 9 of this series). Is that good enough?
Q: What is your long-term goal for Alzheimer’s blood tests?
A: I foresee us having robust blood tests for clinical Alzheimer’s diagnosis and care. Ideally, multiple tests that are in agreement, as we have for cholesterol, blood sugar, cystatin C, hemoglobin etc. We need this for blood Aβ, and blood tau, too. I am trying to engage the field in the validation processes that are necessary to create this landscape. This means we have to look beyond just a single purpose for a favorite test, i.e., predicting brain amyloid deposition as a prescreen for clinical trials. We need tests that also work for screening, or even diagnostics, in clinical care.
Q: Is this realistic?
A: Of course. For CSF Aβ42, we are almost there. Fully automated, approved tests on the Fujirebio and Elecsys machines are being used in Europe in routine clinical care, spreading out from tertiary care centers, and may be approved in the U.S. next year.
Q: At AAIC, neurologists from different countries stressed the need for blood tests in primary care, where many dementia diagnoses are made—often the wrong diagnoses. Do you agree?
A: Absolutely. And it’s better if the world has more than one assay that measures the same thing and gives the same answer. It inspires confidence, and competition keeps prices down. As a research and clinical community, we want several assays that are equally good at measuring a given analyte.
Q: Is this happening?
A: Yes. The plasma round robin is a good example of a precompetitive collaboration between companies. For the CSF assays, companies have collaborated for a decade within the Alzheimer’s Association GBSC group, and are right now in the final stage of harmonizing assay readouts. So this is a favorable situation. But we need to continue, since there is much work waiting ahead.
Q: Why is the round robin result on the current plasma Aβ42/40 ratio tests important?
A: There is great excitement about the plasma Aβ42/40 ratio tests, and some have posted promising results. The round robin is a first step toward having standardized blood Aβ tests in clinical practice. For that to happen, we will need a certified reference material (“gold standard” plasma pools with known and exact Aβ levels) for the different vendors to calibrate their tests. To get a reference material, you also need a reference measurement procedure (one or more “gold standard” methods to measure Aβ levels) to establish a traceability chain. It’s a process, and requires multiple round-robin studies, including so-called commutability studies (in which candidate reference materials are tested out) along the way. The International Federation of Clinical Chemistry and Laboratory Medicine (IFCC), the EU’s Joint Research Center, and the Institute for Reference Materials and Measurements (IRMM) guide us in this work, but it requires a sustained effort on the part of many academic and vendor labs.
Q: This is not something most people in our field focus on …
A: True, but it still needs to be done. It’s best learned from the above-mentioned fields in diagnostic clinical chemistry that went through these same steps on their way from selling a few research-grade tests to having tests approved for clinical practice. The EU’s Joint Committee for Traceability in Laboratory Medicine and the National Institute of Standards and Technology show what clinical development entailed for cholesterol and other tests we use in medical care all the time. We want JCTLM/NIST-approved materials for blood Aβ tests, and we will need to go through these steps, too. As part of this process, you need to show that different tests measure the same analyte, be it Aβ, tau, or NfL.
Q: If the round robin had shown that they do—it didn’t—then what would have happened?
A: Then we could have started the clinical development process I indicated, i.e., we could have initiated a standardization project and made a certified reference material. But the data show we are not fully ready for that. We should instead work on getting the methods better harmonized first.
Q: Not everyone here at AAIC thinks it’s a problem that the round robin showed poor correlations across the different test methods.
A: For a narrow application, or within one lab’s studies, that might be true. But for clinical use, it is not. Say you have a number of methods that have been shown to predict something, be it brain amyloidosis or liver disease or myocardial infarction. If these different methods that are supposed to measure the same solute do not correlate well, that usually means there are issues with some of them. It tells me to expect problems later on.
A: Test-retest performance, for one. Let’s do longitudinal studies in the same individuals. If you have one person with an Aβ42/40 ratio of 0.21 and examine the same person three or six months later, will they still read 0.21? We need data on that. This is especially important given that the reduction in the ratio in positive versus negative people is not more than around 15 percent, which is really minor.
Q: And this worries us because …
A: When you are working within a range as narrow as that, it is extremely important to have a very high-performing assay so that you can place a firm cutoff. Even if AUC values are high, the cutoff has to go right through the spread of values in the control (amyloid-negative) group, and when your signal is only a 10 to 15 percent difference, the tests must be extremely robust.
Thin Blue Line? The difference in the plasma Aβ42/40 ratio between amyloid-positive and -negative people is less than 15 percent. [Courtesy of Palmqvist et al., JAMA Neurology 2019.]
For example, when you look at the blue line in Oskar Hansson’s data using the Roche test, it tells you there is very little separation to draw a cutoff. People close to that line may get false results. And it’s not just the Elecsys test. The Aβ42/40 ratio signal is small in the other tests, as well.
In CSF, by comparison, Aβ42 is 50 percent down from control, and the Aβ42/40 ratio values don’t even overlap much between control and amyloid-positive groups. This means placing the cutoff is easier and few people will get the wrong result. In CSF, the cutoff can accommodate a bit of variance between tests; in blood, it cannot.
Q: What problems could this cause?
A: You’ll have some people who are positive at first test but negative upon repeat even though they did not change, just because so many people are close to the cutoff. For use in clinical trial prescreening, that is one thing, but imagine this situation if screening in clinical care: “You have brain amyloid.” Three months later: “Sorry, we were wrong. You don’t.”
Q: What to do?
A: At a minimum, we could do test-retest studies with a predefined cutoff, to evaluate how big a percentage of control individuals and patients will be negative on the first test and positive on the repeat test taken, e.g., a month later (and vice versa). This type of study needs to be done prospectively, with test results reported continuously, to mirror the real-life situation. Maybe this could be a study governed by the Alzheimer’s Association, and I also hope the vendors would agree to participate.
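To make the test-retest idea concrete, here is a minimal Python sketch of how the discordance rate in such a study might be computed. It is an illustration only, not a protocol proposed in the interview: the function names, the cutoff of 0.21, the spread of true ratios, and the 5 percent measurement variability are all hypothetical assumptions. It also assumes that a ratio below the cutoff is read as amyloid-positive, in line with the lower ratio reported in positive people.

```python
import random

def is_positive(ratio, cutoff):
    """Call a sample amyloid-positive when its Abeta42/40 ratio is below the cutoff."""
    return ratio < cutoff

def discordance_rate(first, repeat, cutoff):
    """Fraction of people whose positive/negative call differs between two visits."""
    if len(first) != len(repeat) or not first:
        raise ValueError("need two equal-length, non-empty series")
    flips = sum(
        is_positive(a, cutoff) != is_positive(b, cutoff)
        for a, b in zip(first, repeat)
    )
    return flips / len(first)

def simulate_visit(true_ratios, cv, rng):
    """Add multiplicative Gaussian noise with the given coefficient of variation."""
    return [r * (1 + rng.gauss(0, cv)) for r in true_ratios]

# Hypothetical cohort: true ratios scattered around an assumed cutoff of 0.21.
rng = random.Random(0)
CUTOFF = 0.21   # assumed for illustration, fixed before any data are seen
CV = 0.05       # assumed total (biological + analytical) variability
true_ratios = [rng.gauss(0.21, 0.02) for _ in range(1000)]

visit_1 = simulate_visit(true_ratios, CV, rng)
visit_2 = simulate_visit(true_ratios, CV, rng)
print(f"calls that flip between visits: {discordance_rate(visit_1, visit_2, CUTOFF):.1%}")
```

In a real study the two series would be measured values from the same individuals at two time points, and the cutoff would be predefined rather than tuned to the data.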
Q: I have heard criticism that the round robin should compare tests directly on the ratio, not on Aβ42 and 40 separately. What do you say to that?
A: It is definitely important to compare the ratio, but since it is built on two components, I find it essential that Aβ42 and Aβ40 correlate across assays.
Q: What do you want to see done next, as a result of the round robin?
A: One option would be to take a step back, work on the methods to improve their performance, and redo this kind of round robin with second-generation tests. Another is to work on larger cohorts. Oskar Hansson is doing that, and the FNIH [Foundation for the National Institutes of Health] are planning a round-robin study with the plasma tests on ADNI samples. This round robin we did is just a first indication. We now need a round robin on samples from 300 to 400 people who are amyloid-positive or negative. We’ll see if the tests correlate among each other, and also if they concur with amyloid positivity. For CSF Aβ42/40, concordance with amyloid PET has been shown in about 20 studies; the concordance rate is 90 percent, and this data became part of the appropriate use criteria for clinical CSF tests.
Q: What will the larger cohorts do that the smaller ones did not?
A: For example, they’ll help us walk the line between sensitivity and specificity. With high sensitivity, the risk is that you tell people who don’t have brain amyloid that they do, especially when the range leaves no room for error. With high specificity, the risk is that you miss a lot of people who do have amyloid. When you look at plasma studies, e.g. the Roche data published by Palmqvist et al., 2019, the ratio changes so little between controls and cases that the optimal cutoff leads to a sensitivity and specificity in the 70 percent range.
Sensitivity and specificity. Findings from one evaluation of the Roche Aβ immunoassays. [Courtesy of Palmqvist et al., JAMA Neurology 2019.]
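The trade-off described above can be sketched in a few lines of Python. This is a hedged illustration, not the analysis used by Palmqvist et al.: the function name, the six sample values, and the cutoffs are made up for the example, and a ratio below the cutoff is assumed to count as a positive call.

```python
def sensitivity_specificity(ratios, amyloid_positive, cutoff):
    """Return (sensitivity, specificity) when ratios below the cutoff are called positive."""
    tp = fn = tn = fp = 0
    for ratio, truly_positive in zip(ratios, amyloid_positive):
        called_positive = ratio < cutoff
        if truly_positive and called_positive:
            tp += 1
        elif truly_positive:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one truly positive and one truly negative sample")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical plasma ratios with amyloid status from a reference method such as PET.
ratios = [0.18, 0.20, 0.22, 0.19, 0.23, 0.24]
status = [True, True, True, False, False, False]

# Raising the cutoff calls more people positive: sensitivity rises, specificity falls.
for cutoff in (0.185, 0.21, 0.235):
    sens, spec = sensitivity_specificity(ratios, status, cutoff)
    print(f"cutoff {cutoff:.3f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With groups that overlap as much as these toy values do, no cutoff gives both high sensitivity and high specificity, which is the point being made about the narrow plasma signal.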
Q: Does the Aβ42/40 ratio correlate between plasma and CSF?
A: Rather poorly in many studies. And we don’t know what this lack of correlation tells us. What is the cause? Peripheral production of Aβ? Pre-analytics could have been a possible explanation for Aβ42, but then Aβ40 levels do not correlate between plasma and CSF. We need to learn more about the physiology of plasma Aβ.
Q: A company getting ready to sell its plasma test will not like to hear that correlation between its test and a competitor’s is important.
A: That’s possible, but when different labs use different assays on identical aliquots that contain a given amount of a well-studied analyte, and they measure two different amounts, that indicates a problem that needs to be addressed.
Q: Any formal effort to get control of this issue?
A: Yes, we invite labs and companies to join us in the Alzheimer’s Association quality-control program of their blood-biomarker tests. It has worked well for the CSF tests, where it is still ongoing so labs can track their performance not just against others’ but over time, too. We will start with blood tests next year.
Q: Do you think p-tau will outperform the Aβ42/40 ratio?
A: It looks as if there will be a plasma p181-tau or p217-tau test that will really work. They could be better than the Aβ ratio because the difference between groups is much larger, around a 200 percent increase, not a 10 to 15 percent decrease. That makes everything easier.
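Why a larger group difference "makes everything easier" can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that values in each group are normally distributed with the same 10 percent coefficient of variation; neither the distribution shape nor the CV comes from the interview, and `group_overlap` is a hypothetical helper. It uses the overlap coefficient from Python's standard library, where 0 means the groups are fully separated and 1 means they are indistinguishable.

```python
from statistics import NormalDist

def group_overlap(relative_change, cv, control_mean=1.0):
    """Overlap coefficient between control and case distributions.

    relative_change: case mean relative to control, e.g. -0.15 for a
    15 percent decrease or 2.0 for a 200 percent increase.
    cv: assumed coefficient of variation, applied to both groups.
    """
    control = NormalDist(control_mean, cv * control_mean)
    case_mean = control_mean * (1 + relative_change)
    cases = NormalDist(case_mean, cv * case_mean)
    return control.overlap(cases)

ASSUMED_CV = 0.10  # hypothetical spread, not a measured assay property

scenarios = [
    ("15 percent decrease", -0.15),
    ("50 percent decrease", -0.50),
    ("200 percent increase", 2.0),
]
for label, change in scenarios:
    print(f"{label}: overlap {group_overlap(change, ASSUMED_CV):.4f}")
```

Under these assumptions the 15 percent decrease leaves far more shared territory between the groups than a 50 percent decrease or a 200 percent increase does, so any cutoff there is much more sensitive to assay variability.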
Q: Could a plasma p181-tau test replace the Aβ42/40 ratio tests?
A: In the best of worlds, it will be complementary. My dream is that we soon can use blood NfL to screen for neurodegeneration, p181-tau to screen for tau pathology, and Aβ42/40 ratio for amyloidosis. And other tests are in development. But a lot of work lies ahead to get them from research-grade to clinically certifiable.
Q: Where is blood NfL in this certification process?
A: We started last year, with nine labs in an IFCC working group. It includes five companies selling NfL tests on different platforms, with different antibodies. The first commutability study toward a reference material is getting underway.
Q: What was the most hopeful news to you at AAIC this year?
A: The data on plasma p181-tau. That is really very promising!
—Questions by Gabrielle Strobel
References
Paper Citations
- Palmqvist S, Janelidze S, Stomrud E, Zetterberg H, Karl J, Zink K, Bittner T, Mattsson N, Eichenlaub U, Blennow K, Hansson O. Performance of Fully Automated Plasma Assays as Screening Tests for Alzheimer Disease-Related β-Amyloid Status. JAMA Neurol. 2019 Jun 24.