by Katie Stoll and Heidi Lindh
Heidi and Katie are genetic counselors and both work with the newly established charitable nonprofit, the Genetic Support Foundation (twitter @GeneticSupport), geneticsupportfoundation.org.
The importance of the positive predictive value (PPV) in interpreting Noninvasive Prenatal Testing (NIPT) results is increasingly on the minds of providers, as evidenced by frequent discussions, presentations, and publications on the topic. But what if, in an effort to make their lab look like the best lab, the NIPT PPV was overstated in marketing materials or even on test reports? And what if providers and patients believed this information without question or further investigation?
Until 2014, four labs (Sequenom, Verinata Health/Illumina, Ariosa, and Natera) were the only companies in the United States that offered NIPT. Over the past year, we have seen a burgeoning of new labs offering their own branded NIPT tests. In some cases, the four original companies act as “pass-through” labs: the testing is branded and advertised through a secondary lab, but the sample is ultimately sent to the primary lab for analysis and interpretation. In other cases, referral labs have brought NIPT in-house, developing their own algorithms and reporting, as with the InformaSeq™ test offered by LabCorp and Integrated Genetics. In a recently published marketing document, Illumina lists 16 laboratory “partners” that all offer a version of the Illumina NIPT. The other primary NIPT labs also distribute their tests through other labs: Quest Diagnostics and the Mayo Clinic have been secondary labs for the Sequenom NIPT (Quest also has its own brand, the “Q-Natal Advanced”), and Natera’s NIPT is available through GenPath and ARUP.
The growing number of laboratories that offer some version of NIPT presents a significant challenge for healthcare providers who are struggling to navigate the various testing options to determine what is in the best interest of their patients. The competitive commercial landscape and aggressive marketing of NIPT to both patients and providers can further confound clinical decision-making given the paucity of information available to providers that is not delivered with an angle aimed at selling the test.
NIPT Statistics in Marketing Materials
We have noted that multiple labs offering testing have promoted extraordinarily high positive predictive values (PPVs) over the past year in their marketing materials, on their websites^, and on laboratory test reports. The tables presenting these PPVs frequently reference data from the Illumina platform and Verifi™ methodology, citing a study by Futch et al. as the source.
Performance Data Presented in Marketing Brochures for NIPT
These figures (or slight variations thereof) have been observed in the marketing materials for multiple laboratories offering NIPT. These specific statistics were reproduced from an InformaSeq brochure and sample test reports available online.
The PPVs reported in this table – being widely distributed on test reports and as educational information for providers – have NOT been demonstrated by the referenced study by Futch et al. or any published NIPT studies of which we are aware.
Of course, the PPV of a screening test depends on the prevalence of the condition in the population being screened. Using the sensitivity and specificity that accompany these predictive value data in the same brochure, a PPV of >99% could only be derived if the prevalence of Down syndrome in the screened population were 25%, or 1 in 4 pregnancies, far higher than the a priori risk for the vast majority of women undergoing prenatal screening.
PPV = (sensitivity × prevalence) / ((sensitivity × prevalence) + (1 – specificity) × (1 – prevalence))

0.994 = (0.999 × 0.25) / ((0.999 × 0.25) + (1 – 0.998) × (1 – 0.25))
In contrast, if we utilize performance statistics provided by the laboratories, we calculate a PPV of 33% in a population with a prevalence of 1 in 1,000 (which is similar to the prevalence for women in their 20’s) and a PPV of 83% in a population with a prevalence of 1 in 100 (which is similar to the prevalence in women age 40).
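This dependence on prevalence can be checked with a few lines of Python; the sensitivity of 99.9% and specificity of 99.8% are the brochure-level figures used in the calculation above:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value of a screening test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Brochure-level performance: sensitivity 99.9%, specificity 99.8%
print(ppv(0.999, 0.998, 0.25))    # prevalence 1 in 4: ~0.994
print(ppv(0.999, 0.998, 0.001))   # prevalence 1 in 1,000: ~0.33
print(ppv(0.999, 0.998, 0.01))    # prevalence 1 in 100: ~0.83
```

The same sensitivity and specificity yield wildly different PPVs as prevalence changes, which is exactly why a single ">99%" figure is misleading for a general screening population.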
The Futch Factor
The study by Futch and colleagues that is frequently cited in marketing materials for NIPT does not demonstrate the high PPVs that are referenced, although we suspect that these statistics were arrived at through a series of assumptions about the Futch data that we will attempt to outline.
This study reported that in a cohort of 5,974 pregnant women tested, there were 155 positive calls for trisomy 21, 66 positive calls for trisomy 18, and 19 positive calls for trisomy 13. In this published report, only a fraction of the positive NIPT results had confirmation of the fetal karyotype: 52/155 cases of Down syndrome (33.5%), 13/66 cases of trisomy 18 (19.7%), and 7/19 cases of trisomy 13 (36.8%). There was one case of trisomy 21 with a normal NIPT result (a false negative); however, negative test results were not methodically followed up, so the true false negative rate for the screened conditions is unknown.
Analyzing the data presented by Futch et al., deriving the marketing materials’ PPVs of >99% for Down syndrome, 91% for trisomy 18, and 84% for trisomy 13 would require that all of the positive calls WITHOUT follow-up karyotype confirmation were true positives.
Outcomes data from Futch et al., 2013, and projected PPVs based on category inclusion or exclusion as true positive.

| | T21 | T18 | T13 |
| --- | --- | --- | --- |
| Confirmed (karyotype or birth outcome) | 52 | 13 | 7 |
| Discordant (unexplained NIPT results that do not match karyotype from any source or birth outcome) | 1 | 6 | 3 |
| No information (laboratory did not obtain any information on outcomes) | 22 | 12 | 0 |
| Pregnancy loss (miscarriage, demise, or termination without karyotype) | 7 | 5 | 2 |
| Unconfirmed (no karyotype or birth outcome known, but history of clinical findings suspicious for aneuploidy, such as ultrasound findings or high-risk biochemical screening results) | 73 | 30 | 7 |
| Total positive NIPTs where follow-up karyotype not confirmed | 102 | 47 | 9 |
| High-end PPV* (%) | 99.4 | 90.9 | 84.2 |
| Low-end PPV** (%) | 33.5 | 19.7 | 36.8 |
*High-end PPV: it appears that the marketing-material PPVs count every category except discordant (confirmed, no information, pregnancy loss, and unconfirmed) as true positives.
**Low-end PPV: calculated counting all cases that were not confirmed as false positives. Only a minority of positive NIPT results were confirmed with birth outcome or fetal karyotype information.
Given that Futch et al. did not have confirmed fetal karyotype or birth outcome follow-up for the majority of positive calls, it seems at best unlikely, and at worst impossible, that all of these positive NIPT results were correctly called, rendering the marketing materials’ claims of such high PPVs unfounded. At the other end of the spectrum, if the PPV were calculated counting the pregnancies without karyotype or birth outcome information as false positives, the PPVs would be 33.5% for Down syndrome, 19.7% for trisomy 18, and 36.8% for trisomy 13. Since the study does not report follow-up karyotype for the majority of positive test results, the true PPV for these NIPT tests likely lies somewhere in between the high-end and low-end PPVs, perhaps closer to the 40–45% (for T18 and T21) previously reported in another Illumina-sponsored study.
While the PPV of NIPT for Down syndrome, trisomy 18 and trisomy 13 exceeds that of traditional biochemical screening, no studies have demonstrated test performance as high as those presented in many of the PPV/NPV tables that are being provided to healthcare providers in marketing materials and, in some cases, on test reports.
A Call For Truth In Advertising And In Test Reporting
Honest communication about test performance metrics must be available to providers so that they can accurately counsel patients making critical decisions about their pregnancies. While most labs do state that NIPTs are screening tests and that confirmatory testing of positive results is recommended, it is not surprising that providers and patients have difficulty appreciating the possibility of false positive results when laboratories incorrectly report positive predictive values that exceed 99%. The consequences of relying on lab-developed materials rather than a careful analysis of the available literature are significant. There are reports of patients terminating pregnancies based on NIPT results alone, and it is understandable that some women choose not to pursue diagnostic testing to confirm abnormal NIPT results given the very high stated predictive values.
It is imperative that we recognize not only the potential benefits of these new technologies but also their risks and limitations. Testing companies are primarily responsible to their shareholders and investors, so information provided by companies about their products is largely aimed at increasing test uptake. Professional societies need to call for independent data, and federal funds need to be made available to support independent research related to NIPT. Policies and best practices cannot arise from the industry-influenced studies that are currently available. While some regulatory oversight of marketing materials will likely be necessary, we urge the laboratories to consider their marketing approach and how it affects patients and providers. If laboratories want to truly partner with patients and providers, they need to provide accurate and straightforward information to limit provider liability and, likewise, help patients avoid making life-changing decisions based on inaccurate and/or confusing information about test performance. As a medical profession, can we come together and make this change without regulatory oversight? Now that would be a medical breakthrough.
^ – Notably, Counsyl has also recently produced a table that provides more accurate estimates of their NIPT predictive values
4 responses to “Guest Post: PPV Puffery? Sizing Up NIPT Statistics”
Thank you, thank you, thank you. This is a wonderful and thorough analysis of the PPV “problem” and the way that laboratories are reporting normal and abnormal results. I also feel that it is particularly misleading the way that many laboratories present risks in a numeric way on abnormal results (99/100 or 99%) when they are not presenting a PPV but are instead stating how confident the laboratory is in the high-risk assignment. This is such a massive departure from the way screening tests have traditionally been reported that it is little wonder that the medical community has had difficulty incorporating these results into the existing testing schema.
I sincerely believe that PPV is very difficult to calculate accurately with the information we have. When the number of affected pregnancies in any study is small enough that one more true positive or one more false positive will significantly affect the calculated specificity, then you cannot calculate an accurate specificity based on that data… only a number that is “close”. And when using a “close” specificity to calculate PPV, you will get an inaccurate PPV. And it may be a very inaccurate PPV.
I am beginning to think that the best way to estimate the effectiveness of the test, and to give meaningful information to patients who screen positive, is to look at studies such as Wang, et al, Gen in Med 2014 which examined the concordance of cfDNA test results and cytogenetic confirmatory testing. They found that if the indication for chromosomes was positive cfDNA testing for T21 – there was a 93% true-positive rate (TPR); if for T18, the TPR was 64%; and if for T13 the TPR was 38%. We have unpublished data from our lab that is similar (T21 – TPR of 91% n=44; T18 – TPR of 78% n=9; T13 – TPR of 50% n=12). In my opinion, telling a patient with a positive cfDNA screen for T13 that there is a 50% chance that her fetus is affected, but a 50% chance that it is not, is a better option than using a calculated PPV of 8% (see Katie Stoll’s previous post https://thednaexchange.com/2013/07/11/guest-post-nips-is-not-diagnostic-convincing-our-patients-and-convincing-ourselves/) or an overblown lab estimate determined by using the most favorable data interpreted in the most favorable way.
Personally, I would like to see all of the large national reference labs combine their cytogenetic results, for those cases for which a positive cfDNA test result is included in the indication for testing, into one collaborative study in order to develop a larger pool of data and to provide real-world odds that a fetus will be affected, given a positive result. If any lab GCs are interested in such a collaboration, please contact me!
Please correct me if I’m wrong, but it looks like the TPR rates in the Wang study were calculated the same way as a PPV is calculated (i.e., by taking the number of true positive and dividing by the total number of positives). In Table 2 of the Wang article, they combine their Table 1 TPRs with those from 2 other studies and call the pooled result a PPV in the Discussion section: “The observed PPV in this report was 94.4% for trisomy 21, 59.5% for trisomy 18, and 44.4% for trisomy 13.”
In my opinion, the main limitation of Wang et al is that they based their study population on the availability of cytogenetic results as opposed to the availability of NIPT results — in other words, they tell the story “backwards.” As such, calling their results “concordant” or “discordant” seems reasonable, but TPR and PPV seem like a stretch to the hard-core epidemiologist in me. This isn’t to say that their estimates are wrong, but there is the possibility of ascertainment bias and/or lack of representativeness. And I generally agree with their conclusion that their “findings raise concerns about the limitations of [NIPT] and the need for analysis of a larger number of false-positive cases to provide true [PPVs].”
As far as how PPV results are reported in the peer-reviewed literature are concerned, it is helpful when studies provide 95% CIs for their point estimates of ALL test characteristics (sensitivity, specificity, PPV, NPV, etc.). Having this information on precision takes into account the size of the study population and therefore how much confidence one has in the estimate that’s provided. It’s a way of statistically quantifying our uncertainty around estimates that are “close” but we’re not sure how “close.”
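As an illustration, a Wilson score interval (one common choice for a binomial proportion, not necessarily the method any particular study used) can quantify that uncertainty for the 52 confirmed out of 155 positive T21 calls in Futch et al.:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_ci(52, 155)  # confirmed T21 calls from Futch et al.
print(f"point estimate 33.5%, 95% CI {lo:.1%} to {hi:.1%}")  # ~26.6% to ~41.3%
```

Even with 155 positive calls, the interval spans roughly 15 percentage points, which illustrates why point estimates reported without CIs can overstate how precisely a PPV is known.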
Authors also can help by providing clarity about how missing data are handled in their analyses. As this post suggests, assuming a “best-case scenario” for missing outcomes is statistically (and clinically) inappropriate when data are missing on >50% of the population. A “sensitivity analysis” such as the one Katie and Heidi present above is a transparent way to present the best-case and worst-case scenarios; the more missing data there are, the wider those two estimates will be. There are also fancier statistical methods of imputing missing data that one could apply.
Another thing that is helpful is for GCs who work in “closed” healthcare systems to publish findings from their clinical experience since follow-up of both positive and negative NIPT cases is likely to be much more thorough than what labs can do (although my perspective on this may be biased since I have a horse in this race – see disclosure below). Even if the labs pool all their NIPT data as suggested by Danielle, they might still be looking at a small subset of patients for whom cytogenetic confirmation and/or pregnancy outcomes are available.
Full disclosure (and shameless plug): I work at Kaiser Permanente in Southern California and am a co-author on a NIPT poster that will be presented at ISPD this July (Kershberg et al). In this poster, we will present findings from 6600+ high-risk patients who elected to have NIPT, including pregnancy outcomes on 80% of both positive and negative NIPT cases. The poster will have a “sensitivity analysis” similar to the one presented in this blog post. For example, our base-case PPV for trisomy 21 was 93%. In the base-case, we excluded all missing values from the analysis. We then made best-case and worst-case assumptions about those missing values and calculated PPVs of 94% and 79%, respectively. Our colleagues from KP Northern California also will be presenting findings from their clinical experience with 7500+ patients (Norem et al). Both of these studies have their strengths and weaknesses, and I encourage anyone at ISPD to come by and discuss these analyses with the authors who will be attending.