COVID19: All About False Positives and Negatives
by Raywat Deonandan, PhD
Epidemiologist & Associate Professor
University of Ottawa
(I add my credentials to these COVID-19 blog posts in case they get shared. I want readers to know that my opinion is supposedly an educated and informed one)
You know, I teach this stuff for a living. And every year, when it comes time to teach my students how epidemiologists assess the quality of diagnostic tests, I have a tiny panic attack, since it can be so very confusing, and therefore quite challenging to explain to others how it works.
But it’s important for people to understand the definitions of the terms we use (sensitivity, specificity, positive predictive value, false positives, false negatives, and so on) because they are popping up more and more in the media, producing a lot of confusion and fueling some misinformation.
Ontario’s Associate Chief Medical Officer Dr Barbara Yaffe recently said, “If you’re testing in a population that doesn’t have very much COVID, you’ll get false positives almost half of the time.”
This led my colleague and old friend, a prominent biostatistician known on this site as “Nasty Nicky B”, to tweet, “Incorrect. The PCR test for SARS-COV-2 has very high specificity. False positives are rare.”
He’s not wrong, of course. The “swab test” commonly used to identify COVID cases in North America is a PCR test that has a known low rate of false positives and a suspected higher rate of false negatives. In fact, I’ve seen data suggesting that the test’s false positive ratio is something like 0.01%, whereas the false negatives might have been the result of improper technique (not pushing the swab deep enough to get a good sample, for example).
Okay, but how did Dr Yaffe get to a 50% false positive rate? None of us has seen anything resembling that number. I think she misspoke and meant to refer to something called the positive predictive value, or PPV.
Man, I hate this topic. But let’s begin.
Sensitivity vs Specificity vs Prevalence
These are important statistics. They tell us how good and useful a test is. Now that we have a global public health emergency on our hands, I think the public is waking up to how much of an impact these measurements can have.
Consider an at-home pregnancy test, the type that has a strip that you urinate upon. It gives you a sense of whether or not you (a fertile woman) have been gloriously and wondrously knocked-up, as the kids used to say.
No test is perfect, though. So the manufacturer has conducted its diagnostics to inform the consumer about how accurate their pregnancy test actually is. To do so, they gave the test to a selection of women, some of whom were actually pregnant, and some of whom were not. How do they know if they were actually pregnant? They used a more reliable test as the gold standard, probably an ultrasound.
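And they filled in this table:
                         Actually pregnant    Not pregnant
Urine test positive              a                  b
Urine test negative              c                  d
(Here a = true positives, b = false positives, c = false negatives, and d = true negatives.)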
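From the contingency table above, we define some concepts. First, we have this thing called sensitivity, which is given by:
Sensitivity = a / (a + c)
where a is the number of pregnant women who tested positive and c is the number of pregnant women who tested negative.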
Sensitivity is, in this example, the proportion of actually pregnant women who are identified by the urine test as being pregnant. In the COVID reality, it would be the proportion of actually infected people who are correctly identified as being infected. For most people, this is clearly the statistic they care about the most. It’s easily understood and gives a clear measure of the quality of a test.
Specificity, on the other hand, in this example would tell us the proportion of non-pregnant women whom the urine test accurately excluded. In the COVID reality, specificity would tell us the percentage of uninfected people who are correctly identified as not having the disease. Specificity is given by:
Specificity = d / (b + d)
where d is the number of non-pregnant women who tested negative and b is the number of non-pregnant women who tested positive.
But what about these things called “false positive” and “false negative” rates? Well, look at the top row of the contingency table. Of all the positive urine test results (a+b), “b” were in fact false. (That’s the number of women who were actually not pregnant, even though the urine test said that they were.)
Do keep in mind that I tend to use the terms “rate” and “ratio” interchangeably, as many epidemiologists do, though we really should not.
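So the false positive rate (i.e., the proportion of positive test results that are erroneous) is given by:
False positive rate = b / (a + b)
Meanwhile, the “false negative” rate is the proportion of negative urine test results that belonged to women who were in fact pregnant. Confused yet? Writing c for the pregnant women who tested negative and d for the non-pregnant women who tested negative, it will be given by:
False negative rate = c / (c + d)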
Whew! If this is new to you, I bet you’re bored already. Can’t blame you. At this point in my lectures, I’m pumping aerosolized caffeine into the classroom and genuinely regretting my career choices.
As of May 23, the Public Health Ontario labs had detected about 20 false positives among 228,000 specimens, of which about 11,000 tested positive. This represents a false positive ratio of <0.01% of all specimens tested and a specificity of >99.99%. That’s pretty darned good.
But what about prevalence? How do we compute the prevalence of a disease (or in this case, of pregnancy) from these data? Prevalence is everyone who truly has the condition divided by everyone who was tested:
Prevalence = (a + c) / (a + b + c + d)
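For anyone who prefers code to algebra, here is a minimal Python sketch of the definitions above. It assumes the usual layout of the contingency table, in which a and b are the positive test results (true and false, respectively) and c and d are the negative ones (false and true). The class name and the counts in the usage lines are made up for illustration, and the false positive and false negative rates follow the definitions used in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    a: int  # test positive, condition truly present (true positives)
    b: int  # test positive, condition truly absent (false positives)
    c: int  # test negative, condition truly present (false negatives)
    d: int  # test negative, condition truly absent (true negatives)

    def sensitivity(self) -> float:
        # Share of truly positive people whom the test catches.
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # Share of truly negative people whom the test clears.
        return self.d / (self.b + self.d)

    def false_positive_rate(self) -> float:
        # As defined in this post: share of positive results that are wrong.
        return self.b / (self.a + self.b)

    def false_negative_rate(self) -> float:
        # As defined in this post: share of negative results that are wrong.
        return self.c / (self.c + self.d)

    def prevalence(self) -> float:
        # Share of everyone tested who truly has the condition.
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)


# Hypothetical counts, chosen only to exercise the formulas.
table = ContingencyTable(a=80, b=10, c=20, d=890)
print(f"Sensitivity: {table.sensitivity():.1%}")
print(f"Specificity: {table.specificity():.1%}")
print(f"Prevalence:  {table.prevalence():.1%}")
```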
Okay, so now what?
Positive Predictive Value (PPV)
Because epidemiologists excel at one task (making simple concepts unnecessarily complicated) we also have this thing called “positive predictive value”, or PPV. It seeks to answer the question, “If the patient tests positive, what is the probability that she actually has the disease?”
At this point, you might be thinking, “Hey, if I test positive, doesn’t that mean I have the disease? I mean, isn’t that what a test is for? Why invent a whole new diagnostic for a thing I thought we already covered with specificity and so forth?”
As we noted above, tests are imperfect. In some instances, a test will give you an incorrect result. Going back to our contingency table above, the PPV is the number of true positives (a) divided by all positive test results (a + b):
PPV = a / (a + b)
I encourage you to take a moment to cogitate upon the distinction between sensitivity and PPV. They look similar, but are measuring different things. To recap: sensitivity tells us the proportion of truly diseased people whom the test identified, whereas PPV tells us the proportion of people who tested positive who were truly diseased.
See the difference?
As far as what this means for COVID testing, laboratory diagnostics suggest that PPV for the COVID PCR test approaches 100%. So pretty much all the people who test positive should actually have the disease… maybe.
See, this is not the end of the story. To confuse you further, all the terms I’ve defined so far can be computed from each other, as per this equation:
PPV = (sensitivity × prevalence) / [(sensitivity × prevalence) + (1 - specificity) × (1 - prevalence)]
Here’s a hint of where I’m going with this. If you’re mildly mathematically adept, you will see that PPV varies with prevalence. As prevalence approaches zero, so does PPV, because there will be more false positives for every true positive. And as prevalence becomes quite high, PPV climbs toward 100%.
So even though laboratory conditions suggest that PPV approaches 100%, real life conditions might tell a different story.
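The dependence of PPV on prevalence is easy to check numerically. The sketch below assumes the standard Bayes' theorem relationship, PPV = (sensitivity × prevalence) / [(sensitivity × prevalence) + (1 - specificity) × (1 - prevalence)]. The function name and the test characteristics (90% sensitivity, 95% specificity) are illustrative assumptions, not figures for any real test.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    # Note: undefined (division by zero) if nobody can ever test positive,
    # e.g. prevalence = 0 combined with perfect specificity.
    return true_positives / (true_positives + false_positives)


# Hold the (hypothetical) test fixed and vary only the prevalence.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    value = ppv(sensitivity=0.90, specificity=0.95, prevalence=prevalence)
    print(f"prevalence {prevalence:6.1%}  ->  PPV {value:6.1%}")
```

The same test looks far worse at low prevalence than at high prevalence, even though its sensitivity and specificity never change.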
An Example With a Twist
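Let’s say that 4,810 women take a home pregnancy test, and all of them get follow-up ultrasound scans to confirm whether or not they are actually pregnant. The results are presented in this contingency table:
                         Actually pregnant    Not pregnant    Total
Urine test positive              9                 351          360
Urine test negative              1                4449         4450
Total                           10                4800         4810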
Now let’s calculate our diagnostics:
Sensitivity = 9/10 = 90%
Specificity = 4449/4800 = 92.7%
Prevalence = 10/4810 = 0.2%
PPV = 9/360 = 2.5%
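The four calculations above can be reproduced in a few lines of Python. The cell counts are inferred from the fractions themselves (9 true positives, 351 false positives, 1 false negative and 4,449 true negatives, which together make up the 4,810 women in the example); the variable names are just for illustration.

```python
# Cell counts inferred from the fractions in the worked example.
true_pos, false_pos = 9, 351      # women with a positive urine test
false_neg, true_neg = 1, 4449     # women with a negative urine test
total = true_pos + false_pos + false_neg + true_neg

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (false_pos + true_neg)
prevalence = (true_pos + false_neg) / total
ppv = true_pos / (true_pos + false_pos)

print(f"Sensitivity = {sensitivity:.1%}")  # 90.0%
print(f"Specificity = {specificity:.1%}")  # 92.7%
print(f"Prevalence  = {prevalence:.1%}")   # 0.2%
print(f"PPV         = {ppv:.1%}")          # 2.5%
```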
Let’s interpret this. If we only looked at the sensitivity and specificity, we would think this is an extraordinarily good test. Both are 90% or higher, after all.
But why the low PPV? Well, PPV is sensitive to prevalence. When the prevalence of the condition (in this case, pregnancy) is low, then PPV becomes very poor. The prevalence in our example is very low: 0.2%.
So, in this example, if a woman were to screen positive on the urine test, there is only a 2.5% probability that she is actually pregnant.
(Disclaimer: these are simulated data that I plucked out of my dog’s butt hole. Real pregnancy tests are much better. Thankfully.)
Back to Dr Yaffe’s Comments
I strongly believe what she meant to say, instead of, “you’ll get false positives almost half of the time”, was: “almost half the positives you get will be false positives… when the prevalence of the disease is low.”
It was, in my opinion, a misstatement that probably should be corrected. When interpreted through this generous lens, she’s not incorrect.
The larger issue, though, is the role of testing in a pandemic. When so many cases of COVID19 are in fact asymptomatic, it seems unlikely that prevalence is so low that the test results are suspect.
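To get a feel for how low prevalence would have to be before half of all positive results were false, the PPV relationship can be turned around and solved for prevalence. This is only a rough sketch: the 99.99% specificity comes from the Ontario figures quoted earlier, but the 80% sensitivity is purely an assumed value for illustration, and the function name is hypothetical.

```python
def prevalence_for_target_ppv(sensitivity: float, specificity: float,
                              target_ppv: float) -> float:
    """Prevalence at which a test reaches the given PPV.

    Obtained by solving
        PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    for prev.
    """
    false_pos_term = (1 - specificity) * target_ppv
    true_pos_term = sensitivity * (1 - target_ppv)
    return false_pos_term / (false_pos_term + true_pos_term)


# Specificity from the Ontario lab figures; sensitivity is an assumption.
break_even = prevalence_for_target_ppv(sensitivity=0.80,
                                       specificity=0.9999,
                                       target_ppv=0.50)
print(f"Half of positives are false when prevalence is about {break_even:.4%}")
print(f"That is roughly 1 infection per {1 / break_even:,.0f} people tested")
```

Under those assumptions, the break-even point works out to roughly 1 infection per 8,000 people tested, which gives a sense of how rare the disease would need to be in the tested group.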
More to the point, even if low prevalence (for example, in remote parts of the province) renders a low PPV, that would just mean an elevated proportion of false positive results…. which is fine. We would much rather err on the side of false positives than false negatives, when it comes to detecting current infection.
In other words, I would much rather capture more cases and exclude them later than miss some cases that could go on to start new outbreaks.
Keep in mind, we have only been talking about the PCR swab test and a fake pregnancy test. The COVID antibody tests are another matter altogether. But we won’t get into those right now.
Why This Is Important
I’ve been informed that Dr Yaffe’s comments are being used to call into question some of the recent decisions made by Public Health to close some businesses, for example. The argument goes that if the false positive rate is so high, then in fact the community burden is only half of what is being reported, so there is no actual emergency.
No. No no no no no. No.
First, I doubt that community prevalence is so low that the value of the test is being called into question.
Second, I have faith that when Public Health confirms a case, they are employing what’s called an orthogonal testing algorithm, which is essentially applying two independent tests in succession to greatly reduce the probability of falsity. (For example, if a single test has a false positive probability of 10%, then the probability of that same test giving a false positive twice is 10% of 10%, or 1%.)
Third, these diagnostics pre-suppose an equal likelihood of everyone having the disease. That’s the way the math works. But that’s not the way real life works. If you suspect that someone is infected (i.e., they have symptoms or they were present in a facility during an outbreak), then the PPV becomes almost irrelevant. In other words, such individuals have a high pre-test probability of being infected, so the probability of a false positive test becomes low indeed.
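The second and third points can both be sketched in a few lines of Python. The first function reproduces the 10% of 10% arithmetic and leans on the same assumption the text makes, namely that the two tests are independent. The second reuses the Bayes' theorem form of PPV with pre-test probability standing in for prevalence. The function names, the 90%/95% test characteristics and the two pre-test probabilities are illustrative assumptions, not real figures.

```python
def repeat_false_positive_probability(single_test_fp: float,
                                      n_tests: int = 2) -> float:
    """Chance that n independent tests ALL return a false positive."""
    return single_test_fp ** n_tests


def ppv(sensitivity: float, specificity: float,
        pretest_probability: float) -> float:
    """PPV via Bayes' theorem, using pre-test probability as the prior."""
    true_pos = sensitivity * pretest_probability
    false_pos = (1 - specificity) * (1 - pretest_probability)
    return true_pos / (true_pos + false_pos)


# The 10% example from the text: two independent tests in succession.
print(f"Two false positives in a row: "
      f"{repeat_false_positive_probability(0.10, 2):.0%}")  # 1%

# Hypothetical test (90% sensitive, 95% specific) applied to two groups
# with very different (made-up) pre-test probabilities of infection.
for label, pretest in (("random screening", 0.002),
                       ("symptomatic outbreak contact", 0.30)):
    print(f"{label}: PPV = {ppv(0.90, 0.95, pretest):.1%}")
```

The test itself is identical in both groups; only the starting probability changes, and with it the chance that a positive result is a true one.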
So even though all tests are subject to the effects of low prevalence on PPV, that in no way means that the case counts you read about in the newspaper are inflated. Got it?
Okay, it’s 9 AM and I haven’t had my coffee yet. This is way too much numbers geekotry for this time of the day.