You may have heard, although frankly it is probably just as well if you didn’t, that Elon Musk got some mixed results from SARS-CoV-2 rapid tests last week. He opted to use his celebrity to cast aspersions on the value of the tests. Normally I would consider that just another drop of noise in the roiling sea of nonsense. But I realized that the math involved in interpreting these tests is not altogether intuitive, and many of us might be uncertain what to do with such equivocal outcomes. And for some of us, that may already be or may become a concrete reality rather than a hypothetical exercise. So let’s talk through the numbers.
Per Musk, he had two positive results and two negative results. For simplicity, let’s assume that the tests were close enough in time that he didn’t change status along the way–that is to say, he didn’t get two negative results, then become infected, then have two positive results. In that case, two of the results must be wrong. At first glance, it might seem like a tie and so we’ve learned nothing. But in reality, while the particular result of two positives and two negatives is unlikely out of all the possible combinations of outcomes, it is much more likely to happen in one of the two possible scenarios (infected or not) than the other.
To see why, we need to be familiar with how tests are assessed. No screening test is perfect. There is always a chance it will return an incorrect result, and so we need to deal in probabilities. The most natural question to ask is “Given that I got a positive result, what is the probability I am actually infected?” This probability is the positive predictive value. A bit of math can show that while we can certainly answer this question, the value depends on the prevalence of the infection, or the fraction of people in the population who are infected. For the exact same test, the positive predictive value is higher if more people are infected. That makes it complicated to compare tests based on the positive predictive value, and so different probabilities that don’t depend on the prevalence tend to be used and reported.
The probabilities most commonly used are the sensitivity and specificity. Sensitivity answers the question “Given that I am actually infected, what is the probability that I get a positive result?” and specificity answers the question “Given that I am actually not infected, what is the probability that I get a negative result?” Those probabilities do not have to be the same and don’t have a fixed relationship with each other. Musk seems to have been given rapid antigen tests, which tend to have high specificity but somewhat lower sensitivity, a tradeoff for the quick results and simplicity of process. In other words, they will rarely be positive when there is no virus present, but they will sometimes be negative for infected people, especially if those people have less virus. For example, the BD Veritor test, mentioned in some reporting on Musk, has a specificity of 100% and a sensitivity of 84% as reported in this FDA document. Whether that is the exact test Musk got or if he got all four of the same brand I don’t know, but it doesn’t matter for this general discussion; other antigen tests have similar characteristics.
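To make the dependence on prevalence concrete, here is a minimal sketch that computes the positive predictive value from sensitivity, specificity, and prevalence using Bayes' rule. The 84% sensitivity is the BD Veritor figure quoted above. The 99% specificity and the three prevalence values are assumptions chosen purely for illustration; with a specificity of exactly 100%, every positive would be a true positive no matter the prevalence. The function name is my own.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(infected | positive result), computed with Bayes' rule."""
    # Fraction of the population that is infected AND tests positive.
    true_positives = sensitivity * prevalence
    # Fraction of the population that is NOT infected but tests positive anyway.
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed values for illustration only: 99% specificity and three
# hypothetical prevalence levels. Sensitivity is the 84% quoted in the text.
for prevalence in (0.01, 0.05, 0.20):
    ppv = positive_predictive_value(0.84, 0.99, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}")
```

The same test gives a noticeably higher positive predictive value as the assumed prevalence rises, which is why sensitivity and specificity are the figures usually reported instead.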
So, time to finally crunch some numbers. In the scenario where Musk is infected, the probability of a positive test is 0.84 and the probability of a negative test is 0.16. So the probability of whichever specific sequence of two positives and two negatives Musk received is 0.84 x 0.84 x 0.16 x 0.16 = 0.018 or roughly 2%: unlikely, but not impossible. In the scenario where Musk is not infected, the probability of a positive test is 0 and the probability of a negative test is 1. In that case, the probability of his sequence is 0 x 0 x 1 x 1 = 0. Now, that is based on the reported specificity, but it is a bit too tidy; there is likely some small but nonzero chance of a false positive, just too small to have occurred in the evaluation sample. For simplicity, let’s call it 1% (which is probably too high, but within the 95% confidence interval). So then the probability of his four results if Musk is infected is 0.018 or 2%, and the probability of those four results if Musk is not infected is 0.01 x 0.01 x 0.99 x 0.99 = 9.8×10^-5 or roughly 0.01%. Thus, even though his particular sequence of results is very unlikely under either scenario, it is far more likely (about 184 times more likely, in fact) if he is infected than if he isn’t.
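The arithmetic above can be checked with a short sketch. It assumes, as the text does, a sensitivity of 0.84 and an illustrative false-positive rate of 1%, and it treats the four tests as independent. The function and variable names are mine, and the order of the results is arbitrary since the product does not depend on it.

```python
def sequence_probability(results, p_positive):
    """Probability of one specific sequence of independent test results.

    results: iterable of booleans, True for a positive test.
    p_positive: chance that a single test comes back positive in this scenario.
    """
    probability = 1.0
    for is_positive in results:
        probability *= p_positive if is_positive else (1.0 - p_positive)
    return probability


# Two positives and two negatives; the ordering here is an arbitrary choice.
results = [True, True, False, False]

# If infected, a test is positive with probability equal to the sensitivity.
p_if_infected = sequence_probability(results, 0.84)
# If not infected, a test is positive with the assumed 1% false-positive rate.
p_if_not_infected = sequence_probability(results, 0.01)

print(f"P(results | infected)     = {p_if_infected:.5f}")
print(f"P(results | not infected) = {p_if_not_infected:.7f}")
print(f"likelihood ratio          = {p_if_infected / p_if_not_infected:.0f}")
```

Changing the assumed false-positive rate changes the ratio, but for any small rate the mixed result remains much more probable under infection.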
For completeness, since we don’t know the exact sequence of results, we can also calculate the probability of getting two positives and two negatives in any order. There are 6 possible orderings, each with the same probability, so we just multiply the probability of one sequence by 6. That gives 0.108 (11%) if he is infected and 0.00059 (0.06%) if he is not infected, and the outcome is still 184 times more likely if he is infected than if he isn’t. In other words, if 10 people who were infected each received 4 tests, we could expect about 1 of them to have some sequence of 2 positives and 2 negatives.
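The any-order calculation is a binomial probability, and a minimal sketch of it follows. It uses the same assumed inputs as the text (0.84 chance of a positive if infected, an illustrative 0.01 if not) and the same independence assumption. The function name is hypothetical; `math.comb` is from the Python standard library (3.8 and later).

```python
from math import comb


def prob_k_positives(n_tests, k_positives, p_positive):
    """Binomial probability of exactly k positives among n independent tests."""
    orderings = comb(n_tests, k_positives)  # 6 ways to place 2 positives in 4 tests
    one_sequence = p_positive**k_positives * (1.0 - p_positive)**(n_tests - k_positives)
    return orderings * one_sequence


any_order_if_infected = prob_k_positives(4, 2, 0.84)
any_order_if_not_infected = prob_k_positives(4, 2, 0.01)

print(f"infected:     {any_order_if_infected:.3f}")
print(f"not infected: {any_order_if_not_infected:.5f}")
print(f"ratio:        {any_order_if_infected / any_order_if_not_infected:.0f}")
```

Because the factor of 6 applies to both scenarios, it cancels in the ratio, which is why the relative likelihood is unchanged.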
Now, there is one very important caveat to those calculations. Multiplying probabilities requires independent events, meaning that the outcome of one doesn’t tell you anything about the outcome of another. In this case, it may not be a warranted assumption. For example, if the samples for the tests were taken very close together, there may be fewer virus particles available for subsequent swabs, making later tests more likely to be negative. Since we can’t know if the events were really independent, we should not take the above estimates as the definitive probabilities of the two scenarios or of their relative likelihood. Nevertheless, I hope the exercise and discussion make it clear that even though the results were not perfectly consistent, something can still be learned from such outcomes, and they are not indicative of a bogus test.