COVID-19: What Are My Chances of Dying?
by Raywat Deonandan, PhD
Epidemiologist & Associate Professor
University of Ottawa
(I add my credentials to these COVID-19 blog posts in case they get shared. I want readers to know that my opinion is supposedly an educated and informed one)
I’m continuing to try to answer some of the more common questions I’ve been getting from the general public and media. One of the recurring ones goes something like, “I’m a healthy 32 year old man living in Birmingham, Alabama. What are my chances of dying of COVID-19?”
I’m going to try to explain why that’s a difficult question to answer, and also why it’s the wrong question to ask.
But first, this is still a personal blog, and I need to keep track of the media appearances I’ve been doing since I last updated this space. (This stuff is important for my professorial annual reports, and this is a fun way to keep track of them, so please bear with me). Here is a list:
- May 8, 2020 — interviewed by Katrina Clarke of the Hamilton Spectator for article, “Random testing is best bet to control COVID-19, says epidemiologist“
- May 8, 2020 — interviewed by Tashauna Reid for CBC’s The National
- May 8, 2020 — interviewed for CBC article, “What works to flatten the curve and what science says on easing restrictions“
- May 8, 2020 — interviewed on Global News Radio 640 Toronto
- May 8, 2020 — gave comments to Michel Bolduc of CBC Quebec
- May 7, 2020 — interviewed by Chris Glover of CBC News
- May 7, 2020 — interviewed by Andrew Duffy of the Ottawa Citizen
- May 7, 2020 — interviewed by Robin Bresnahan on CBC Ottawa Morning
- May 6, 2020 — interviewed on CFRA radio by Kristy Cameron
- May 6, 2020 — interviewed by Mary Ormsby of the Toronto Star for article, “Since COVID-19 emerged, how many people have died in Ontario? They won’t tell us“
- May 5, 2020 — interviewed by Kenyon Wallace of the Toronto Star for article, “How many Ontarians are still getting COVID-19 in the community? Here’s why the province can’t say for sure“
- May 4, 2020 — interviewed on CJOB radio
- May 4, 2020 — interviewed by Morganne Campbell of Global News Toronto for article, “Questions raised on transporting coronavirus patients after Scarborough woman walks home“
- May 1, 2020 — interviewed by Ahmar Khan of Yahoo! News
- April 30, 2020 — interviewed on CFRA radio by Kristy Cameron
- April 29, 2020 — interviewed on CJOB radio
- April 29, 2020 — appeared on Zoomer TV in Toronto to discuss the evolution of COVID-19
- April 28, 2020 — interviewed for CBC article, “Coronavirus: N.B. first province to create ‘bubbles’ and other provinces look to follow suit”
- April 28, 2020 — interviewed by Katherine Aylesworth of Global News
- April 27, 2020 — interviewed on CJOB radio
- April 27, 2020 — interviewed by Leslie Roberts on CFRA radio
That’s a long list. No wonder my dog was feeling neglected. He even tried to get himself onto a live TV broadcast:
CFR vs IFR
I’ve explained in an earlier post the difference between the CFR (case fatality ratio) and the IFR (infection fatality ratio). But I think it’s worth going over again, because it’s an important point that is germane to the question I am exploring in this post.
As I am fond of telling my students, much of epidemiology is simply dividing two numbers. The tricky part is knowing which two numbers to divide, and what that division actually tells us.
The CFR is the fraction of known cases who are known to die. The IFR is the proportion of all cases –known and unknown– who die. We never really know the true IFR until the pandemic is over, since we won’t know the true death toll until all the data are in; nor will we know the total number of people who have the disease because we only know about the ones we test.
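To make the "dividing two numbers" point concrete, here is a minimal sketch of the two ratios. Every count in it is hypothetical and invented purely for illustration, and the function name fatality_ratio is my own; none of it is real surveillance data.

```python
def fatality_ratio(deaths, cases):
    """Return deaths divided by cases, as a fraction between 0 and 1."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical numbers, invented purely for illustration.
known_cases = 10_000             # people who tested positive
known_deaths = 700               # deaths among those known cases
undetected_infections = 60_000   # infected, but never tested
uncounted_deaths = 100           # died of the disease, but never attributed to it

# CFR: known deaths over known cases.
cfr = fatality_ratio(known_deaths, known_cases)

# IFR: all deaths over all infections, known and unknown.
ifr = fatality_ratio(
    known_deaths + uncounted_deaths,
    known_cases + undetected_infections,
)

print(f"CFR = {cfr:.1%}")
print(f"IFR = {ifr:.1%}")
```

With these made-up inputs the IFR comes out well below the CFR, simply because the denominator grows much faster than the numerator once untested infections are included.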
This distinction has seeded a lot of distrust in some corners of the general public, who don’t realize that this is something epidemiologists are well aware of. They hear, for example, of large prevalence studies that drop the “death rate” because of “finding asymptomatic cases”, and assume that this fairly obvious realization invalidates everything we claim to know about the disease.
But it was always expected that the final “death rate” would be lower than what is observed in current known cases; this is not surprising. It has always been known that our testing strategy will not catch a large number of mildly infected people. This is not revolutionary. This is why we have two different terms –CFR and IFR– to capture that eventuality.
Globally, the COVID-19 CFR is currently about 7%. But that’s an average of a wide range of observations from countries like Iceland (around 0.6%) and Italy (above 12%), as of today’s date.
There’s a lot of variation because of: (1) differences in the extent of testing (which affects the size of the denominator); (2) who is being tested (testing more serious cases means you’ll get more who die); (3) the quality of the health care system (better ones mean fewer people will die); and (4) data quality in each country (some places count better than others).
The IFR, on the other hand, should represent the true risk of dying of a disease if you catch it. Assuming no data quality issues, and assuming it’s the same disease being experienced, it should vary by region according to pretty much only two factors: (1) the quality of the health care system, and (2) the resilience of the population.
With COVID-19, for example, the population of a sub-Saharan African country will be less obese and much younger than that of the USA (measures of population resilience). So they are expected to die in fewer numbers. On the other hand, their health system quality is poorer, so they should die more. On the third hand, with the abysmal access Americans have to health care and insurance, maybe the health systems quality is not that much different between the two regions.
I bring this up because a reader of an earlier post suggested that the estimated IFR from the recent New York seroprevalence study (0.86%) should not be applied to other parts of the country because New York has a much greater number of cases.
But I argue that so long as the health care responsiveness and the age and morbidity distributions of New York’s population are comparable to those of another part of the same country, then the latter should experience the same IFR. It’s the same disease, after all.
So what is the IFR for COVID-19? We don’t know yet. We only have estimates from around the world. One study suggests that the IFR in the USA should be 1.3%. Meanwhile, this pre-print (i.e., non-peer reviewed) meta-analysis suggests that it is 0.75%.
I’m on record as expecting the final IFR in Western nations to be 0.3-0.5%… based on absolutely nothing.
How is this useful? Well, the IFR is the best guess of a random person’s probability of dying if they get COVID-19.
But Can You Trust the Death Count?
Remember that both IFR and CFR have as their numerators the people who die of the disease. A common refrain from the distrusting public is that we are overestimating COVID-19 deaths because of the way the data are coded.
One frequent offering: “If you die of a car accident and happen to have COVID, they count it as a COVID death.” Someone even asked me, “Why are they paying doctors to inflate COVID deaths?”
Let’s leave aside both that car accident deaths are probably at an all-time low, and the ridiculous assertion that overworked doctors across the nation are being bribed en masse (by Bill Gates, I’m told) to fudge the numbers. The public’s consternation is nevertheless understandable. What needs to be understood is that the current methods of counting COVID-19 deaths vastly underestimate the true death toll. Let me explain.
In many US states and Canadian provinces, at least early in the epidemic, a COVID death only “counted” if the deceased had tested positive for the virus. But imagine an outbreak in a nursing home (which is where a lot of the deaths are occurring). After a handful of patients (let’s say three) test positive, an outbreak in the home is officially declared, and everyone inside is assumed to be infected; therefore no tests are “wasted” on them.
When those first three die, their deaths count as COVID deaths. But any subsequent deaths in the same nursing home –who are very likely to be infected, as they are part of the same contained outbreak– do not count in the official tally, because they were never tested for the virus.
While this policy has changed of late, it was in effect early on during the worst of the pandemic. So in many jurisdictions (including Ontario, where I live), during the acute phase a very large number of deaths were not counted.
Some jurisdictions have chosen to expand their case definition to include, not only those confirmed by testing, but those with a high likelihood of being COVID-positive based upon their symptoms alone. This also fuels conspiracy theorists who claim the number is inflated, because “how can you know if they had it unless you test them?”
But in April the official CDC data only included those deaths with a confirmed test for the presence of the virus. So those totals were definitely missing all the highly probable cases that nevertheless were not tested.
Now, however, the CDC includes probable cases to try to make up for that undercount. Do keep in mind that there are four categories of cases: unknown, suspected, probable, and confirmed. The great majority of tested probable cases become confirmed cases. So it was decided to include the probables along with the confirmed in the final case and death tally. The best data products will distinguish the two for the discerning reader. Any of the probable cases who then died were then likely coded as COVID deaths.
Yes, this likely overestimated —by a very slim margin— the death tally. I say very slim because it is highly unlikely that anyone showing most of the symptoms of COVID was not actually infected with COVID. As testing becomes more widely available, this bias goes away.
This is an issue that is very common across epidemiology. It’s what we call an information bias. In the early days of AIDS, for example, an easy virus test was not available. So in many parts of the world, particularly the poorer countries, something called the Bangui Definition was used. It said that if the patient showed at least 12 of a list of symptoms, then the patient could be reliably assumed to be suffering from AIDS.
When HIV testing became widely available, the Bangui definition was done away with. But symptomatic case definitions have a role to play during epidemics when testing is not available for everyone.
Are Doctors Lying on Death Certificates?
That’s an actual question that was asked of me. It’s essentially an accusation that doctors are (intentionally or otherwise) coding anyone as a COVID death who happened to have COVID when they died, even if they didn’t die of COVID.
Well, obviously doctors are not lying about who died of COVID. Different countries code COVID deaths differently, as the following table from a Politico article shows:
The CDC’s official guidance (as of April 2) states: “If COVID–19 played a role in the death, this condition should be specified on the death certificate.” See? The physician must rationally consider COVID to have been a cause of death, or an accelerator of death, for the death certificate to reflect its role.
Is it possible that this practice inflates COVID deaths? Sure. But to quote Epidemiologist Marc Lipsitch, “There are going to be some people who die of something else, happen to have COVID and get tested, and get counted as COVID deaths but would die anyway. It would be wrong to say that number is zero. However given current testing shortages and protocols, the number of such cases will be small.”
My colleague Dr Colin Furness described the challenge of coding diseases on death certificates this way, using the flu as an example: “You’ve got chronic obstructive pulmonary disease. You get the flu and then you die. On your death certificate, it’s going to say that you died of COPD. It doesn’t record what pushed you over the cliff, it records that you were tottering on the edge of the cliff and then you fell.”
This information bias is a constant challenge in medical records keeping. In the early days of AIDS, there was no opportunity to record AIDS deaths on American death certificates. As you probably know, no one dies of AIDS; they die of pneumonia, a fungal infection, or some other so-called “opportunistic” disease made possible by AIDS. As a result, AIDS deaths in the early part of the American epidemic were vastly undercounted. It wasn’t until many counties and states started allowing some consideration of “death hastened by AIDS”, or some phraseology like that, that the real toll of the disease was realized.
To avoid that same data bias with COVID, it’s important to create policies to capture more deaths. Why? Because much like AIDS, no one really dies of COVID… they die of the pneumonia or heart failure or systems failure that COVID creates. Hence, the coding of deaths is a delicate administrative task.
So COVID Deaths Are Overcounted?
No! They most definitely are not! In fact, if anything they are vastly undercounted. While, for the reasons listed above, there is a slight information bias toward overcounting with respect to how the known deaths are examined, it’s the vast number of unknown deaths that is truly troubling.
People die at home never having known they were infected. People died before the pandemic was declared. People died in places where COVID testing was not available. And people died in palliative care centres where they were preparing to die, so no one thought to check to see if COVID hastened their deaths.
It is my opinion that this is more likely to be the case in the USA than in Canada (where I live), for the simple reason that lack of health insurance prohibits people from seeking care. The uninsured are more likely to die at home and not to have their deaths captured in the official counts.
The only way to know for sure how many died with COVID in their systems would be to have collected tissue samples from everyone who died in the jurisdiction since January, and to have tested those samples for COVID residue.
But we can’t do that. So instead, epidemiologists do a thing called “computing excess mortality.” It works this way: we look at the total number of people who died in a given period –let’s say February to April of this year– and compare it to the total number of people who died in the same period the previous year.
I did a similar thing with this stupid study, in which we attempted to determine if after 9/11, more Americans were driving than usual, and if they were then dying more than usual due to driving being less safe than flying. (The answer is yes, by the way).
The CDC has made their analysis of excess all-cause mortality in the USA public on their website. Here’s a taste. It shows that deaths are dramatically higher now than what is historically expected:
We’re seeing this everywhere that the pandemic has touched. Moscow is reporting 18% greater deaths than last year. The New York Times has a thorough summary of the excess COVID mortality stats around the world.
Keep in mind that the best estimators of true COVID deaths based upon excess all-cause mortality will also attempt to subtract the expected traffic deaths that would have happened had we not been staying at home.
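The excess mortality calculation, including the traffic-death adjustment, can be sketched in a few lines. This is a simplified illustration and not the CDC’s actual method: the death counts are hypothetical, the function name excess_deaths is mine, and real analyses build the baseline from several years of data and not a single prior year.

```python
def excess_deaths(observed, expected, avoided=0):
    """Observed deaths minus the baseline we would have expected.

    `avoided` is the number of baseline deaths (for example, traffic deaths)
    that did not happen because behaviour changed; it lowers the baseline.
    """
    adjusted_baseline = expected - avoided
    return observed - adjusted_baseline


# Hypothetical all-cause death counts for the same months in two years.
deaths_this_year = 12_000
deaths_last_year = 10_000        # used here as the expected baseline
traffic_deaths_avoided = 200     # assumed drop from people staying home

naive = excess_deaths(deaths_this_year, deaths_last_year)
adjusted = excess_deaths(deaths_this_year, deaths_last_year, traffic_deaths_avoided)

print(naive)
print(adjusted)
```

Note that the adjustment pushes the excess up, not down: if fewer people than usual died on the roads, then more of the observed deaths must have come from something else.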
Bottom line: while the unmeasured cases in the community provide for a larger denominator, thus reducing the IFR, the unmeasured deaths in the community provide for a larger numerator, thus increasing the IFR.
Why Are We So Focused on Deaths?
For some reason, people are obsessed with IFRs and risk of death. That’s a fraction of the story.
COVID-19 brings with it crippling long-term disability in a fair number of cases. Read about Barry Mangione’s story in his Facebook post.
And consider this quote from Belgian scientist Peter Piot: “Many people think COVID-19 kills 1% of patients, and the rest get away with some flulike symptoms. But the story gets more complicated. Many people will be left with chronic kidney and heart problems. Even their neural system is disrupted…”
And this thread from Epidemiologist David Lilienfeld:
You don’t want this disease.
So Is That It?
Nope. Not at all. The original question was about the risk of dying of a particular person in a particular age group, in a particular region, of a particular sex, with a particular set of comorbidities.
Currently, we have good data on the distribution of comorbidities among COVID cases and among COVID deaths. That is not the same thing as knowing the risk of COVID infection or the risk of COVID death among people with certain comorbidities.
This is the difference. The distribution looks at all the known COVID deaths and sees what proportion had diabetes, obesity, old age, etc. That doesn’t tell you the probability of dying of COVID if you happen to be diabetic, obese, old, etc.
To know that, we would have to know the proportion of diabetics who get COVID (to compute the risk of a diabetic getting COVID); and then the proportion of diabetic COVID patients who die (to get the comorbidity-specific IFR for diabetics). We can probably dig up the latter, but the former would be very difficult indeed.
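The difference between a distribution and a risk is easier to see with numbers. The sketch below uses an entirely hypothetical group of 10,000 infected people; the counts are invented for illustration and say nothing about the real effect of diabetes.

```python
# Hypothetical counts among 10,000 infected people, invented for illustration.
infected = {"diabetic": 1_000, "not_diabetic": 9_000}
deaths = {"diabetic": 30, "not_diabetic": 70}

total_deaths = sum(deaths.values())

# Distribution: of those who died, what share was diabetic?
share_of_deaths = deaths["diabetic"] / total_deaths

# Risk: of the diabetics who were infected, what share died?
ifr_diabetic = deaths["diabetic"] / infected["diabetic"]
ifr_not_diabetic = deaths["not_diabetic"] / infected["not_diabetic"]

print(f"Diabetics as a share of deaths: {share_of_deaths:.0%}")
print(f"IFR among infected diabetics: {ifr_diabetic:.1%}")
print(f"IFR among infected non-diabetics: {ifr_not_diabetic:.1%}")
```

In this made-up example diabetics account for 30% of the deaths, yet an infected diabetic’s risk of dying is 3%. The two numbers answer different questions, and only the second needs the denominator that is so hard to obtain.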
But let’s say we did dig up the former, that somehow we accurately knew the risk of COVID death for every diabetic, or every heart patient, or every obese person. We would then have to disentangle the interactions and overlapping effects. Is it really the diabetes that heightens your risk? Or is it the fact that diabetics go to the hospital more often, and are therefore more likely to be exposed to sick people?
We do this statistically by conducting a regression analysis. Such techniques tease out which factors (the diabetes, the obesity, the age, the rurality, the gender, etc) are contributing to most of the risk.
To my knowledge, this has not yet been done by anyone. As a result, I cannot accurately answer the question that elicited this blog post.
But because everyone wants an answer of some kind, here are the estimated age-specific COVID IFRs from the recent New York sero-prevalence study:
So My Risk is Low?
Yes, if you’re a healthy 32 year old your risk of death is low. It has always been low. Nothing has changed about that. According to the IFR table above, your chances of dying if you get the disease are on par with a very bad season of the flu. (Just for your age group, mind you. For the population as a whole, the risk of death is higher; so please don’t take away the wrong message!)
In fact, I would argue that your risk is even lower than that. The IFR tells us the risk of dying if you get the disease. But what is the risk of getting the disease? Well, that is given by the prevalence of infected people in your vicinity.
P(d) = P(i) x IFR
Where P(d) = your probability of death, and P(i) = your probability of encountering someone with the disease.
In fact, P(d) = P(i) x P(t) x IFR
Where P(t) = the probability of an infected person actually transmitting the disease to you.
P(i), P(t), and IFR are all very small numbers. When you multiply them out, you get an even smaller number.
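Here is that multiplication as a small sketch. The three input values are placeholders I picked only to show the arithmetic; they are not estimates for any real place or age group, and the function name is my own.

```python
def probability_of_death(p_i, p_t, ifr):
    """P(d) = P(i) x P(t) x IFR, with each input given as a fraction."""
    for name, value in (("p_i", p_i), ("p_t", p_t), ("ifr", ifr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return p_i * p_t * ifr


# Placeholder values, chosen only to illustrate the arithmetic.
p_d = probability_of_death(
    p_i=0.01,    # chance the person you pass is infected
    p_t=0.05,    # chance they transmit the infection to you
    ifr=0.001,   # chance you die if infected
)

print(f"P(d) = {p_d:.7f}")
```

Because each factor is at most 1, the product can never be larger than the smallest of the three.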
What does this mean? It means if you, a healthy 32 year old person living in a low-prevalence part of the country, were to step outside and walk by another human being, the probability of that walk-by causing you to die of COVID-19 is vanishingly small.
Mind you, this type of analysis assumes homogeneous mixing, meaning that everyone has an equal chance of interacting with everyone else. But real life is not like that. Those in rural, wide open places are less likely to interact with others than are people in the crowded inner city who have to ride the subway every day.
People need not panic. This is not the Bubonic Plague. But….
So It’s All a Hoax?
No, no, no, no, no.
This is the important part. So pay attention. The error that many people, especially in the media, are making is in viewing this pandemic through the lens of individual risk. It was never about individuals. It has always been about populations.
An IFR of 0.1% sounds small. It means that if 1000 people get the disease, only one person will die of it. But if 370 million people get it, it means that 370,000 people will die. Suddenly this becomes an atom-bomb level threat.
And if a mere 0.5% of people don’t die, but end up with the crippling disability issues noted above? (A number I just made up, by the way). Well, when applied to a population the size of the USA, that’s close to 2 million people profoundly and tragically affected.
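The population arithmetic is the same division run in reverse. This sketch uses the 370 million figure from the example above for both calculations (my assumption for the second one), together with the 0.1% IFR and the made-up 0.5% disability rate.

```python
def expected_count(rate, population):
    """Expected number of people affected, given a per-person rate."""
    return round(rate * population)


infected_population = 370_000_000   # the figure used in the text's example

deaths = expected_count(0.001, infected_population)     # IFR of 0.1%
disabled = expected_count(0.005, infected_population)   # made-up 0.5% rate

print(f"{deaths:,} deaths")
print(f"{disabled:,} people with lasting disability")
```

A rate that rounds to nothing for one person becomes a very large count once it is multiplied by hundreds of millions.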
Low risks to individuals translate to incredibly high risks to populations.
The reason for the mask-wearing and the social-distancing and the temporary stay-at-home order is to drive the number of weekly new cases down to a mere handful. It’s not necessarily to keep you, an individual, alive –though that’s a happy byproduct. If we get the case production down to a small number, then the probability of a very large number of people becoming infected drops to a much smaller quantity, and the total number of disabilities and deaths is also kept as small as possible.
I hope I’ve made the point well enough. This is not an issue of individual risk, but one of population impact. As individuals, very few of us are at any real risk of something bad COVID-related happening to us. But our communities are at profound risk of something bad happening to them. From my perspective, this is a community health issue, not an individual health one.
This is related to something called the ecological fallacy, something I blogged about two years ago. The ecological fallacy is the false assumption that an individual has the characteristics of the group from which he or she emerges. It’s not precisely relevant here, but the philosophy is the same: the human tendency to see a population-level crisis and to immediately try to contextualize it around what it means to me, as an individual.
The thing about COVID-19 is that while it is demonstrably more lethal than the flu, it is still not particularly lethal for any given random individual. However, it is amazingly infectious. This means that unless public health interventions are taken (e.g., social distancing), it will tear through the population very quickly indeed. So that low lethality rate will be applied to a very large population quite quickly, rendering a very high total body count.
To quote “Phil” from the Stats Modelling discussion board of Columbia University, “A virus with an IFR of 40% in a given population, but that only infects 0.1% of the people exposed to it, would not become an epidemic because it would not infect enough people. But a virus with an IFR of 0.1% that infects 40% of the people exposed to it would be a public health disaster and would kill millions of people.”
I believe a failure to make this realization is the source of a lot of the division between people right now.
I want to make it clear that I see my role here as that of an educator. I want to help explain some of the numbers, in what little free time I have available to write this blog. What I am not doing (yet) is advocating for one policy over another (though I might do that in other media). Please keep that in mind when commenting.
This blog has been getting a lot of attention lately, some of it unfriendly. But most commenters, emailers, and tweeters have been remarkably respectful, even when taking issue with my content. I derive a lot of positives from that observation: most notably that most people are looking for polite engagement, in which I am happy to participate when I can. (Sometimes I can’t.)
One reader, Kerry G., sent me a lovely lengthy email expressing in almost poetic terms an appreciation for my posts. It was an inspiring email, and I wanted to offer a public appreciation for the time it took to write, and for the sentiment underlying the message.
I’m very much an optimist. While some ugliness has come out of this global crisis, I mostly see heroism and people trying to help each other. So please keep up the good attitude, folks!