COVID19: Let’s Talk About Models. Not the Fun Kind.
by Raywat Deonandan, PhD
Epidemiologist & Associate Professor
University of Ottawa
(I add my credentials to these COVID-19 blog posts in case they get shared. I want readers to know that my opinion is supposedly an educated and informed one)
I used to be a fairly pretty teenager with the requisite insecurities and fragile ego that come with that time in life. Unsurprisingly, I would get not infrequent offers from questionable men with questionable business cards to engage in “modelling.” Well, those memories came seeping back yesterday as I started getting many questions from journalists and the public to comment on a different kind of modelling.
Friday was an important day in the ongoing COVID19 epidemic, at least in the province of Ontario where I reside. At present, the state of New York is the global epicentre of the disease, with frankly nightmarish scenes of hospital overloads haunting our TV screens daily. There is some concern that Ontario will suffer that same fate unless efforts are taken immediately.
Today, the Ontario government revealed its model of how they expect the disease to play out locally, presenting several scenarios of likely outcomes. So today I’m going to talk about disease modelling.
Before we get into that, here are some links to recent media engagements I did, all about COVID19:
- March 2, 2020 — On TRTWorld television in the UK
- March 30, 2020 — On CFRA radio
- March 31, 2020 — On CBC’s “The National”
- March 31, 2020 — On the “Solving Healthcare” podcast
- April 1, 2020 — On CFRA radio
- April 2, 2020 — On Zoomer Radio in Toronto
- April 3, 2020 — On CBC’s “The National”
- April 4, 2020 — on two CBC news articles, #1 and #2
It’s weird doing media in a time of compelled social isolation. One’s living room becomes a makeshift studio, and peculiar joy is taken in dressing from just the waist up:
I’d also like to take a moment to thank the friends who invited me to give a virtual talk and Q&A for the community of Pauli Murray College at Yale University earlier this week. That was a joy:
I think it’s very important that those of us with a modicum of knowledge and expertise on this pandemic issue be available to the wider public to answer questions as best we can. That means you science students, too, though you may not feel as if you’re sufficiently qualified. You are. Your community needs you right now.
Back On Topic
Now, back to the Ontario public health presentation. They offered the base scenario, which is what would have happened had we done nothing. I did a quick calculation of that “Let It Burn” scenario in an earlier post for the whole Canadian population. And it appears that the government modellers took a similar approach for Ontario: take the base population of 15 million people, multiply it by the fraction needed to achieve herd immunity (70%) and by the current case fatality rate (1%), and you get something very near to 100,000 deaths.
That’s how many people would die from this disease in Ontario in a very short time had we done nothing.
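That back-of-envelope arithmetic is simple enough to sketch in a few lines. The figures below are the ones quoted above (all approximate, and the case fatality rate in particular is an early estimate):

```python
# "Let It Burn" estimate for Ontario, using the approximate figures
# quoted above. This is back-of-envelope arithmetic, not a model.
population = 15_000_000        # Ontario population (rounded)
herd_immunity_fraction = 0.70  # fraction infected before herd immunity
case_fatality_rate = 0.01      # deaths per infection (early estimate)

total_infected = population * herd_immunity_fraction
deaths = total_infected * case_fatality_rate
print(f"{deaths:,.0f} deaths")  # prints "105,000 deaths"
```

Round that off and you get the "very near to 100,000 deaths" figure.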
Then they offered that, because we had implemented “social distancing” and other interventions, that base case scenario would not happen. Instead, we are looking at 3,000 to 10,000 deaths over the next two years, and a staggering 80,000 cases by the end of this month alone.
Now, I’m not going to weigh in on whether I think their model is right or wrong. Frankly, I have not been living with the data long enough to be able to offer any sort of worthwhile contribution on that front. Instead, I’d like to explain how these models work and why we do them, and offer some competing ideas from around the world.
What Is a Model?
In mathematics, a model simply describes the relationship between variables in a system. In disease outbreak epidemiology, a model is used to describe, explore, and explain how the different factors work together in such a way that we are better empowered to combat the outbreak.
I am not a disease modeller. I’ve only written one paper on disease modelling, and that was about how to model the impact of a theoretical extraterrestrial pathogen that jumped off of a comet or meteorite. So I am not going to pretend for a moment to be an elite expert on the nuances of infectious disease mathematics, which are so crucial to the well-being of our civilization right now. Any genuine experts who wish to correct any misstatement that I make are welcome to do so in the comments section below. Genuinely.
A predictive model uses data from the here and now to guess how things will look in the future. It’s as much art as it is science. In every scientific domain, one can take all the variables available and massage them to fit the present observations almost perfectly (what statisticians call overfitting). But a perfectly fitted present-day model is less likely to fit the observations in the future.
As a result, there’s a degree of play (the art) in deciding which variables and insights are most appropriate to include in the model, such that it adequately describes the present without compromising its ability to predict the future.
What Goes Into A Disease Model?
At its most basic, an infectious disease unfolds like fluids moving from one compartment to another, and then another. In fact, the equations we use for this were derived from fluid dynamics in physics.
Imagine three boxes: one labelled “S” for people susceptible to getting infected; another labelled “I” for those actually becoming infected; and a third labelled “R” for those who, having become infected, end up recovering. The people in a population flow from one box to the next to the next:
S -> I -> R
If you know a little bit about calculus, you know that a differential equation governs the rate of flow from “S” to “I”, and another one governs the flow from “I” to “R”.
You can also imagine that the “I” box can have people flow into an “H” box for hospitalized patients. And from there, they can flow to “R” or… gulp… to a box for the deceased. You can imagine all sorts of other boxes, too, like one for re-infections, or asymptomatic carriers, and so on. This can get pretty complicated the more nuance you want to add.
Essentially, the values that we assign to the differential equations will determine the rate and timing of flow of people between these “boxes” and therefore through this epidemic. And those values come from observing how the disease has behaved historically and/or in other countries.
Isn’t This Flawed?
Oh, it’s immensely flawed. First of all, the way I described it, the model assumes that nothing changes once you press “start”. But in real life, everything changes. The disease mutates. Our ability to treat it gets better. Our health care system might suffer and be unable to treat it. People’s behaviour will change, affecting the extent to which they are susceptible to infection. Our public health interventions might change the rate of flow from the “S” box to the “I” box.
If we were to include in the model some factors accounting for changes in the parameters over time, we would call that a “dynamic model”, as opposed to a “static model”, which assumes that all the variables remain unchanged. Most easy-to-understand models are static.
As well, as with all systems, a model is only as good as the data we put into it. If we are using COVID19 experiences from other countries, were the data from those countries accurately collected? Do the circumstances of those countries (their cultures, demographics, geographies, etc.) match those of our population? Probably not. So the errors compound.
I will get back to the data quality later on, because that’s a big part of this.
Lastly, it’s important to keep in mind that the further into the future we try to project, the more imprecise that prediction becomes. The best models will include error bars or uncertainty ranges that get wider as time unfolds.
What Did Ontario Present?
I’m disappointed that Ontario did not present a model per se. They presented predictions made by the model, which is a bit different. I don’t know what assumptions went into their model, which data were used as inputs, or whether their model was meant for health systems planning or for public communications.
That latter part is subtle. If you’re modelling likely outcomes of different scenarios, then the intention is to compel action. But if you’re modelling the likely spread of the disease itself, given all known conditions, then the intention is to assist in planning. The intent matters, as it drives the choices made in modelling.
To my mind, the most important bit of information presented at the press conference was this chart:
It shows the likely effect on this province’s capacity to provide ICU beds, with the two dotted lines indicating our current capacity and our projected capacity after a surge of dedicated COVID19 hospital beds is built next week.
It also shows a peak in the wave of ICU-bound cases in mid-to-late April, which is the closest we got to a timeline being shared.
Note that the outcome measured is ICU usage. Not hospital beds. Not deaths. Not cases. So at least four questions remain unanswered:
- When do they expect the actual epidemic to dwindle in Ontario?
- What has been the effect of the public measures taken so far on the timing of the epidemic?
- What would be the effect of more intense public measures on the timing of the epidemic?
- What does this mean for a timeline for a return to some semblance of normalcy?
What Are Some Other Models?
Dr David Fisman of the University of Toronto is, in my opinion, an elite modeller. He has developed the Incidence Decay and Exponential Adjustment (IDEA) model, which incorporates the introduction of public health measures into its prediction of the evolution of a disease. It was used convincingly to describe the 2014-2015 Ebola outbreak in West Africa.
Dr Fisman occasionally shares his opinions with the public on how he thinks the IDEA model fits to various Canadian populations during this COVID19 crisis. For example, he predicted that BC will see a flattening of their cumulative case counts in mid-April:
If you’re interested, he and his colleagues, Dr Ashleigh Tuite and Dr Amy Greer, have released this preprint (not yet peer-reviewed) modelling transmission and mitigation strategies in Ontario. Preprints are often removed from the server as the paper makes its way through peer review, so I hope the link still works when you click on it. Here’s a figure from their paper showing projected ICU requirements in Ontario:
The nation of Iceland, whose praises I have been singing of late, pursues a more scientific approach to monitoring their epidemic. In addition to testing symptomatic individuals, they engage in rigorous contact-tracing as well as testing a random sample of citizens, including asymptomatic ones, to get a good sense of the true prevalence of COVID19 in that country.
They’ve also been transparent in their methods, sharing a simple version of their predictive model via their public website:
They produced a hospitalization chart, too, much like the one presented by Ontario public health, with a similar timeline for peak hospitalizations:
The nice part about the way Iceland has presented this very simple graph is that they have shown how well the actual observations are tracking with the predictions. At a simple visual level, this offers some confidence in the validity of the model.
The mother of all models with respect to the current pandemic is that put forth by Dr Neil Ferguson and colleagues at Imperial College London. This is the one that most lucidly laid out the competing strategies of doing nothing, suppression, and mitigation, fitted to the UK population:
The Imperial College math is supposedly the bit of evidence that kicked various Western governments into action. So we should be thankful to Dr Ferguson and his team for their timely contribution.
I’m personally bullish on the Institute for Health Metrics and Evaluation (IHME) models for the USA. The organization is backed by the Gates Foundation, and its calculations are apparently driving much of US policy right now.
Here is their projection for deaths in that country:
All the models assume that the pandemic will continue for the next two years. However, most predict a peak in both cases and deaths in mid- to late-April. In North America, the IHME model anticipates an end to the current wave by the beginning of June. They reiterate the message I’ve been trying to spread, that “avoiding reintroduction of COVID-19 through mass screening, contact tracing, and quarantine will be essential to avoid a second wave.”
I.e., test, test, test, then test some more.
(UPDATE: Here’s a systematic review and critical appraisal of various COVID19 models)
You might have noticed that the IHME model and some of the others focus on deaths rather than cases. This is a little bit controversial, but let me explain the reasoning.
When we want to know how the epidemic will unfold, we’re really talking about the rate of generation of new infections: an incidence rate. If we had an accurate measure of true incidence (and true recovery), we would have an excellent grasp on the state of affairs.
But by now everyone is aware of the problems we’re having with testing. In short, we don’t test a lot. Because test kits are in short supply, we reserve them for people showing strong symptoms, because then we know how best to treat them and the degree of protection that needs to be instituted for them, the health care workers, and other patients.
If we’re only testing the serious cases, we’re missing all the non-serious cases, who are nevertheless out there infecting other people who might become serious cases.
So because of our inability to test widely and randomly, the known incidence rate is a poor estimator of the true incidence rate. At a given point in the pandemic, this is acceptable, so long as the testing fraction does not change. In other words, if we implement the same policies and rate of testing today as tomorrow, then the rate of change of known cases between today and tomorrow will correlate very well with the actual rate of change of true cases. In theory.
But over longer periods of time, that correlation breaks down, for many reasons: testing policies change, the availability of test kits varies, and growth in new cases might differ between the serious cases (those we test) and the non-serious ones (those we don’t test). So looking at known cases ultimately does not give us a reliable view of the spread of the epidemic.
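A toy calculation makes the point. If the fraction of true cases we detect stays constant, the growth rate of known cases matches the true growth rate; the moment that fraction drifts (say, because test kit supply ramps up), the known-case curve misleads. Every number here is invented purely for illustration:

```python
# Toy illustration: known cases = detection_fraction * true cases.
# All numbers invented for the example.
true_cases = [1000, 1300, 1690, 2197]  # steady 30% daily growth

# Case 1: constant detection fraction (20% of true cases found).
known_constant = [0.20 * c for c in true_cases]
growth_true = true_cases[1] / true_cases[0]           # 1.3
growth_known = known_constant[1] / known_constant[0]  # also 1.3

# Case 2: detection fraction rising as testing ramps up.
fractions = [0.10, 0.15, 0.20, 0.25]
known_drifting = [f * c for f, c in zip(fractions, true_cases)]
apparent_growth = known_drifting[1] / known_drifting[0]  # 1.95, not 1.3
```

In the second case the epidemic appears to be growing far faster than it really is, purely because testing expanded. The reverse drift (kits running out) would make it appear to slow.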
Deaths, on the other hand, are known. It is unlikely that in a Western industrialized nation there are a substantial number of people in the community dying of COVID19 of whom we are not aware. All of the deaths are captured in the data.
But deaths are not a perfect reflection of the growth of the epidemic, either. There may be faster growth among the non-serious cases who never proceed to hospitalization and death, for example. And as clinical experience with a disease progresses, it is conceivable that our ability to prevent death also improves. So over time, death might increasingly become an under-estimator of case incidence.
This is part of the data quality issues I mentioned above. Neither reliance on known cases nor on deaths is a perfect gauge of the flow of the disease. But for long term planning purposes (i.e., more than a few days), in my opinion deaths are more reliable. That is, until we develop a reliable surveillance system for COVID19.
(UPDATE: I’d like to add that deaths as an indicator are further underestimated in regions that embrace palliation at home or in long term care centres. In such instances, it’s possible that COVID19 deaths would not be visible to the case-counters.)
How Could We Do Better?
I always seem to have the same answer, but I think it’s the right one: better data are always needed. Better data can be obtained through good surveillance. Relying on test results of symptomatic patients showing up in hospitals is not good surveillance.
Eventually we have to deploy a real screening program, like that used in Iceland, but at much larger scale. The ability to detect most cases in short time not only empowers us to build much more accurate models, but it also actually allows us to return to society faster. Contact tracing to find all cases, symptomatic or not, empowers us to quarantine only those who need to be so isolated, leaving the rest of us mostly free to go about our lives.
I would also like to put out a call for a real-time global database of COVID19 cases, complete with demographic information, medical history, known comorbidities, applied treatments, and prognoses. This database should be readily available to the world’s epidemiologists so we can nail down specific clinical and behavioural factors relevant to a good outcome. That way we would know specifically which risk factors predispose someone to a very bad COVID19 experience, and could take special precautions to protect those people.
Again, these are existing technologies (databases, tests, and contact tracers) that can be put into action in short order. They are, in toto, less expensive than multi-billion-dollar economic stimulus packages. All they require is will and management acumen. And we have no shortage of managers, most of whom are sitting at home right now, looking for a purpose.
The doctors and nurses can hold the front line. Maybe the administrators can win the war.