Mortality with Meep: COVID-19 Deaths and the Importance of Dates
Without correct details, you get a misleading big picture
Yes, this is going to get quite technical as I go further down in this post, but I will start with a piece somebody forwarded to me, which explains the issue in words. I think many people are more comfortable thinking in words than numbers, though that is not necessarily true of you, my reader.
How COVID-19 fatality reports are distorting the data on daily death rates
Seven months into the COVID-19 pandemic in the United States, state governments and media outlets continue to publicize confusing, misleading data on the spread of the disease here, perpetuating fears that deaths from the virus are skyrocketing on a daily basis even as those fatalities are generally distributed across a period of days, weeks or even months.
At issue is how state health departments publicize daily reports of fatalities within the state’s borders. State health officials have for months been publishing two sets of mortality statistics: deaths that occurred on the publication date in question, and deaths that have only recently been catalogued from state backlogs.
This is poorly (and/or inaccurately) expressed. There are two sets of statistics: one is meaningful by itself, and the other is meaningful only in relation to the first.
The one that is meaningful without any other context is the number of deaths by the date on which they occurred. The CDC data I used in my video explainer are all based on the date the deaths occurred. That is why the total number of deaths drops off in recent weeks: nowhere near all of those deaths have been reported yet. Every time the CDC updates its numbers, the numbers for a particular week can go up (the usual case, as more deaths come in) or down (less often, as a correction due to a change in the date of death or cause of death).
This is clear: the date on which a death actually occurred is the most important piece of information, next to the cause(s) of that death.
If you are trying to determine pandemic impact, the date on which the death was reported/classified is almost meaningless. The date on which the death was reported is meaningful only in the context of knowing what the lag was between when it occurred and when it was reported. The “newly reported deaths” statistic, which is the meaningless-on-its-own stat, is the number most often reported in the media.
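To make the distinction concrete, here is a minimal sketch that counts the same set of deaths two ways. The records are made up for illustration (they are not CDC or state data), and the names `records`, `by_death_date`, and `by_report_date` are my own.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (date of death, date the death was reported).
records = [
    (date(2020, 7, 1), date(2020, 7, 3)),
    (date(2020, 7, 1), date(2020, 7, 7)),
    (date(2020, 7, 2), date(2020, 7, 7)),
    (date(2020, 7, 5), date(2020, 7, 7)),
    (date(2020, 7, 6), date(2020, 7, 7)),
    (date(2020, 7, 7), date(2020, 7, 7)),
]

# Same deaths, two different groupings.
by_death_date = Counter(died for died, _ in records)
by_report_date = Counter(reported for _, reported in records)

for day in sorted(set(by_death_date) | set(by_report_date)):
    print(day, "occurred:", by_death_date[day], "reported:", by_report_date[day])
```

In this toy data, five deaths are "newly reported" on 7 July, but only one of them actually occurred that day.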
Back to the piece:
The Arizona Department of Health Services publishes both of those figures on its coronavirus dashboard: On its “Summary” page, it lists the “number of new deaths reported today,” while on its “Covid-19 Deaths” tab, the state lists the actual “deaths by date of death.”
The distinction is a critical one: The state’s “new deaths” every day do not actually reflect the number of coronavirus fatalities Arizona has logged in the past 24 hours, but rather the number of COVID-19 deaths it has identified from both new and older death certificates.
“While we had 91 new deaths reported today, the graph [on the dashboard] shows them by the actual date of death,” she said on Friday. “Although those 91 were reported today, it doesn’t mean today was the date of death. Those deaths may have occurred at any time on the graph but were simply reported today.”
She added that the state’s graph “gets updated every day with the new deaths by actual date of death.” Poynter did not respond when asked if there was a way to see the date distribution from each day’s new report of deaths.
The daily new death report often generates misleading sensationalist reports throughout the media. On July 7, for instance, the state recorded 117 “new deaths” on its dashboard. Calling that number a “record,” CNN reported that Arizona that day reported “117 deaths from Covid-19 over the last 24 hours.” Business Insider reported that Arizona recorded “its highest number of newly reported coronavirus deaths” on that day. News Break said the state on that date “recorded its highest single-day death toll.”
You can read the rest there, which also gets into some of the misleading reportage of Florida COVID deaths.
Don’t attribute to malice that which can be attributed to laziness
Indeed, I consider laziness a stronger force than stupidity, as common as both conditions are. Anybody can be lazy. Even supposedly smart folks.
In an email with the person who sent me this piece, I pushed back on necessarily attributing malicious motives to those who have set up these misleading systems.
Here are some other possible motives:
1. desire for publicity [not much news in “there were a thousand extra deaths from two months ago we didn’t count til today”]
2. laziness [far easier to set up systems that just show reported than updating the whole curve each day]
3. incompetence
A lot of people would simply like to add a new point to their graph, and not have to update the whole series daily. It sure is easier. In a media report, it is far easier to say what changed day-over-day than to say “Of 150 newly-reported deaths, 100 occurred this week, 30 last week, 10 two weeks ago, and 10 from before that.”
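That kind of breakdown is not much extra work if each record carries both dates. Here is a rough sketch, assuming hypothetical records of (date of death, date reported) and using rolling 7-day buckets as a stand-in for "this week, last week". The labels and the function names `lag_bucket` and `summarize` are mine, not anybody's actual dashboard code.

```python
from collections import Counter
from datetime import date, timedelta

LABELS = ["0-6 days ago", "7-13 days ago", "14-20 days ago", "21+ days ago"]

def lag_bucket(died: date, reported: date) -> str:
    """Label a death by how many whole weeks passed before it was reported."""
    if died > reported:
        raise ValueError("death date is after report date")
    weeks = (reported - died).days // 7
    return LABELS[min(weeks, len(LABELS) - 1)]

def summarize(records, report_date: date) -> dict:
    """Break one day's newly reported deaths down by reporting lag."""
    counts = Counter(
        lag_bucket(died, reported)
        for died, reported in records
        if reported == report_date
    )
    return {label: counts.get(label, 0) for label in LABELS}

# Hypothetical data matching the 150-death example in the text.
report_date = date(2020, 7, 21)
records = (
    [(report_date - timedelta(days=2), report_date)] * 100
    + [(report_date - timedelta(days=9), report_date)] * 30
    + [(report_date - timedelta(days=16), report_date)] * 10
    + [(report_date - timedelta(days=40), report_date)] * 10
)

breakdown = summarize(records, report_date)
print(breakdown)
```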
It requires a lot more thought to dig into multidimensional trends than a single data point… no matter how misleading that single data point is.
What’s amusing to me is that you can get extremely silly results if you stick to this reporting-date view of deaths, which I will show below.
Negative deaths and unnatural spikes
Look at the following graph:
The people who put this graph together at Our World in Data obviously know the distinction between reported date and occurrence date. It’s in the explanation in the text to the right.
It would have been better if they had gotten better data and had their graphs based on when the deaths occurred. Obviously, using a 3-day or even 7-day rolling average will not be able to fix that absurd spike in the graph. No, it’s not fake data. It’s just data based on the date of reporting the deaths. Those deaths almost definitely took place over a period of several weeks. You will get a spike like this when they change the criteria for counting as a COVID death, for example. You can also get negative deaths that way.
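Here is a small sketch of why smoothing does not rescue a reporting-date series. The numbers are invented: a flat 20 reported deaths per day, plus a one-day backlog dump of 700. The function `rolling_mean` is a plain trailing average I wrote for illustration.

```python
def rolling_mean(values, window=7):
    """Trailing moving average; days before a full window are skipped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical reported-date series: 20 per day, with 700 backlog
# deaths all dumped into the count on a single day.
reported = [20] * 10 + [20 + 700] + [20] * 10

smoothed = rolling_mean(reported, window=7)
print(smoothed)
```

The one-day spike of 720 turns into a week-long plateau of 120, six times the baseline. The average spreads the dump forward over the following days, not back over the weeks in which the deaths actually occurred.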
The most grimly amusing situation with this sort of data was when a bunch of deaths got recategorized (or it was found there was double-counting of deaths), and because the system wasn’t based on the date of occurrence, the number of reported deaths for a day was negative. (WHO SitRep, 4 May 2020)
Due to the recent trend of countries conducting data reconciliation exercises which remove large numbers of cases or deaths from their total counts, WHO will now display such data as negative numbers in the “new cases” / “new deaths” columns as appropriate.
Here it is for 4 May — I guarantee there weren’t negative COVID deaths on that day in the U.S.
Heck, for all I know, that correction got re-corrected the very next week, resulting in a spike.
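The arithmetic behind a negative count is simple if, as I am assuming here, "new deaths" is computed as the difference between consecutive cumulative totals. The totals below are made up.

```python
# Hypothetical cumulative death totals on five consecutive report dates.
# Between the third and fourth reports, a reconciliation removes
# double-counted deaths from the running total.
cumulative = [1000, 1040, 1085, 1060, 1100]

# "New deaths" as the day-over-day change in the cumulative total.
new_reported = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
print(new_reported)  # [40, 45, -25, 40]
```

Nobody came back to life on the third day; the -25 is a bookkeeping correction landing on the date it was made.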
That’s what you get for being lazy.
Getting into geekery: IBNR
None of this is new to actuaries.
Many actuaries have to set a reserve called IBNR – Incurred But Not Reported. Insurance companies have to book a reserve for the claims they’re on the hook for, but they don’t know about yet. It’s obviously an estimate.
I did this for life insurance & reinsurance for a while, and there are a variety of methods we use to estimate this. One key piece of information we use is the history of how long it took for deaths to get reported to us. Sometimes deaths don’t get reported to the insurance companies until long after, and if we didn’t have reserves set up to pay those claims… there would be some extremely angry people, including regulators.
Any time something happens in the claims process that changes the timing, the actuaries (or even accountants) setting IBNR reserves need to know. If one estimates that it takes on average 3 months before the insurer hears about the death, then one would take the risk exposure of the business into account and estimate about 3 months of claims (this is a big simplification of one way to set IBNR). But say your processes have run into problems and now the average time until you hear about a claim is 6 months. If you keep assuming a 3-month lag, you will likely set the IBNR too low.
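As a toy version of that simplification (and only that; real IBNR methods are more involved), take the reserve to be expected monthly claims times the average reporting lag in months. The dollar figure and the function name `simple_ibnr` are hypothetical.

```python
def simple_ibnr(expected_monthly_claims: float, avg_lag_months: float) -> float:
    """Crude IBNR estimate: claims incurred during the reporting lag."""
    return expected_monthly_claims * avg_lag_months

# Hypothetical block of business expecting $2 million of claims per month.
expected_monthly_claims = 2_000_000.0

booked = simple_ibnr(expected_monthly_claims, avg_lag_months=3)  # stale assumption
needed = simple_ibnr(expected_monthly_claims, avg_lag_months=6)  # lag has doubled
shortfall = needed - booked

print(f"booked {booked:,.0f}, needed {needed:,.0f}, shortfall {shortfall:,.0f}")
```

Under these made-up numbers, sticking with the 3-month assumption leaves the reserve at half of what is needed.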
This is why I made the comment in my recent video at about the 9 minute mark: “But I swear to you, if they were in insurance, I would not let this person set IBNR”. I was commenting on the “weighted estimated total deaths” graph from the CDC, which I believe underweights for the lag. It’s just not credible to me.
Fixing the Graphs for Florida
This finance person on Twitter has made his own attempt at fixing Florida’s COVID death graphs:
That is a very busy graph, and tough to read. Let’s look at something simpler to read.
I grabbed a screenshot from here, which is based off the same data set, and here’s an indication of the incidence of the most recently reported deaths [on 21 July 2020, as I grabbed this graphic on 22 July 2020. There is a data problem today on 23 July 2020, as one of the deaths was officially entered as having occurred in 1989, so I obviously didn’t update anything].
So, of all the COVID deaths reported on 21 July 2020, only one actually occurred on 21 July. That doesn’t surprise me at all. The majority of the reported COVID-19 deaths occurred 1-4 days before. Then, there are some small stragglers from almost a month ago.
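Computing the lag for each record also gives you a cheap sanity check against entries like that 1989 death. Below is a minimal sketch; the cutoff date `EARLIEST_PLAUSIBLE` is my own assumption, and the sample dates are hypothetical, not taken from the Florida file.

```python
from datetime import date
from typing import Optional

# Assumption: no COVID-19 death in this data set should predate 2020.
EARLIEST_PLAUSIBLE = date(2020, 1, 1)

def lag_days(died: date, reported: date) -> Optional[int]:
    """Days between death and report, or None if the death date is implausible."""
    if died < EARLIEST_PLAUSIBLE or died > reported:
        return None  # flag for manual review instead of plotting it
    return (reported - died).days

# Hypothetical examples.
print(lag_days(date(2020, 7, 18), date(2020, 7, 21)))  # 3
print(lag_days(date(1989, 7, 21), date(2020, 7, 23)))  # None
```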
This is even worse for COVID cases, by the way. If nothing else, you can die only once from COVID.
You can test both negative and positive for COVID multiple times, depending on how many times you’ve been tested. There is definitely a lag in reporting back that negative or positive result, and it can differ within the same testing organization and obviously between organizations.
Jon Taylor has discovered other issues with the Florida data, such as labs that are reporting only positive test results. Again, I posit only laziness.
Going back to total deaths from all causes, and relevant comparisons
Yes, yes, I will be getting to my mortality update (hope to do a video this weekend). This is me getting an important point out of the way.
If things weren’t changing so much so rapidly, it would not be too huge a deal that there were 1-4 days lag in reporting COVID deaths and then just going by newly reported deaths.
Indeed, if people just told you the new deaths from any cause per day, that is far less likely to give you huge spikes. You can say “There were 900 total new deaths reported today, of which 150 were attributed to COVID-19.” It gives you an idea of how many deaths per day have been happening, because it’s tough to know if 150 deaths are very few or quite a lot compared to background mortality.
That does presume that the COVID deaths aren’t being treated differently from counting other deaths (almost certainly not true, but let’s pretend).
The problem with much of the reporting around COVID-19 deaths is that the focus is on making sure the media audience keeps showing up (this is even separate from any political situations), and not necessarily on giving people information so that they can gauge how serious the mortality impact is. Most people really have no clue how many people die per day in the U.S. normally (and how much that varies seasonally), forget about in their particular state or city.
Bringing it back to the actuarial profession, we are notorious for getting hung up on details and nit-pickery, and supposedly have trouble with the big picture.
Well, I say for fast-moving issues like COVID-19, those details are indeed very important, and you will get an entirely misleading big picture unless you know exactly what you’re looking at.
Data people — you need to build better, more meaningful dashboards. Stop with the meaningless statistics.