Geeking Out: In Death Rates, the Denominator Is Also Important
Revisiting the cancer death rate spike among the elderly: it's a fake spike
About a month ago, I posted about cancer death rates by age and sex and noticed this:
What the heck with that 2021 spike?
When I looked at the actual number of deaths, that hadn’t spiked, but I did notice something odd about the estimated number of 85+-year-old women:
So I contacted folks at the CDC, and I had a nice chat with Jane Henley (who has a fascinating piece on the interaction of cancer and COVID - but that’s for another time).
And I realized I should read all those notes that come with my data downloads from WONDER.
Timing is everything
I have been updating my mortality slides for a presentation to the Iowa Actuaries Club (and this bit is getting into the presentation, by the way), and I did notice this note at the bottom of the data query:
The population figures for years 2022 and later are single-race estimates of the July 1 resident population, from the Vintage 2022 postcensal series released by the Census Bureau on June 22, 2023. The 2022 series is based on the Modified Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count. The Modified Blended Base consists of the blend of Vintage 2020 postcensal population estimates for April 1, 2020, 2020 Demographic Analysis Estimates, and 2020 Census data from the internal Census Edited File (CEF). The population figures for years 2021 are single-race estimates of the July 1 resident population, based on the Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count, from the Vintage 2021 postcensal series released by the Census Bureau on June 30, 2022. The population figures for year 2020 are single-race estimates of the July 1 resident population, from the Vintage 2020 postcensal series based on April 2010 Census, released by the Census Bureau on July 27, 2021. The population figures for year 2019 are single-race estimates of the July 1 resident population, from the Vintage 2019 postcensal series based on April 2010 Census, released by the Census Bureau on June 25, 2020. The population figures for year 2018 are single-race estimates of the July 1 resident population, from the Vintage 2018 postcensal series based on April 2010 Census, released by the Census Bureau on June 20, 2019. More information.
Yes, I know it’s a lot of yadda yadda.
If one clicks through on “more information” and digs through that documentation, one gets to the methodology of “Population Information”. This details how the population size was estimated for each year, and for what period that was in effect.
But here is the key point:
Comparison with other releases:
The rates and population figures by single-race for years 2018-2020 are different from the other releases of the 2018-2020 mortality data on CDC WONDER, in that data are available by single race categories.
The rates and population estimates for 2021 and later years differ in methodology from prior years, in that population figures are based on the Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count. The Blended Base consists of the blend of Vintage 2020 postcensal population estimates, 2020 Demographic Analysis Estimates, and 2020 Census PL 94-171 Redistricting File (see 2020-2021 Population Estimates Methodology).
Note the change in population denominators for death rates in 2021 and later years, from the July 1, 2020 population estimates to the July 1, 2021 population estimates, effective in the provisional mortality update in January 2023.
Note the change in population denominators for death rates in 2022 and later years, from the July 1, 2021 population estimates to the July 1, 2022 population estimates, effective in the provisional mortality update on January 18, 2024.
Let’s take a quick interlude before diving into the nitty-gritty details of what to do with all these denominator shenanigans.
But wait, didn’t a lot of old people die in 2020 from COVID?
Sure. But not enough to explain this huge population drop from 2020 to 2021.
A simple way to look at the age 85+ group is to assume there is no net migration (no emigrants out and no immigrants in). Then the count changes from 2020 to 2021 like this:
Age 85+ group in 2021 = Age 85+ group in 2020 who didn’t die + Age 84 group in 2020 who didn’t die
Pretty simple, right?
Let’s adjust this a little:
Age 85+ group in 2021 = (Age 85+ group in 2020 - Age 85+ group in 2020 who died) + Age 84 group in 2020 who didn’t die
Age 85+ group in 2020 - Age 85+ group in 2021 = Age 85+ group in 2020 who died - Age 84 group in 2020 who didn’t die
Let’s go to the numbers:
So about 1 million people over the age of 85 died in the U.S. in 2020.
The population estimate for the age 85+ group decreased by 682,664 people.
682,664 = 1,012,805 - x
x = 330,141
Let’s see the U.S. Census estimate for 84-year-olds in 2020:
Um, it looks like about 1,070,307 - 87,616 = 982,691 people should have been aging into the 85+ group. Not 330,141.
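Here is a quick sketch of that comparison in Python, using only the figures quoted above. The variable names are made up for illustration, and it keeps the same simplifying assumption of no net migration.

```python
# Figures quoted in the post (CDC WONDER deaths, Census population estimates).
deaths_85_plus_2020 = 1_012_805       # deaths at age 85+ in 2020
drop_in_85_plus_estimate = 682_664    # 2020 estimate minus 2021 estimate for 85+
pop_age_84_2020 = 1_070_307           # estimated 84-year-olds in 2020
deaths_age_84_2020 = 87_616           # deaths at age 84 in 2020

# Rearranged identity (no net migration assumed):
#   drop in 85+ estimate = deaths at 85+ - surviving 84-year-olds
# so the surviving 84-year-olds implied by the published estimates are:
implied_new_85 = deaths_85_plus_2020 - drop_in_85_plus_estimate

# What the 2020 figures say should have aged into the 85+ group:
expected_new_85 = pop_age_84_2020 - deaths_age_84_2020

gap = expected_new_85 - implied_new_85

print(f"implied by the 2021 estimate: {implied_new_85:,}")
print(f"expected from 2020 figures:   {expected_new_85:,}")
print(f"gap:                          {gap:,}")
```

The gap is roughly 650,000 people who seem to have gone missing from the denominator.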
So if the 2020 numbers are correct (and ignoring the 2021 numbers), we’d come up with an estimate of:
beginning 85+ population in 2020: 6,658,420
Number of people age 85+ who died in 2020: -1,012,805
Number of those aged 84 in 2020: 1,070,307
Number of those age 84 who died in 2020: -87,616
This gives us a population estimate of 6,628,306 — now that is 11% higher than the estimate currently sitting in CDC Wonder… and may still be too high.
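The roll-forward above can be sketched in a few lines of Python. The function name is hypothetical, and the CDC Wonder figure for 2021 is not quoted directly in this post: it is derived here as the 2020 estimate minus the 682,664 decrease.

```python
def roll_forward_85_plus(pop_85_plus, deaths_85_plus, pop_84, deaths_84):
    """Next-year 85+ population, assuming no net migration.

    Survivors of the 85+ group plus surviving 84-year-olds who age in.
    """
    return (pop_85_plus - deaths_85_plus) + (pop_84 - deaths_84)


# 2020 figures quoted in the post.
estimate_2021 = roll_forward_85_plus(
    pop_85_plus=6_658_420,
    deaths_85_plus=1_012_805,
    pop_84=1_070_307,
    deaths_84=87_616,
)

# Derived, not quoted: 2020 estimate minus the reported decrease.
wonder_2021 = 6_658_420 - 682_664

ratio = estimate_2021 / wonder_2021

print(f"roll-forward estimate: {estimate_2021:,}")
print(f"CDC Wonder (derived):  {wonder_2021:,}")
print(f"roll-forward is {ratio - 1:.1%} higher")
```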
Remember, I’m ignoring net migration. There may be an effect there, though one does not expect a lot of migration in and out of the country at these ages.
In addition, there still may be a death undercount for the oldest people in 2020 as well as incorrect population estimates in 2020.
However, I’d be happier with a 6.6 million estimate for those age 85+ than a 6.0 million estimate.
Effect on Age 85+ Death Rates
So let me take my 6.6 million estimate and compare it against the 6.0 million currently in CDC Wonder to see what happens to the calculated death rates.
Here are my calculations.
Before: (6.0 million population estimate)
2021 All-cause death rate for age 85+: 15.7%
After: (6.6 million population estimate)
2021 All-cause death rate for age 85+: 14.2%
Both of these are very high, but one is 11% higher than the other — and, by the way, it’s creating anomalies at slightly lower ages as well.
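Since the number of deaths doesn't change, swapping the denominator just rescales the rate. A rough check in Python, assuming the rounded 15.7% rate and the two population figures worked out earlier (the 2021 CDC Wonder figure is derived as 6,658,420 - 682,664, and the variable names are illustrative):

```python
rate_before = 0.157        # 2021 all-cause death rate for 85+, rounded, from the post
pop_before = 5_975_756     # derived: 6,658,420 - 682,664
pop_after = 6_628_306      # roll-forward estimate from the post

# deaths = rate * population, and the deaths stay fixed, so:
#   rate_after = deaths / pop_after = rate_before * pop_before / pop_after
implied_deaths = rate_before * pop_before
rate_after = implied_deaths / pop_after

print(f"rate with the larger denominator: {rate_after:.1%}")
print(f"old rate is {rate_before / rate_after - 1:.0%} higher")
```

Because the starting rate is rounded, this only reproduces the "after" figure approximately.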
What happened with the estimates?
First, CDC Wonder population estimates come from the U.S. Census Bureau via various survey techniques, and the Census Bureau runs multiple nationwide surveys to gather these statistics. (I’ve answered some of these! Obviously, I use these stats all the time.)
Look at this page: Census’s population and housing units methodology documentation page. I was going to share a screenshot, but things puke on me.
You may see a “Vintage 2020”, “Vintage 2022”, and “Vintage 2023” link.
That’s strange, you may say to yourself. Where’s Vintage 2021? Isn’t there something about Vintage 2021 in the notes above?
Indeed. Yeah, they screwed up their methodology surrounding “group quarters” (e.g. nursing homes, plus dorms and some other stuff) in Vintage 2021.
Who do you think are the most likely people to be in nursing homes?
Yeah. To be sure, it’s not just the age 85+ group — the ages 75-84 group is also affected.
I have not been able to find where it’s explicitly stated, but I believe that Covid disruptions along with the timing of the 2020 decennial Census caused trouble for the Vintage 2021 results. And they were removed from the Census website once it was obvious they were trash.
But, unfortunately, they’re still in CDC Wonder for now because they’re part of the finalized 2021 data. It’s going to be a while before that gets changed.
Effect on Age-Adjusted Death Rates
Let me return to the age 85+ adjustment (or, rather, the wide range I’m looking at).
To remind you about age-adjusted death rates, here’s my explainer:
Here is the set of weights currently being used for age-adjusted death rates:
So the current age-adjusted all-cause death rate for 2021 is 880 (per 100K people), but if I use the 6.6 million population estimate for the 85+ age group, the age-adjusted death rate is 856.
That’s 3% less.
That’s not much of an effect.
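A rough way to see why the effect is so small: the age-adjusted rate is a weighted sum of age-specific rates, and the 85+ group carries a small weight. The sketch below assumes the 2000 U.S. standard population, where the 85+ group is 15,508 per million, and uses the rounded 85+ rates from earlier, so it only approximates the 856 figure. It changes the 85+ term only and leaves every other age group alone.

```python
# Assumption: 2000 U.S. standard population weight for the 85+ group.
STD_WEIGHT_85_PLUS = 15_508 / 1_000_000

current_age_adjusted = 880.0   # per 100K, from the post
rate_85_before = 15_700.0      # 15.7% expressed per 100K
rate_85_after = 14_200.0       # 14.2% expressed per 100K

# Only the 85+ term of the weighted sum changes.
change = STD_WEIGHT_85_PLUS * (rate_85_before - rate_85_after)
revised_age_adjusted = current_age_adjusted - change
relative_change = change / current_age_adjusted

print(f"revised age-adjusted rate: {revised_age_adjusted:.0f} per 100K")
print(f"relative change: {relative_change:.1%}")
```

An 11% swing in the 85+ rate shrinks to under 3% overall because that group is only about 1.6% of the standard population.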
What Will I Be Doing?
I’m going to be lazy. (Yes, really.)
I could use the Vintage 2022 or 2023 estimates and then try to merge them with my CDC Wonder data draws, but that requires extra work on my part…. and frankly, I don’t feel like doing that right now.
First, all the effects going forward are just in calendar year 2021. And that was our worst mortality year in terms of total deaths, death rates (thanks to COVID), and more.
It stood out with the “cancer spike” because cancer didn’t actually get worse in 2021, not for the elderly. Actually, not for anybody, as far as I can tell — but I will be revisiting the cancer issue soon enough. Cancer is a slower-moving issue.
Second, the effects on age-adjusted death rates are fairly minor, so for longer-term trends, that’s easy enough to glide over when it’s only one year due to a misstated denominator at older ages.
Third, I do know about a Census data release that will eventually be coming that will affect mortality rates for multiple prior years… but I do not know when it ultimately will come. I am patient. I will talk about that one when it happens.
Fourth, I mainly care about what’s happening now, so I’m focusing on 2023 results mostly (though I’m also waiting on finalized 2022 results). So… I’m ignoring 2021 for current analysis.
Lesson: Know Your Data, Know Materiality
Yeah yeah yeah
But here are my prior posts/podcasts/videos on that very point:
April 2020: Mortality with Meep: Excess Deaths and Coronavirus
October 2022: On COVID (and all-cause) Mortality and Political Affiliation Studies
August 2022: A Tale of Know Your Data: The Mystery of the Excess Connecticut Summer Deaths
March 2020 Video: Know Your COVID Data
March 2023 Podcast: Public Finance and Data
Aug 2020: Data people, get some standards!
Apr 2020 Video: Know Your Data — How is it Reported?
Data issues are a big part of my (paid) working life, so I’m always aware of how data can go wrong.
But I also know that it’s not necessarily worth my time to fix every data bobble. I do need to know what is signal and what is noise, but I can let some of the noise just bubble in the background because I need to concentrate my time on the signal.
So, now that I’ve worked through the effect, I know I need to de-emphasize any potentially spurious “pattern” in mortality for older ages in 2021.
I know some people hate that there are errors that will persist, but that’s just life.
I mainly want to make sure I understand what’s going on, and then decide priorities. 2021 is not my priority right now.