I used to teach at various colleges, and one of my favorite classes to teach was Introduction to Probability and Statistics.
But my absolute favorite thing to do was to prep other math adjuncts to teach the class. The dirty secret, of course, is that a lot of Intro Prob/Stats classes are taught by grad students and adjuncts who barely know real statistics, because they never had to take the class themselves.
And they never had firsthand experience with how statistics go wrong in the real world.
But I do have that experience, and I shared examples with them.
One of my favorite things to give them was real-life statistics that look like strong results (usually correlations, though some are hypothesis-testing results) but are really ridiculous.
Here’s one I picked up this year — a strong correlation between chocolate consumption and Nobel Prize-winning.
Winning a Nobel Prize is Like Eating a Box of Chocolates
“Chocolate Consumption, Cognitive Function, and Nobel Laureates” by Franz H. Messerli, M.D., published in The New England Journal of Medicine, 2012.
The graph from the paper:
The correlation coefficient was r = 0.79 (79%), which is very strong.
Here is the argument the author made:
Chocolate consumption enhances cognitive function, which is a sine qua non for winning the Nobel Prize, and it closely correlates with the number of Nobel laureates in each country. It remains to be determined whether the consumption of chocolate is the underlying mechanism for the observed association with improved cognitive function.
Therefore, China should be feeding its scientists more chocolate in order to win more Nobel Prizes.
Some obvious weaknesses
Let’s see what weaknesses the author himself pointed out.
The entirety of the “study limitations” section:
The present data are based on country averages, and the specific chocolate intake of individual Nobel laureates of the past and present remains unknown. The cumulative dose of chocolate that is needed to sufficiently increase the odds of being asked to travel to Stockholm is uncertain. This research is evolving, since both the number of Nobel laureates and chocolate consumption are time-dependent variables and change from year to year.
I did wonder if he wrote all of this as a joke, by the way.
Before we get to a rebuttal paper (or papers), I want to note a few things.
First, the author uses this Wikipedia page (as of a particular date which captured awards given through Oct 10, 2011), which has two Nobel rankings, depending on whether you include the Economics prize or not. I assume he used only one specific ranking. I will just note the ranking I see now, for which there are 919 prizes.
Unsurprisingly, the U.S. has the highest number of prizes, but we are the third-largest country by population, so that cuts our per capita take. The United Kingdom has the second-highest count, then Germany, and their smaller populations push their per capita counts up.
Second, the countries where no prize was awarded were excluded. So that’s a bunch of countries with zeros as the “response” variable.
Third, my main issue is where he got his per capita chocolate consumption data. Alas, he linked to the sites in general, and not to the specific pages or reports from which he got his data. I had to use the Wayback Machine to try to find his info. I could get to the pages, but the data were all in PDFs that weren’t archived! I couldn’t get any of the data.
So he has consumption data from four different sites for only 21 countries in total.
The ranking table has 74 countries. HMMM. Iceland is ranked pretty high on the Nobel list — what’s its chocolate consumption? So we have a limited amount of data for the “independent” variable.
What’s the Process for Awarding Nobel Prizes?
But mainly, let us look at this bit in the paper, which is why I thought the paper was a joke:
The only possible outlier in Figure 1 seems to be Sweden. Given its per capita chocolate consumption of 6.4 kg per year, we would predict that Sweden should have produced a total of about 14 Nobel laureates, yet we observe 32. Considering that in this instance the observed number exceeds the expected number by a factor of more than 2, one cannot quite escape the notion that either the Nobel Committee in Stockholm has some inherent patriotic bias when assessing the candidates for these awards or, perhaps, that the Swedes are particularly sensitive to chocolate, and even minuscule amounts greatly enhance their cognition.
Sweden is an outlier.
There should be a really obvious reason why: the Nobels are centered in Sweden. Alfred Nobel, the founder of the prizes, was Swedish, and he set up the process for the awards.
Nominations can come from all sorts of people, with a bias towards the Swedes and universities close by. The Peace Prize is selected by the Norwegian Nobel Committee, but the other prizes are selected by Swedish institutions such as the Royal Swedish Academy of Sciences (that link goes to the page related to the Chemistry prize).
If you check out the list of Nobel Prizes awarded, there’s a little clumpiness in the ones going to Swedes and to people from neighboring countries.
Other things that correlate with Nobel Prizes
Here is a rebuttal paper.
“Does Chocolate Consumption Really Boost Nobel Award Chances? The Peril of Over-Interpreting Correlations in Health Studies.” by Pierre Maurage, Alexandre Heeren, Mauro Pesenti. Published in The Journal of Nutrition, Volume 143, Issue 6, June 2013, Pages 931–933, https://doi.org/10.3945/jn.113.174813
Abstract:
A correlation observed between chocolate consumption and the number of Nobel laureates has recently led to the suggestion that consuming more chocolate would increase the number of laureates due to the beneficial effects of cocoa-flavanols on cognitive functioning. We demonstrate that this interpretation is disproved when other flavanol-rich nutriment consumption is considered. We also show the peril of over-interpreting correlations in nutrition and health research by reporting high correlations between the number of Nobel laureates and various other measures, whether cogently related or not. We end by discussing statistical alternatives that may overcome correlation shortcomings.
Let me share the other things that highly correlate with per capita Nobel awards.
GDP
This makes sense.
The correlation coefficient is the “r” you see on the graph. r is 0.73 here, a little less strong than chocolate consumption.
GDP has a high correlation with per capita chocolate consumption, too, by the way. The correlation is 66% for that.
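If you want to see what goes into that “r”, here is a minimal Python sketch of the Pearson correlation coefficient computed by hand. The function name pearson_r and the toy numbers are my own illustration; they are not data from either paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up numbers for illustration only (NOT chocolate or Nobel data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # exactly 2 * x, a perfect straight line
print(pearson_r(x, y))           # 1.0
```

Note that r only measures how well a straight line fits the two columns of numbers. Nothing in the formula knows or cares which variable (if either) causes the other.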
IKEA Stores per Capita
This is a stronger correlation than chocolate consumption, at 82%.
Given that IKEA was founded in Sweden and headquartered in the Netherlands, you can expect that IKEA density pattern.
While the authors of the paper could not come up with a causal pattern, I definitely can — proximity to Sweden. There is a “who do we know” aspect to awarding Nobel Prizes, at least historically.
From the rebuttal paper:
Correlations tell the researchers the degree of relationship between factors; no more, no less. They prove useful in understanding which factors are related and in generating hypotheses for further experimental testing. Our discussion of a recent report attributing beneficial effects to chocolate consumption shows the peril of over-interpreting correlations. In nutrition research, such erroneous inferences may have dramatic effects, as they may lead to attributing beneficial (or harmful) effects to a wrong cause, hence representing a real danger for health. We hope we have helped readers to correctly situate the relevance of the initial report and to avoid misinterpretations of correlations that hamper nutrition and health research.
Remember Correlations and Causation Do Not Always Go Together
Here is the obligatory xkcd comic:
The smartassery is that things may have zero connection whatsoever and have a strong correlation, as in Tyler Vigen’s Spurious Correlations site.
That’s the power of small data dimensions, which may be what is going on here.
We’ve excluded a bunch of countries with zero Nobel Prizes and zero data on chocolate consumption. That skinnies the data set quite a bit.
You can have a large data set to begin with, but with few dimensions (such as a few years or a few countries), you’re sure to find one slice that will give you the high correlation you’re looking for.
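Here is a rough simulation of that slicing effect, as a sketch rather than anything from the papers. It assumes pure random noise: one made-up “target” variable for 21 pretend countries (21 matches the number of countries in the chocolate data), and a pile of equally made-up candidate variables. The function names, the seed, and the candidate count of 1,000 are all my own choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_spurious_r(n_points, n_candidates, seed):
    """Return the strongest correlation found between one random target
    and n_candidates unrelated random variables (all pure noise)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson_r(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best

# Hypothetical setup: 21 "countries", nothing real about any of it.
one = best_spurious_r(21, 1, seed=1)      # try a single unrelated variable
many = best_spurious_r(21, 1000, seed=1)  # go shopping among 1,000 of them
print("one candidate:   r =", round(one, 2))
print("1,000 candidates: best r =", round(many, 2))
```

Every variable here is noise by construction, so any strong r that comes out of the second call is purely a product of looking at enough candidates with only 21 data points each.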
There’s something more insidious in that one can actually have a causal relationship and miss a correlation entirely, but that’s for another time.
Usually, however, what you’re seeing are two items where the causality is something shared, such as proximity to Sweden.
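To make the shared-cause idea concrete, here is a small simulation under assumptions I am making up for illustration: a hidden factor z (think of it as a stand-in for something like proximity to Sweden) feeds into both x and y, and x and y never touch each other. None of these numbers come from the papers.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
n = 5000

# z is the hidden shared cause (hypothetical).
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus their own independent noise.
# Neither one appears in the other's formula.
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)

# Strip out the shared cause and correlate what is left over.
x_left = [xi - zi for xi, zi in zip(x, z)]
y_left = [yi - zi for yi, zi in zip(y, z)]
r_left = pearson_r(x_left, y_left)

print("x vs y:                   r =", round(r_xy, 2))
print("x vs y with z taken out:  r =", round(r_left, 2))
```

With these settings the first correlation should land near 0.8 (the variance of z divided by the variance of z plus noise, 1 / 1.25), and the second should sit near zero. In real data you rarely get to subtract the hidden cause this cleanly, which is the whole problem.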
And yes, some of this is a reminder to be cautious in all the COVID DEATHS CORRELATE WITH THIS POLITICAL THING THAT I HATE! studies you see floating around.
Just a thought.
Send me funky data or studies!
I love this sort of thing, as you can see. If you know of examples (and yes, I know Tyler Vigen, I linked to him above), please send them to me: marypat.campbell@gmail.com or tweet it at me at meepbobeep. That’s how I saw this one.
I particularly love ones that actually got published in academic journals, and got rebuttal papers.