35 Correlation Does Not Imply Causation

You may have heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation (Messerli, 2012). It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the directionality problem. Two variables, X and Y, can be statistically related because X causes Y or because Y causes X. Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are, such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym.

The second reason that correlation does not imply causation is called the third-variable problem. Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y. For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography: European countries tend to have higher rates of per capita chocolate consumption and to invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that result from a third variable are often referred to as spurious correlations. Some excellent and amusing examples of spurious correlations can be found at Tyler Vigen’s website (Figure 7.7 provides one such example).

Figure 7.7 Example of a Spurious Correlation.

5G Violence

The Covid-19 pandemic was a world-changing event in that it brought the entire world to a standstill for almost 2 years. There was a lot of uncertainty and fear about the origins of the virus as well as numerous conspiracy theories about causes and cures. One common image floating around the internet was similar to what we see in Fig 7.8: the overlap between 5G towers and the prevalence of Covid-19 cases.

Fig 7.8 Data for COVID-19 (as of September 18, 2020) and rollout of 5G as of September 2020 (Tsiang & Havas, 2021).

 

As we can see, there is a strong correlation between the locations of 5G towers and Covid-19 cases. There was even a bombing in Nashville, Tennessee, that was reportedly linked to this conspiracy theory. However, what people didn’t understand was the third-variable problem: correlation (however strong) does not mean causation. Is it possible that the distribution of 5G towers and the prevalence of an epidemic are correlated only because both are related to a third variable?

Fig 7.9 Population density of the United States. From the 2006-2010 American Community Survey.

 

As we can see from Fig 7.9, the population of the continental United States is not evenly distributed across the land. Companies build 5G towers where they are needed, so they tend to build more towers in cities with high population density than in rural areas. In the same vein, epidemics spread more readily where people are densely concentrated and less readily in sparsely populated rural areas. This third variable (population density) is correlated with both 5G towers and Covid-19 cases. As a result, if you compute the correlation between just the latter two variables, you will find a strong one. However, this does not mean there is a causal relationship between them.
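The third-variable problem can be illustrated with a small simulation. The sketch below is not based on real data: the variable names (`density`, `towers`, `cases`), the noise levels, and the sample size are all assumptions chosen for illustration. Each simulated region gets a population density score, and both the tower count and the case count are generated from that score plus independent random noise, so neither one influences the other. The code then computes Pearson’s r between towers and cases, followed by a partial correlation, a standard statistical technique (not discussed in this chapter) that estimates the relationship between two variables after the third variable has been accounted for.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical simulated regions (illustrative values, not real data).
rng = random.Random(0)  # fixed seed so the run is repeatable
n_regions = 2000

# Z: population density (standardized score for each region).
density = [rng.gauss(0, 1) for _ in range(n_regions)]

# X and Y each depend on Z plus their own independent noise.
# Towers never affect cases, and cases never affect towers.
towers = [d + rng.gauss(0, 0.5) for d in density]
cases = [d + rng.gauss(0, 0.5) for d in density]

r_raw = pearson_r(towers, cases)
r_partial = partial_r(towers, cases, density)

print(f"Correlation between towers and cases: {r_raw:.2f}")
print(f"Same correlation, controlling for density: {r_partial:.2f}")
```

Under these assumptions, the raw correlation comes out strongly positive even though the simulation contains no causal link between towers and cases, while the partial correlation falls close to zero once population density is taken into account. This is the pattern a spurious correlation is expected to show.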

 

License


Critical Thinking Copyright © by Dinesh Ramoo, Thompson Rivers University Open Press is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
