Correlation Does Not Imply Causation

I first came across the phrase “Correlation does not imply causation” a few years ago, in the xkcd comic above.
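(Fun fact: the mistake of reading causation into correlation is known in Latin as cum hoc ergo propter hoc, “with this, therefore because of this”. The closely related post hoc ergo propter hoc, “after this, therefore because of this”, applies when one event simply follows another in time.)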
So, what does this statement mean? Before answering that, let me explain what the terms correlation and causation mean and how they differ.
Correlation
Correlation is a statistical measure of how strongly two variables move together. It is expressed numerically by the correlation coefficient, which ranges from -1 to +1. Three types of correlation can exist between any two variables: positive (both move in the same direction, coefficient above 0), negative (they move in opposite directions, coefficient below 0), and no correlation (coefficient close to 0).
In short, correlation is a measure of the relation between 2 variables or events.
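To make the coefficient concrete, here is a minimal sketch of the usual (Pearson) correlation coefficient in plain Python. The function name `pearson_r` and the small data sets are made up purely for illustration; they show one series that rises with the other, one that falls, and one with no linear relation at all.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative, made-up data.
base = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with `base`: positive correlation
falling = [10, 8, 6, 4, 2]   # moves against `base`: negative correlation
unrelated = [1, 3, 2, 3, 1]  # no linear relation with `base`

print(pearson_r(base, rising))
print(pearson_r(base, falling))
print(pearson_r(base, unrelated))
```

The first two calls land at the two ends of the range (+1 and -1), and the third lands at 0. Real data almost always falls somewhere in between.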
Causation
Causation is a scenario where the occurrence of one event brings about the occurrence of another. It is also known as cause and effect. For example, the more you exercise, the more fat you burn: the former causes the latter.
So, What Does The Phrase Mean?
“Correlation does not imply causation” simply means that just because two events are correlated, i.e. related to each other, it does not follow that one event caused the other. Take a look at the graph below. It compares the number of marriages in Kentucky with the number of deaths due to drowning. From the graph, it is evident that the two series move closely together. But does that mean that the more people get married in Kentucky, the more people are going to die by drowning?
Or consider this graph, where the correlation coefficient is almost 1. Can we deduce that the more the US government spends on science and technology, the more people are going to die by suicide?
For more such interesting graphs, you can refer to this link: Spurious Correlations
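Coincidences like these are easy to produce once you compare enough series. The sketch below is only an illustration under assumed settings: it builds 200 short random walks that are independent by construction, then searches every pair for the strongest correlation. The helper names (`make_random_walks`, `most_correlated_pair`), the seed, and the sizes are all arbitrary choices, not anything taken from the Spurious Correlations site.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def make_random_walks(count, length, seed):
    """Build `count` independent random walks; none influences another."""
    rng = random.Random(seed)
    walks = []
    for _ in range(count):
        value, walk = 0.0, []
        for _ in range(length):
            value += rng.gauss(0, 1)  # each step is pure noise
            walk.append(value)
        walks.append(walk)
    return walks


def most_correlated_pair(series):
    """Return (r, i, j) for the pair of series with the largest |r|."""
    best = (0.0, -1, -1)
    for i, j in combinations(range(len(series)), 2):
        r = pearson_r(series[i], series[j])
        if abs(r) > abs(best[0]):
            best = (r, i, j)
    return best


# 200 series of 10 points each, roughly "ten years of yearly data".
walks = make_random_walks(count=200, length=10, seed=0)
r, i, j = most_correlated_pair(walks)
print(f"series {i} and {j} are unrelated by construction, yet r = {r:.3f}")
```

With almost 20,000 pairs to choose from, the best match tends to look impressively strong even though no series has anything to do with any other.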
But…
Coincidence is not the only explanation, though. Sometimes two variables are correlated because of a third variable that we have not accounted for. A famous example is the correlation between ice cream sales and the murder rate in New York City.
As ice cream sales increase, the number of homicides in New York City also increases. Does this mean that ice cream is causing people's deaths?
No. If we look more closely, we find that this correlation comes from a third variable: weather. In summer, the sunny weather draws more people outdoors, which puts more people on the streets and offers predators a wider selection of victims. The same sunny weather also drives more people to buy ice cream, increasing ice cream sales.
We can conclude that ice cream sales and murder rates have no direct effect on each other. The sunny weather ties the two together: each has a causal relationship with the weather, not with the other.
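The effect of a hidden third variable can be sketched with a small simulation. Everything here is assumed for illustration: the temperature range, the coefficients, and the noise levels are invented numbers, not real New York City data. Both quantities are generated from temperature alone, and the partial correlation (a standard way to remove the linear effect of a third variable) shows what is left once the weather is accounted for.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical daily data: temperature drives both quantities,
# and neither quantity appears in the other's formula.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [20 + 5 * t + rng.gauss(0, 15) for t in temperature]
homicides = [1 + 0.2 * t + rng.gauss(0, 1.5) for t in temperature]

raw = pearson_r(ice_cream, homicides)
controlled = partial_r(ice_cream, homicides, temperature)
print(f"ice cream vs homicides:           r = {raw:.2f}")
print(f"same, controlling for temperature: r = {controlled:.2f}")
```

The raw correlation comes out strongly positive, while the one that controls for temperature sits close to zero, which is the pattern you would expect when a shared cause is doing all the work.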
In Conclusion
Correlation alone does not imply causation. Sometimes a correlation between two events is spurious, the product of pure coincidence (as in the drowning vs marriages case). In other instances, an unknown factor influences both variables, as was the case in the ice cream and murder rate scenario.
So, do not jump to conclusions as soon as you find a correlation between two variables or events. Analyze the data and check whether a hidden factor might be driving both.
Thanks for reading!!!
Originally published at http://infinitesimallysmallcom.wordpress.com on July 18, 2020.