Why "X is correlated with Y" doesn't mean "X causes Y"
When two variables move together, it's tempting to conclude that one causes the other. But correlation does not imply causation. This demo will show you why—and how confounding variables can create misleading patterns.
These correlations are real—but the causal relationship is absurd. Click each example to see the data.
Per capita cheese consumption correlates with engineering doctorate awards
r = 0.95
Nicolas Cage films correlate with swimming pool drownings
r = 0.87
Ice cream sales correlate with violent crime rates
r = 0.79
Both cheese consumption and PhD awards have increased over time due to population growth, economic development, and changing preferences. Time is the lurking variable.
With enough variables, you'll always find spurious correlations by chance. This is why we need theory, not just data mining.
A confounding variable affects both the supposed cause and the effect, creating a false appearance of causation.
Does ice cream cause crime? Click to reveal the confounder.
Hot weather is the confounding variable. When it's hot, people buy more ice cream AND spend more time outside, increasing opportunities for crime. Ice cream doesn't cause crime—they share a common cause.
A trend that appears in groups can reverse when the groups are combined. This is one of the most counterintuitive phenomena in statistics.
| Gender | Applicants | Admitted | Admission Rate |
|---|---|---|---|
| Men | 8,442 | 3,738 | 44% |
| Women | 4,321 | 1,494 | 35% |
Men appear to be admitted at a higher rate. Is the university discriminating against women?
| Department | Men Applied | Men Admitted | Women Applied | Women Admitted |
|---|---|---|---|---|
| A (Easy) | 825 | 62% | 108 | 82% |
| B (Easy) | 560 | 63% | 25 | 68% |
| C (Hard) | 325 | 37% | 593 | 34% |
| D (Hard) | 417 | 33% | 375 | 35% |
Within each department, women were admitted at equal or higher rates!
The difference occurred because women applied more to competitive departments (like English) while men applied more to less competitive departments (like Engineering). Department choice was the confounding variable.
If correlation isn't enough, what is? Here are the gold standards:
Randomly assign subjects to treatment/control groups. Random assignment eliminates confounders by distributing them equally.
Find situations where an external event creates random-like variation. Example: policy changes at arbitrary geographic boundaries.
Find a variable that affects X but only affects Y through X. This isolates the causal effect.
Compare changes over time between treated and untreated groups, eliminating time-invariant confounders.
Observational data can show association. Establishing causation requires careful research design—ideally experiments, or clever quasi-experimental methods when experiments aren't possible.