Correlation vs. Causation

Why "X is correlated with Y" doesn't mean "X causes Y"

The Most Dangerous Mistake in Data Analysis

When two variables move together, it's tempting to conclude that one causes the other. But correlation does not imply causation. This demo will show you why—and how confounding variables can create misleading patterns.

1 Spurious Correlations

These correlations are real, but the causal relationship is absurd.

🧀 Cheese & Engineering

Per capita cheese consumption correlates with engineering doctorate awards

r = 0.95

🎬 Nicolas Cage & Drowning

Nicolas Cage films correlate with swimming pool drownings

r = 0.87

🍦 Ice Cream & Crime

Ice cream sales correlate with violent crime rates

r = 0.79

[Chart: Cheese Consumption vs. PhD Awards. Series: cheese (lbs/person) and engineering PhDs. r = 0.95]

Why This Happens

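Both cheese consumption and PhD awards rose over the same period, for unrelated reasons such as population growth, economic development, and changing preferences. Time is the lurking variable: any two series that trend in the same direction will correlate strongly, whether or not they have anything to do with each other.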
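To make the role of time concrete, here is a minimal sketch using synthetic data (not the real cheese or PhD figures). The series names, slopes, and noise levels are assumptions chosen for illustration. Two series that share nothing but an upward trend correlate strongly in levels, while their year-over-year changes typically show little or no relationship.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(30)

# Synthetic series (hypothetical numbers): each has its own upward trend
# plus independent noise, and neither depends on the other.
cheese = [29.0 + 0.3 * t + rng.gauss(0, 0.5) for t in years]
phds = [500.0 + 20.0 * t + rng.gauss(0, 40.0) for t in years]

r_levels = pearson(cheese, phds)

# Year-over-year changes strip out the shared trend.
cheese_changes = [b - a for a, b in zip(cheese, cheese[1:])]
phd_changes = [b - a for a, b in zip(phds, phds[1:])]
r_changes = pearson(cheese_changes, phd_changes)

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Comparing changes rather than levels is only one simple way to remove a shared trend; it is used here because it needs no extra machinery.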

⚠️ Warning

With enough variables, you will almost certainly find some strong correlations by chance alone. This is why we need theory, not just data mining.
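The warning can be checked with a small simulation. The sketch below assumes a hypothetical setup: one target variable with 10 observations and 1,000 candidate variables, all pure noise. It then reports the strongest correlation found among the candidates.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
n_obs, n_vars = 10, 1000  # illustrative sizes, not from the text

# Every series is independent noise, so any correlation is pure chance.
target = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# Search all candidates for the one that best "explains" the target.
best = max(abs(pearson(target, c)) for c in candidates)
print(f"strongest |r| among {n_vars} noise variables: {best:.2f}")
```

With so few observations and so many candidates, the best match usually looks impressive even though nothing real connects the variables.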

2 Confounding Variables

A confounding variable affects both the supposed cause and the effect, creating a false appearance of causation.

Ice Cream Sales → Crime Rate?

Does ice cream cause crime? Adding the confounder changes the picture:

Ice Cream Sales ← 🌡️ Hot Weather → Crime Rate

The Real Story

Hot weather is the confounding variable. When it's hot, people buy more ice cream AND spend more time outside, increasing opportunities for crime. Ice cream doesn't cause crime—they share a common cause.
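A common-cause structure like this can be simulated. The sketch below uses made-up coefficients and synthetic data: temperature drives both ice cream sales and crime, and neither affects the other. The raw correlation is strong, but it should largely vanish once the linear effect of temperature is removed from both variables.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical data: temperature (deg C) is the only shared driver.
temp = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temp]
crime = [1.5 * t + rng.gauss(0, 10) for t in temp]

r_raw = pearson(ice_cream, crime)
# Correlate what remains after removing temperature from both series.
r_adjusted = pearson(residuals(ice_cream, temp), residuals(crime, temp))

print(f"raw correlation:            {r_raw:.2f}")
print(f"after removing temperature: {r_adjusted:.2f}")
```

Adjusting this way only works when the confounder is known and measured, which is one reason the research designs later in this demo matter.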

3 Simpson's Paradox

A trend that appears in groups can reverse when the groups are combined. This is one of the most counterintuitive phenomena in statistics.

UC Berkeley Admissions (1973)

Gender   Applicants   Admitted   Admission Rate
Men      8,442        3,738      44%
Women    4,321        1,494      35%

⚠️ Apparent Bias

Men appear to be admitted at a higher rate. Is the university discriminating against women?

Broken Down by Department

Department   Men Applied   Men Admitted (%)   Women Applied   Women Admitted (%)
A (Easy)     825           62%                108             82%
B (Easy)     560           63%                25              68%
C (Hard)     325           37%                593             34%
D (Hard)     417           33%                375             35%

The Paradox Resolved

In three of these four departments, women were admitted at a higher rate than men. Only in Department C was the women's rate slightly lower (34% vs. 37%).

The aggregate gap arose because women applied disproportionately to competitive departments with low admission rates (like English), while men applied more to less competitive departments with high admission rates (like Engineering). Department choice was the confounding variable.
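The arithmetic behind the reversal can be reproduced from the four departments in the table. The sketch below recombines the per-department figures into overall rates. Admitted counts are reconstructed from the rounded percentages, so they are approximate, and the totals cover only these four departments rather than the whole university.

```python
# (men applied, men admit rate, women applied, women admit rate),
# copied from the department table above.
departments = {
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
}

men_applied = sum(d[0] for d in departments.values())
women_applied = sum(d[2] for d in departments.values())

# Approximate admitted counts: applicants times the rounded rate.
men_admitted = sum(d[0] * d[1] for d in departments.values())
women_admitted = sum(d[2] * d[3] for d in departments.values())

men_rate = men_admitted / men_applied
women_rate = women_admitted / women_applied

# Departments where the women's rate is higher than the men's.
women_ahead = [name for name, d in departments.items() if d[3] > d[1]]

# Share of each gender applying to the two high-admission departments.
men_easy_share = (departments["A"][0] + departments["B"][0]) / men_applied
women_easy_share = (departments["A"][2] + departments["B"][2]) / women_applied

print(f"combined rate, men:   {men_rate:.1%}")    # about 53%
print(f"combined rate, women: {women_rate:.1%}")  # about 40%
print(f"departments where women lead: {women_ahead}")
print(f"share applying to A or B: men {men_easy_share:.0%}, "
      f"women {women_easy_share:.0%}")
```

The combined rate is a weighted average of the department rates, and the weights (where each group applied) differ sharply between men and women. That difference in weights is what produces the reversal.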

4 How to Establish Causation

If correlation isn't enough, what is? Randomized experiments are the gold standard; the other methods below approximate one when randomization isn't possible.

🔬 Randomized Experiments

Randomly assign subjects to treatment and control groups. Random assignment breaks the link between confounders and treatment: on average, both known and unknown confounders are balanced across the groups.
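A short simulation shows what randomization buys. This is a sketch under assumed numbers: a hypothetical confounder ("health") raises the outcome and, in the observational setting, also makes treatment more likely. The true treatment effect is set to 2.0 so the two estimates can be compared against it.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0       # assumed causal effect of treatment on the outcome

def mean(xs):
    return sum(xs) / len(xs)

def difference_in_means(assign, n=20000):
    """Simulate n subjects and return mean(treated) - mean(control)."""
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # confounder: always affects the outcome
        is_treated = assign(health)
        effect = TRUE_EFFECT if is_treated else 0.0
        outcome = effect + 3.0 * health + rng.gauss(0, 1)
        (treated if is_treated else control).append(outcome)
    return mean(treated) - mean(control)

# Observational: healthier subjects are more likely to take the treatment.
observational = difference_in_means(lambda h: h + rng.gauss(0, 1) > 0)

# Randomized: a coin flip decides, so health cannot influence assignment.
randomized = difference_in_means(lambda h: rng.random() < 0.5)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

The confounder still affects the outcome in the randomized run. What changes is that it no longer differs systematically between the two groups.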

📊 Natural Experiments

Find situations where an external event creates random-like variation. Example: policy changes at arbitrary geographic boundaries.

🎯 Instrumental Variables

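Find a variable (the instrument) that affects X, affects Y only through X, and is unrelated to the confounders. The part of X's variation that is driven by the instrument then isolates the causal effect.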
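The idea can be sketched with the simplest instrumental-variable estimator, the ratio cov(Z, Y) / cov(Z, X). Everything below is synthetic: Z is a hypothetical instrument, U is an unobserved confounder, and the true effect of X on Y is set to 1.5. An ordinary regression slope of Y on X is shown for comparison.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

rng = random.Random(4)  # fixed seed so the run is repeatable
TRUE_EFFECT = 1.5       # assumed causal effect of X on Y
n = 20000

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: moves X only
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder

# X depends on the instrument and the confounder.
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# Y depends on X and the confounder, but on Z only through X.
y = [TRUE_EFFECT * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols_slope = cov(x, y) / cov(x, x)    # biased: picks up the confounder
iv_estimate = cov(z, y) / cov(z, x)  # uses only instrument-driven variation

print(f"true effect:        {TRUE_EFFECT:.2f}")
print(f"regression of Y on X: {ols_slope:.2f}")
print(f"IV estimate:          {iv_estimate:.2f}")
```

The estimate is only as good as the instrument. In real data, the requirement that Z affects Y only through X cannot be verified from the data alone and has to be argued from context.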

📈 Difference-in-Differences

Compare changes over time between treated and untreated groups. Differencing removes time-invariant confounders, provided the two groups would have followed parallel trends in the absence of treatment.
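The calculation is easy to sketch with simulated data. All numbers below are assumptions: the treated group starts at a higher baseline (a time-invariant difference), both groups share the same time trend, and the true treatment effect is set to 3.0.

```python
import random

rng = random.Random(5)  # fixed seed so the run is repeatable
TRUE_EFFECT = 3.0       # assumed effect of treatment in the "post" period
TIME_TREND = 2.0        # change over time shared by both groups
BASELINE = {"treated": 10.0, "control": 6.0}  # fixed group differences

def group_mean(group, period, n=5000):
    """Average simulated outcome for one group in one period."""
    level = BASELINE[group]
    if period == "post":
        level += TIME_TREND
        if group == "treated":
            level += TRUE_EFFECT
    return sum(level + rng.gauss(0, 1) for _ in range(n)) / n

treated_pre = group_mean("treated", "pre")
treated_post = group_mean("treated", "post")
control_pre = group_mean("control", "pre")
control_post = group_mean("control", "post")

# Naive comparison after treatment mixes the effect with the baseline gap.
naive_gap = treated_post - control_post

# Difference-in-differences: change in treated minus change in control.
did = (treated_post - treated_pre) - (control_post - control_pre)

print(f"true effect:               {TRUE_EFFECT:.2f}")
print(f"naive post-period gap:     {naive_gap:.2f}")
print(f"difference-in-differences: {did:.2f}")
```

Subtracting each group's own pre-period level cancels the baseline gap, and subtracting the control group's change cancels the shared trend. If the two groups had different trends, the estimate would be biased.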

Key Takeaway

Observational data can show association. Establishing causation requires careful research design—ideally experiments, or clever quasi-experimental methods when experiments aren't possible.