What is Confounding?
Confounding occurs when a third variable influences both the independent and dependent variables, creating a spurious association that can be mistaken for a causal relationship. It is the primary threat to valid causal inference from observational data.
How it works
A confound is a variable that is associated with both the treatment and the outcome and is not on the causal path between them. If ice cream sales and drowning deaths are correlated, the confound is hot weather (which causes both). Without controlling for confounders, observational studies cannot determine whether the treatment caused the outcome or whether both were caused by the same underlying factor. Randomization eliminates confounding by ensuring that treatment assignment is independent of all other variables.
Applied example
A study finding that coffee drinkers live longer than non-coffee drinkers may be confounded by socioeconomic status: wealthier people drink more specialty coffee and also have better healthcare, diet, and living conditions. The longevity may be caused by wealth, not coffee.
Why it matters
Confounding is the reason that randomized experiments are the gold standard for causal claims, and why observational studies must be interpreted with extreme caution.




