What is collider bias?
Collider bias occurs when a statistical analysis conditions on a variable that is caused by both the treatment and the outcome, creating a spurious association between them. It is a common and frequently overlooked threat to valid causal inference.
How it works
In a directed acyclic graph, a collider is a variable with two or more arrows pointing into it, that is, a common effect of two other variables. Left alone, a collider blocks the path between its causes. Conditioning on it (including it as a control variable, or restricting the sample to particular values of it) opens that path. The result is a statistical association between the causes that does not reflect a real causal relationship. Collider bias is one explanation for hospital-based studies in which smokers appear to have better outcomes than non-smokers: hospitalization is a collider, caused by both smoking and disease severity.
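The mechanism can be illustrated with a small simulation. The following is a minimal sketch, not a reference implementation: the variable names, the linear form of the collider, the noise term, and the selection threshold of 1.0 are all assumptions chosen for illustration. Two independent causes are generated, a collider is built from both, and the correlation between the causes is compared in the full sample and in a sample restricted on the collider.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_collider(n=20000, threshold=1.0, seed=0):
    """Return (full-sample correlation, correlation after selecting on the collider)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x by construction
    # Hypothetical collider: a common effect of x and y plus noise.
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

    full_corr = pearson(x, y)

    # Conditioning by restriction: keep only rows where the collider is high.
    kept = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > threshold]
    kept_x, kept_y = zip(*kept)
    restricted_corr = pearson(kept_x, kept_y)
    return full_corr, restricted_corr


full_corr, restricted_corr = simulate_collider()
print(f"full sample:       {full_corr:+.3f}")
print(f"collider > 1 only: {restricted_corr:+.3f}")
```

Under these assumptions, the full-sample correlation should be close to zero, while the restricted-sample correlation should be clearly negative, even though neither cause affects the other.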
Applied example
A study of admitted hospital patients finds that COVID-19 patients who smoke have better outcomes than non-smokers. This surprising finding disappears in the general population. It arises because hospitalization is a collider: smoking and COVID-19 severity both increase the probability of hospitalization. Among hospitalized patients, a smoker is more likely to have been admitted for a smoking-related reason and so less likely to have severe COVID-19, which creates a spurious negative association between smoking and severity.
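The hospital scenario can be sketched the same way. All rates below are made-up numbers chosen only to make the effect visible; they are not estimates from any study. The sketch also assumes, purely for illustration, that smoking has no effect at all on COVID-19 severity, so any difference among admitted patients is produced by selection alone.

```python
import random


def simulate_population(n=100000, seed=1):
    """Generate (smoker, severe, admitted) tuples under illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.25  # hypothetical smoking prevalence
        severe = rng.random() < 0.10  # hypothetical severity rate, independent of smoking
        # Hypothetical admission probability: both causes raise it.
        p_admit = 0.02 + 0.10 * smoker + 0.60 * severe
        admitted = rng.random() < p_admit
        rows.append((smoker, severe, admitted))
    return rows


def severe_rate(rows, smoker, admitted_only):
    """Share of severe cases among smokers or non-smokers, optionally admitted only."""
    group = [
        sev
        for smk, sev, adm in rows
        if smk == smoker and (adm or not admitted_only)
    ]
    return sum(group) / len(group)


rows = simulate_population()
pop_smokers = severe_rate(rows, smoker=True, admitted_only=False)
pop_nonsmokers = severe_rate(rows, smoker=False, admitted_only=False)
adm_smokers = severe_rate(rows, smoker=True, admitted_only=True)
adm_nonsmokers = severe_rate(rows, smoker=False, admitted_only=True)

print(f"general population: smokers {pop_smokers:.3f}, non-smokers {pop_nonsmokers:.3f}")
print(f"admitted only:      smokers {adm_smokers:.3f}, non-smokers {adm_nonsmokers:.3f}")
```

With these assumed rates, Bayes' rule gives an expected severe share among admitted patients of about 0.40 for smokers and about 0.78 for non-smokers, while both groups sit near 0.10 in the general population. Smokers look better off in the hospital sample only because many of them were admitted for reasons other than severe COVID-19.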
Why it matters
Collider bias produces paradoxical findings that can mislead policy and medical practice, making it essential to understand causal structure before deciding which variables to control for.



