Correlation measures how two variables move together; causation means changing one variable produces change in another. Confusing them leads to bad product decisions and failed models.
Correlation basics
The Pearson correlation coefficient ranges from -1 to +1 and measures the strength of a linear relationship:
- Positive — both increase together
- Negative — one rises as the other falls
- Near zero — little linear association
Non-linear relationships can have low correlation but strong patterns—plots still matter.
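A minimal sketch of this point, assuming NumPy is available. The data is synthetic and chosen for illustration: x is symmetric around zero, so a perfect quadratic relationship still gives a Pearson coefficient of essentially zero.

```python
import numpy as np

# Symmetric x values, so any linear trend in y = x**2 cancels out.
x = np.linspace(-3.0, 3.0, 61)
y_linear = 2.0 * x + 1.0   # perfectly linear in x
y_quadratic = x ** 2       # fully determined by x, but not linear

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

A scatter plot of x against y_quadratic would show the U-shape immediately, which the coefficient alone hides.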
Correlation is not causation
- Confounding — a third variable drives both (ice cream and drowning both rise in summer)
- Reverse causality — you may have the arrow backward
- Spurious correlation — coincidence in small samples
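The confounding case above can be sketched with simulated data. Everything here is an assumption made for illustration: the variable names, the effect sizes, and the helper `residualize` are hypothetical. Temperature drives both series, and ice cream sales have no effect on drownings by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed setup: temperature is the confounder; ice cream never affects drownings.
temperature = rng.normal(size=n)
ice_cream = 2.0 * temperature + rng.normal(size=n)
drownings = 1.5 * temperature + rng.normal(size=n)

raw_r = np.corrcoef(ice_cream, drownings)[0, 1]

def residualize(v, z):
    """Remove the part of v explained by a straight-line fit on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

# Correlation that remains once temperature is accounted for (partial correlation).
partial_r = np.corrcoef(
    residualize(ice_cream, temperature),
    residualize(drownings, temperature),
)[0, 1]

print(f"raw correlation:     {raw_r:.2f}")
print(f"after controlling z: {partial_r:.2f}")
```

The raw correlation is strong, while the partial correlation is close to zero. This only works because the confounder was measured; an unmeasured confounder cannot be adjusted away like this.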
Randomized experiments (A/B tests) are the gold standard for causal claims; observational data needs careful design.
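To see why randomization helps, here is a hedged simulation sketch. The true effect, the hidden "engagement" trait, and the helper functions are all assumptions invented for this example. When users choose their own treatment, the difference in means mixes the treatment effect with the hidden trait; when a coin flip assigns treatment, the groups are comparable and the difference lands near the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
true_effect = 1.0  # assumed effect of the treatment on the outcome

# Hidden trait that raises the outcome on its own, with or without treatment.
engagement = rng.normal(size=n)

def outcome(treated):
    return true_effect * treated + 2.0 * engagement + rng.normal(size=n)

def diff_in_means(y, treated):
    return y[treated == 1].mean() - y[treated == 0].mean()

# Observational: engaged users opt in more often, so the groups differ up front.
opted_in = (engagement + rng.normal(size=n) > 0).astype(int)
observational = diff_in_means(outcome(opted_in), opted_in)

# Randomized: a coin flip decides treatment, independent of engagement.
assigned = rng.integers(0, 2, size=n)
randomized = diff_in_means(outcome(assigned), assigned)

print(f"true effect:            {true_effect:.2f}")
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

In this setup the observational estimate overstates the effect by a wide margin, and the randomized estimate does not.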
In modeling
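Highly correlated features can destabilize linear models (multicollinearity): coefficient estimates become sensitive to small changes in the data. Separately, a feature that correlates strongly with the target may be leaking information if it encodes something unavailable at prediction time; see the train/test lessons later.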
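One way to see the multicollinearity problem is to refit a linear model on bootstrap resamples and watch how much a coefficient moves. This is a sketch on synthetic data, not a diagnostic recipe: the helper `coef_spread`, the noise scales, and the true coefficients are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

def coef_spread(noise_scale, n_resamples=200):
    """Std of the fitted coefficient on x1 across bootstrap resamples."""
    x1 = rng.normal(size=n)
    # Small noise_scale makes x2 almost a copy of x1.
    x2 = x1 + noise_scale * rng.normal(size=n)
    y = x1 + x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])  # intercept, x1, x2
    coefs = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        coefs.append(beta[1])
    return float(np.std(coefs))

spread_near_duplicate = coef_spread(noise_scale=0.05)
spread_moderate = coef_spread(noise_scale=1.0)

print(f"x2 nearly duplicates x1: coefficient std = {spread_near_duplicate:.2f}")
print(f"x2 moderately related:   coefficient std = {spread_moderate:.2f}")
```

The coefficient on x1 swings far more when x2 is nearly a duplicate, even though the data-generating process is the same. Predictions can still be fine in that case; it is the individual coefficients that become unreliable.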
Practical habit
When someone says “X correlates with Y,” ask: Could Z explain both? What action would we take if we believed causation?
Important interview questions and answers
- Q: What is confounding?
A: A hidden factor influences both variables, creating correlation without direct causation.
- Q: How do you establish causation?
A: Controlled experiments, causal inference methods, or strong domain theory—not correlation alone.
Self-check
- Give one reason correlation ≠ causation.
- What is confounding?
- Why plot data beyond correlation coefficients?
Pitfall: Presenting a correlation as a causal claim in executive slides.
Interview prep
- Correlation?
Linear association—not proof one variable causes another.
- Confounder?
Third variable drives both X and Y.