Correlation and causation

Last reviewed May 28, 2026 Content v20260528

Reading time: ~2 min

Level: beginner

This lesson

This lesson covers correlation and causation as part of the data science mindset, methods, and communication habits behind evidence-based decisions.

Correlation dashboards can mislead teams; causal claims require experiments or careful quasi-experimental design.

You will apply these ideas on analytics teams, in product experimentation, in research labs, and in ML-adjacent engineering at any data-driven company.

Read the narrative, run Python in the playground (stdlib snippets now; install Jupyter, pandas, and scikit-learn locally for full notebooks), and complete MCQs to lock in vocabulary.

Start this lesson when you can explain the previous lesson's ideas in your own words.

Correlation measures how two variables move together; causation means changing one variable produces change in another. Confusing them leads to bad product decisions and failed models.

Correlation basics

The Pearson correlation coefficient ranges from -1 to +1 and measures the strength of a linear relationship:

Positive — both increase together
Negative — one rises as the other falls
Near zero — little linear association

Non-linear relationships can show a strong pattern and still have a correlation near zero, so plot the data as well as computing the coefficient.
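
To make the three cases and the non-linear caveat concrete, here is a minimal stdlib sketch. The `pearson` helper and the toy data are illustrative assumptions, not part of the lesson's own materials.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = list(range(-5, 6))            # -5, -4, ..., 5
rising = [2 * x + 1 for x in xs]   # perfect positive linear relationship
falling = [-3 * x for x in xs]     # perfect negative linear relationship
parabola = [x * x for x in xs]     # strong pattern, but not a linear one

r_rising = pearson(xs, rising)
r_falling = pearson(xs, falling)
r_parabola = pearson(xs, parabola)

print("rising:  ", round(r_rising, 3))
print("falling: ", round(r_falling, 3))
print("parabola:", round(r_parabola, 3))
```

The parabola is fully determined by `x`, yet its coefficient comes out at zero because the rise on the right cancels the fall on the left. A scatter plot would reveal the pattern immediately.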

Correlation is not causation

Confounding — a third variable drives both (ice cream and drowning both rise in summer)
Reverse causality — you may have the arrow backward
Spurious correlation — coincidence in small samples
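
The confounding case can be simulated with a minimal sketch. It assumes a made-up model in which temperature drives both ice cream sales and drownings, and neither affects the other; all coefficients and noise levels are invented for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)

# Hypothetical model: temperature (Z) drives both outcomes.
# Ice cream sales never appear in the drownings formula, and vice versa.
temperature = [random.uniform(0, 35) for _ in range(5000)]
ice_cream = [3.0 * t + random.gauss(0, 10) for t in temperature]
drownings = [0.2 * t + random.gauss(0, 1) for t in temperature]

# Across all days the two outcomes look strongly related.
raw_r = pearson(ice_cream, drownings)

# Hold the confounder roughly fixed: keep only days between 20 and 22 degrees.
band = [(i, d) for t, i, d in zip(temperature, ice_cream, drownings)
        if 20 <= t < 22]
band_r = pearson([i for i, _ in band], [d for _, d in band])

print("all days:        r =", round(raw_r, 2))
print("20-22 degrees:   r =", round(band_r, 2))
```

Restricting to a narrow temperature band is a crude form of stratification. Once temperature barely varies, most of the association between the two outcomes disappears, which is what you would expect if temperature is the common driver.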

Randomized experiments (A/B tests) are the gold standard for causal claims; observational data needs careful design.
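
A small simulation shows why randomization helps. This is a sketch under stated assumptions: the `TRUE_EFFECT` value, the engagement trait, and the opt-in rule are all hypothetical and chosen only to make the bias visible.

```python
import random
import statistics

random.seed(1)
TRUE_EFFECT = 2.0  # assumed causal lift of the feature in this simulation

def outcome(engagement, treated):
    """Simulated metric: driven by engagement, plus the lift if treated."""
    lift = TRUE_EFFECT if treated else 0.0
    return 10.0 * engagement + lift + random.gauss(0, 1)

def mean_difference(rows):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return statistics.mean(treated) - statistics.mean(control)

engagement = [random.random() for _ in range(20000)]

# Observational data: engaged users opt in more often,
# so engagement confounds the treated-vs-untreated comparison.
observational = []
for e in engagement:
    opted_in = random.random() < e
    observational.append((opted_in, outcome(e, opted_in)))

# Randomized experiment: a coin flip decides treatment, independent of engagement.
randomized = []
for e in engagement:
    assigned = random.random() < 0.5
    randomized.append((assigned, outcome(e, assigned)))

naive_estimate = mean_difference(observational)
ab_estimate = mean_difference(randomized)

print("true effect:           ", TRUE_EFFECT)
print("observational estimate:", round(naive_estimate, 2))
print("randomized estimate:   ", round(ab_estimate, 2))
```

In the observational data the comparison mixes the feature's effect with the fact that opted-in users were more engaged to begin with. Random assignment balances engagement across the two groups on average, so the simple difference in means lands close to the true effect.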

In modeling

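Highly correlated features can destabilize linear models (multicollinearity): the model cannot tell which feature deserves the credit, so coefficient estimates swing widely between samples. Separately, a feature that correlates strongly with the target may be leaking it if the feature encodes information that is unavailable at prediction time; see the train/test lessons later.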
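
The instability caused by multicollinearity can be seen in a minimal stdlib sketch. It assumes a two-feature least-squares fit without an intercept, solved through the 2x2 normal equations; the helper names and simulation settings are hypothetical.

```python
import math
import random
import statistics

def fit_two_features(x1, x2, y):
    """Least-squares coefficients (b1, b2) for y ~ b1*x1 + b2*x2, no intercept."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # shrinks toward zero as x1 and x2 align
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def coefficient_spread(rho, trials=200, n=100):
    """Std. dev. of the fitted b1 across repeated simulated datasets.

    rho is the population correlation between the two features.
    The true coefficients are b1 = b2 = 1 in every dataset.
    """
    b1_values = []
    for _ in range(trials):
        x1 = [random.gauss(0, 1) for _ in range(n)]
        x2 = [rho * a + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
              for a in x1]
        y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]
        b1, _ = fit_two_features(x1, x2, y)
        b1_values.append(b1)
    return statistics.stdev(b1_values)

random.seed(2)
spread_independent = coefficient_spread(rho=0.0)
spread_collinear = coefficient_spread(rho=0.999)

print("b1 spread, independent features:", round(spread_independent, 3))
print("b1 spread, collinear features:  ", round(spread_collinear, 3))
```

The true coefficient is the same in both runs. Only the correlation between the features changes, and the estimate of `b1` becomes far noisier when the features are nearly copies of each other.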

Practical habit

When someone says “X correlates with Y,” ask: Could Z explain both? What action would we take if we believed causation?

Important interview questions and answers

Q: What is confounding?
A: A hidden factor influences both variables, creating correlation without direct causation.
Q: How do you establish causation?
A: With controlled experiments, causal inference methods, or strong domain theory. Correlation alone is not enough.

Self-check

Give one reason correlation ≠ causation.
What is confounding?
Why plot data beyond correlation coefficients?

Pitfall: Confusing correlation with causation in executive slides.

Interview prep

Correlation?: A measure of linear association, not proof that one variable causes another.
Confounder?: A third variable that drives both X and Y.
