Missing data basics

Last reviewed May 28, 2026 Content v20260528

Track mode: server_script (server runner)
Reading time: ~2 min
Level: beginner

This lesson

This lesson teaches the basics of missing data: the data science mindset, methods, and communication habits behind evidence-based decisions.

The missing-data mechanism (MCAR, MAR, or MNAR) determines whether imputation is safe. Filling values blindly creates false confidence.

You will apply these basics to messy CSV exports, API logs, and survey data before any dashboard ships.

Read the narrative, run Python in the playground (stdlib snippets now; install Jupyter, pandas, and scikit-learn locally for full notebooks), and complete MCQs to lock in vocabulary.

Start this lesson when you can explain the previous lesson's ideas in your own words.

Missing values are gaps in your table: empty cells, None in Python, NULL in SQL, or sentinel codes like -1 meaning “unknown.” How you handle them changes model behavior and metrics.
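
Before counting or filling anything, it helps to map every missing marker to a single representation. The sketch below is a minimal stdlib example; the rows and the set of marker strings are hypothetical, and the choice of markers is an assumption you would confirm against your own data dictionary.

```python
# Hypothetical raw rows, as they might arrive from a CSV export.
RAW_ROWS = [
    {"user": "a", "age": "34", "income": "52000"},
    {"user": "b", "age": "", "income": "-1"},
    {"user": "c", "age": "NULL", "income": "61000"},
]

# Assumption: in this dataset, each of these strings means "unknown".
MISSING_MARKERS = {"", "NULL", "NA", "-1"}


def normalize(value):
    """Return None for a recognized missing marker, else the value unchanged."""
    if value is None or value.strip() in MISSING_MARKERS:
        return None
    return value


clean_rows = [{key: normalize(val) for key, val in row.items()} for row in RAW_ROWS]
for row in clean_rows:
    print(row)
```

A sentinel such as -1 is only safe to treat as missing in columns where -1 cannot be a real value, so in practice the marker set is often defined per column.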

Types of missingness

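MCAR: missing completely at random; the gaps are unrelated to any data, observed or not (rare in practice)
MAR: missing at random; missingness depends only on observed columns
MNAR: missing not at random; missingness depends on unobserved factors or on the missing value itself (hardest)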

MNAR example: high earners skip income survey questions more often, so dropping rows with missing income biases the average income downward.
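
The income example can be checked with a few lines of arithmetic. This is a toy sketch: the ten incomes are made up, and the rule that everyone above 100,000 skips the question is an exaggerated assumption chosen to make the effect obvious.

```python
from statistics import mean

# Hypothetical true incomes for ten survey respondents.
true_incomes = [30_000, 35_000, 40_000, 45_000, 50_000,
                55_000, 60_000, 120_000, 150_000, 200_000]

# Assumption for illustration: anyone earning over 100,000 skips the question.
reported = [x if x <= 100_000 else None for x in true_incomes]

# "Drop rows": keep only the respondents who answered.
observed = [x for x in reported if x is not None]

true_mean = mean(true_incomes)
observed_mean = mean(observed)

print("true mean:    ", true_mean)
print("observed mean:", observed_mean)
print("bias:         ", observed_mean - true_mean)
```

The observed mean comes out lower than the true mean because the missing values are exactly the large ones. No amount of extra data fixes this unless the reason for the gaps is modeled.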

Audit missingness first

Count missing per column
Cross-tab missing flags with target or segment
Check if “missing” is informative (create indicator features)
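
The audit steps above can be sketched with the standard library alone. The rows, the segment column, and the column names are hypothetical; None stands for a missing value.

```python
from collections import Counter

# Hypothetical rows; None marks a missing value.
rows = [
    {"segment": "free", "age": 34, "income": None},
    {"segment": "free", "age": None, "income": None},
    {"segment": "paid", "age": 41, "income": 72000},
    {"segment": "paid", "age": 29, "income": None},
    {"segment": "paid", "age": 52, "income": 88000},
]

# 1. Count missing values per column.
missing_per_column = Counter(
    col for row in rows for col, val in row.items() if val is None
)

# 2. Cross-tab the "income is missing" flag against segment.
crosstab = Counter((row["segment"], row["income"] is None) for row in rows)

# 3. Missing rate per segment: a large gap between segments suggests
#    the missingness is informative rather than random.
segment_sizes = Counter(row["segment"] for row in rows)
missing_rate = {
    seg: crosstab[(seg, True)] / size for seg, size in segment_sizes.items()
}

print(dict(missing_per_column))
print(dict(crosstab))
print(missing_rate)
```

In this toy table, income is missing far more often in the free segment than in the paid one, which is the kind of pattern that argues against treating the gaps as MCAR.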

Common strategies (preview)

Drop rows: only if few rows are affected and the missingness looks MCAR
Impute: fill with the median (numeric) or mode (categorical), or use a model-based method (advanced)
Separate category: “unknown” for categoricals
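
As a rough preview, here is how each of the three strategies looks on a tiny example. The ages and plans lists are hypothetical, and median imputation is shown only as the simplest option, not as a recommendation for every column.

```python
from statistics import median

# Hypothetical columns; None marks a missing value.
ages = [34, None, 41, 29, None, 52]
plans = ["free", None, "paid", "paid", None, "free"]

# Strategy 1: drop the missing entries.
ages_dropped = [a for a in ages if a is not None]

# Strategy 2: impute numeric gaps with the median of the observed values.
age_median = median(ages_dropped)
ages_imputed = [age_median if a is None else a for a in ages]

# Strategy 3: give categorical gaps their own "unknown" category.
plans_filled = ["unknown" if p is None else p for p in plans]

print(ages_dropped)
print(ages_imputed)
print(plans_filled)
```

Dropping shrinks the data, imputing keeps the row count but invents values, and the "unknown" category keeps the fact of missingness visible to the model.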

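The cleaning lessons cover the full imputation workflow. One rule to remember now: never compute imputation values on the full dataset before splitting into train and test sets, because information from the test set then leaks into training.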
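
The split-before-impute rule can be sketched as "fit on train, apply to both". The helper names fit_median and apply_fill are hypothetical, and the ages are made up so that the test set differs visibly from the training set.

```python
from statistics import median

# Hypothetical split made BEFORE any imputation; None marks a missing value.
train_ages = [22, None, 35, 41, None, 29]
test_ages = [None, 67, 71]


def fit_median(values):
    """Learn the fill value from observed (non-missing) entries only."""
    return median(v for v in values if v is not None)


def apply_fill(values, fill):
    """Replace each missing entry with the previously learned fill value."""
    return [fill if v is None else v for v in values]


# Correct: the fill value comes from the training data alone.
fill_value = fit_median(train_ages)
train_filled = apply_fill(train_ages, fill_value)
test_filled = apply_fill(test_ages, fill_value)

# Leaky: the fill value has seen the test data.
leaky_fill = fit_median(train_ages + test_ages)

print("train-only fill:", fill_value)
print("leaky fill:     ", leaky_fill)
```

The two fill values differ because the leaky one is pulled toward the older test-set ages, which means evaluation on that test set would no longer be a fair estimate of performance on unseen data.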

Important interview questions and answers

Q: Why does MNAR matter?
A: Imputing without modeling why data are missing can bias conclusions.
Q: What is a missing indicator feature?
A: A binary column marking which values were missing before imputation. It sometimes improves models when missingness is informative.
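
A missing indicator is built before the gaps are filled, so the model can still see where they were. This is a minimal sketch with hypothetical income values and median imputation as the assumed fill method.

```python
from statistics import median

# Hypothetical numeric column; None marks a missing value.
incomes = [52000, None, 61000, None, 48000]

# Build the indicator first: 1 where the value was missing, 0 otherwise.
income_missing = [1 if x is None else 0 for x in incomes]

# Then impute, here with the median of the observed values.
fill = median(x for x in incomes if x is not None)
income_filled = [fill if x is None else x for x in incomes]

# Each row now carries both the filled value and the missingness flag.
features = list(zip(income_filled, income_missing))
for row in features:
    print(row)
```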

Self-check

What does NULL mean in SQL?
Name two strategies for missing numeric data.
Why audit missingness before imputing?

Tip: Ask why data is missing before filling—MNAR is common.

Interview prep

MCAR?: Missing completely at random; rare in practice.
Impute blindly?: No. Understand why values are missing before filling them.

