Handling missing values

Last reviewed May 28, 2026 Content v20260528
Track mode: server_script (means: server runner)
Reading: ~2 min
Level: beginner

This lesson

This lesson teaches how to handle missing values: the data science mindset, methods, and communication habits behind evidence-based decisions.

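The mechanism behind the missing data decides whether imputation is safe: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Filling values blindly creates false confidence.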

You will apply these techniques to messy CSV exports, API logs, and survey data before any dashboard ships.

Read the narrative, run Python in the playground (stdlib snippets now; install Jupyter, pandas, and scikit-learn locally for full notebooks), and complete MCQs to lock in vocabulary.

Start this lesson when you can explain the previous lesson's ideas in your own words.

After auditing missingness, choose a strategy per column: drop, impute, or model missingness explicitly. The right choice depends on how much is missing and why.
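
As a rough sketch of the "how much is missing" half of that decision, the snippet below computes the share of missing values per column using only the standard library. The rows, the column names, and the convention that None or an empty string counts as missing are assumptions made for illustration. The "why" half still needs domain knowledge.

```python
# Hypothetical rows; None or "" is treated as missing (an assumption).
rows = [
    {"user_id": "u1", "age": 34, "plan": "pro"},
    {"user_id": "u2", "age": None, "plan": ""},
    {"user_id": "u3", "age": 51, "plan": "free"},
    {"user_id": "u4", "age": None, "plan": "free"},
]

def is_missing(value):
    """Return True for values this sketch counts as missing."""
    return value is None or value == ""

def missing_fraction(rows):
    """Map each column name to the fraction of rows where it is missing."""
    columns = rows[0].keys()
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }

fractions = missing_fraction(rows)
print(fractions)
```

A column with a small fraction is a candidate for dropping rows or a simple fill; a large fraction is a prompt to ask why the values are absent before choosing anything.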

When to drop rows

  • Very few rows are affected and the missingness shows no pattern tied to the target
  • Critical identifier missing (cannot join or attribute)

Dropping many rows can bias results if missingness is not random.
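
A minimal sketch of both points, using made-up data. Rows without a customer_id are dropped because they cannot be joined, and a second step shows how dropping rows whose missingness depends on the value itself shifts the mean. All names and numbers here are hypothetical.

```python
import statistics

# Hypothetical orders; None marks a missing value.
orders = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": None, "amount": 35.0},  # cannot be attributed to anyone
    {"customer_id": "c3", "amount": 50.0},
]

# Drop rows whose critical identifier is missing, and track how many went.
kept = [row for row in orders if row["customer_id"] is not None]
dropped_share = 1 - len(kept) / len(orders)

# Bias illustration: assume high incomes are the ones left unreported.
true_incomes = [30, 40, 50, 60, 200]
reported = [x if x < 100 else None for x in true_incomes]
after_drop = [x for x in reported if x is not None]

print(len(kept), round(dropped_share, 2))
print(statistics.mean(true_incomes), statistics.mean(after_drop))
```

The mean after dropping sits well below the mean of the full list, because the dropped rows were not a random sample.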

Imputation options

  • Numeric — median (robust), mean, or group-wise median by category
  • Categorical — mode (most frequent) or explicit “unknown”
  • Advanced — model-based imputation (use with care, fit on train only)
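
The simpler options above can be sketched with the standard library. The records, column names, and segment labels are hypothetical. For brevity the statistics are computed on all rows, whereas in a modeling workflow they would be fit on the training split only.

```python
import statistics
from collections import Counter, defaultdict

# Hypothetical records; None marks a missing value.
records = [
    {"segment": "a", "income": 40, "city": "Oslo"},
    {"segment": "a", "income": None, "city": "Oslo"},
    {"segment": "a", "income": 60, "city": None},
    {"segment": "b", "income": 100, "city": "Bergen"},
    {"segment": "b", "income": None, "city": "Oslo"},
    {"segment": "b", "income": 300, "city": None},
]

def observed(rows, column):
    """Return the non-missing values of a column."""
    return [r[column] for r in rows if r[column] is not None]

# Numeric: one global median, or one median per segment.
global_median = statistics.median(observed(records, "income"))
by_segment = defaultdict(list)
for r in records:
    if r["income"] is not None:
        by_segment[r["segment"]].append(r["income"])
segment_median = {seg: statistics.median(vals) for seg, vals in by_segment.items()}

# Categorical: the most frequent value, or an explicit "unknown" label.
city_mode = Counter(observed(records, "city")).most_common(1)[0][0]

# Apply group-wise median for income and "unknown" for city.
imputed = [
    {
        **r,
        "income": segment_median[r["segment"]] if r["income"] is None else r["income"],
        "city": "unknown" if r["city"] is None else r["city"],
    }
    for r in records
]
print(global_median, segment_median, city_mode)
```

The group-wise medians differ a lot between the two segments here, which is the case where a single global median would be a poor fill.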

Indicators

Add a flag column such as revenue_was_missing when missingness may carry signal (optional survey questions, partial form completion). Create the flag before imputing so the information survives the fill.
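
One way this might look in plain Python, assuming rows are dictionaries and None marks a skipped answer. The company names and the fill value are placeholders, not recommendations.

```python
# Hypothetical survey rows; None marks a skipped revenue question.
rows = [
    {"company": "acme", "revenue": 120.0},
    {"company": "blix", "revenue": None},
    {"company": "cora", "revenue": 80.0},
]

fill_value = 100.0  # placeholder; in practice a statistic fit on training data

flagged = []
for row in rows:
    was_missing = row["revenue"] is None
    flagged.append({
        **row,
        # Record missingness before the original value is overwritten.
        "revenue_was_missing": was_missing,
        "revenue": fill_value if was_missing else row["revenue"],
    })

print(flagged[1])
```

A downstream model can then use both the filled value and the flag, so "was not answered" stays distinguishable from "answered 100.0".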

Train-only fitting

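import statistics

# Pattern (after the split). Values are illustrative; None marks a missing age.
train_ages, test_ages = [22, 35, None, 41, 58], [None, 29]
median_age = statistics.median(a for a in train_ages if a is not None)  # fit on train only
train_filled = [median_age if a is None else a for a in train_ages]
test_filled = [median_age if a is None else a for a in test_ages]  # reuse the train median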

Important interview questions and answers

  1. Q: Why use the median for skewed numeric columns?
    A: It is pulled less by outliers than the mean, which makes it a common default for imputation.
  2. Q: What is the risk of imputing on the full dataset?
    A: Test information leaks into training through global statistics, which inflates evaluation metrics.
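
Both answers can be checked with a few made-up numbers. This is only an illustration: the values are hypothetical, and the single extreme test value is chosen to make the effect visible.

```python
import statistics

# Hypothetical skewed values with one extreme outlier.
values = [10, 12, 11, 13, 500]
mean_value = statistics.mean(values)      # pulled toward the outlier
median_value = statistics.median(values)  # stays near the bulk of the data

# Leakage: a statistic computed on train + test differs from train-only,
# so filled training rows would carry information about the test rows.
train = [10, 12, 11, 13]
test = [500]
train_only_median = statistics.median(train)
full_data_median = statistics.median(train + test)

print(mean_value, median_value)
print(train_only_median, full_data_median)
```

The gap between the two medians is small here, but any gap means the fill value was influenced by rows the model is later evaluated on.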

Self-check

  1. Name two imputation strategies for categoricals.
  2. When is dropping rows reasonable?
  3. Why fit imputation on training data only?

Tip: Document imputation strategy in the README.

Interview prep

Drop vs impute?

Drop when only a few rows are affected; impute with care and document the choice.

