Missing values are gaps in your table: empty cells, None in Python, NULL in SQL, or sentinel codes like -1 meaning “unknown.” How you handle them changes model behavior and metrics.
Types of missingness
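- MCAR: missing completely at random; missingness is unrelated to any data, observed or not (rare in practice)
- MAR: missing at random; missingness depends only on other observed columns
- MNAR: missing not at random; missingness depends on unobserved factors or on the missing value itself (hardest to handle)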
Example of MNAR: high earners skip income survey questions more often, so dropping the rows with missing income biases the average income downward.
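The income example can be made concrete with a toy calculation. This is a minimal sketch with made-up numbers: the income list and the rule that everyone earning 100 or more declines to answer are assumptions chosen for illustration (a real mechanism would be probabilistic, not a hard cutoff).

```python
from statistics import mean

# Hypothetical true incomes (in thousands); an analyst would not normally see these.
true_incomes = [30, 35, 40, 45, 50, 60, 80, 120, 200, 400]

# Assumed MNAR mechanism: respondents earning 100 or more skip the question,
# so their answer is recorded as None.
reported = [x if x < 100 else None for x in true_incomes]

# "Drop rows" strategy: keep only the respondents who answered.
observed = [x for x in reported if x is not None]

true_mean = mean(true_incomes)
observed_mean = mean(observed)

print(f"true mean:     {true_mean:.1f}")
print(f"observed mean: {observed_mean:.1f}")  # lower, because high earners are gone
```

Because the missingness depends on the income value itself, no amount of extra rows fixes the gap; the surviving sample is systematically poorer than the population.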
Audit missingness first
- Count missing per column
- Cross-tab missing flags with target or segment
- Check if “missing” is informative (create indicator features)
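The three audit steps above can be sketched in plain Python. The rows, column names, and helper functions (`missing_counts`, `target_rate_by_missing`, `add_missing_flag`) are hypothetical and exist only for this illustration; with pandas, the per-column count is typically written as `df.isna().sum()`.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "segment": "A", "churned": 0},
    {"age": None, "income": 61000, "segment": "B", "churned": 0},
    {"age": 45, "income": None, "segment": "A", "churned": 1},
    {"age": 29, "income": None, "segment": None, "churned": 1},
    {"age": 51, "income": 48000, "segment": "B", "churned": 0},
    {"age": 38, "income": None, "segment": "A", "churned": 1},
]

def missing_counts(rows):
    """Count None values per column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}

def target_rate_by_missing(rows, col, target):
    """Mean of the target where `col` is missing (True) vs. present (False)."""
    groups = {True: [], False: []}
    for r in rows:
        groups[r[col] is None].append(r[target])
    return {flag: (sum(v) / len(v) if v else None) for flag, v in groups.items()}

def add_missing_flag(rows, col):
    """Return new rows with a 0/1 indicator column for missingness in `col`."""
    return [{**r, f"{col}_missing": int(r[col] is None)} for r in rows]

counts = missing_counts(rows)
rates = target_rate_by_missing(rows, "income", "churned")
flagged = add_missing_flag(rows, "income")

print(counts)
print(rates)
```

In this toy data every row with missing income has churned, which is the kind of pattern that suggests the missingness itself carries signal and is worth keeping as an indicator feature.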
Common strategies (preview)
- Drop rows: only if few rows are affected and the missingness looks MCAR
- Impute: median for numeric columns, mode for categorical ones, or model-based imputation (advanced)
- Separate category: an explicit “unknown” level for categoricals
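As a rough preview, each strategy above fits in a few lines of plain Python. The rows and the helper names (`drop_rows`, `impute_median`, `fill_unknown`) are hypothetical, and each helper returns new rows rather than modifying the input.

```python
from statistics import median

# Hypothetical rows; None marks a missing value.
rows = [
    {"income": 50000, "city": "Lyon"},
    {"income": None, "city": "Oslo"},
    {"income": 70000, "city": None},
    {"income": 90000, "city": "Oslo"},
]

def drop_rows(rows, col):
    """Strategy 1: discard rows where `col` is missing."""
    return [r for r in rows if r[col] is not None]

def impute_median(rows, col):
    """Strategy 2: fill missing numeric values with the median of observed ones."""
    fill = median(r[col] for r in rows if r[col] is not None)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]

def fill_unknown(rows, col, label="unknown"):
    """Strategy 3: give missing categorical values their own category."""
    return [{**r, col: label if r[col] is None else r[col]} for r in rows]

dropped = drop_rows(rows, "income")
imputed = impute_median(rows, "income")
labeled = fill_unknown(rows, "city")
```

The median is used here rather than the mean because it is less sensitive to outliers; which statistic suits a given column is a judgment call.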
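The cleaning lessons cover the imputation workflow in detail. The key rule: never impute on the full dataset before splitting into train and test, because fill values computed from all rows leak test-set information into training.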
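The split-then-impute rule can be illustrated with a small sketch: the fill value is learned from the training rows only and then reused on the test rows. The values and the helpers `fit_median` and `apply_fill` are made up for this example.

```python
from statistics import median

def fit_median(values):
    """Learn the fill value from observed (non-None) values only."""
    return median(v for v in values if v is not None)

def apply_fill(values, fill):
    """Replace None with a previously learned fill value."""
    return [fill if v is None else v for v in values]

# Hypothetical numeric column, already split into train and test.
train = [10, 20, None, 30]
test = [None, 1000]

# Correct: fit on train only, then apply the same fill to both splits.
fill = fit_median(train)
train_filled = apply_fill(train, fill)
test_filled = apply_fill(test, fill)

# Leaky (shown only for contrast): the fill value has seen the test rows.
leaky_fill = fit_median(train + test)

print(fill, leaky_fill)
```

The two fill values differ because the leaky one is pulled toward the test-set value of 1000, which the model should never have been able to see during training.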
Important interview questions and answers
- Q: Why does MNAR matter?
A: Imputing without modeling why data are missing can bias conclusions.
- Q: What is a missing indicator feature?
A: A binary column marking which values were missing before imputation. It sometimes improves models when missingness is informative.
Self-check
- What does NULL mean in SQL?
- Name two strategies for missing numeric data.
- Why audit missingness before imputing?
Tip: Ask why data is missing before filling it in. MNAR is common.
Interview prep
- MCAR?
Missing completely at random—rare in practice.
- Impute blindly?
No. Understand why values are missing before filling them.