Cleaning with Python preview

Last reviewed May 28, 2026 Content v20260528

Reading: ~1 min
Level: intermediate

This lesson

This lesson previews data cleaning in Python: filtering invalid rows, imputing missing values, and reporting what changed, so that later analysis rests on data you can trust.

Nearly every serious data science project includes a cleaning step; skipping it leaves blind spots in analysis and reviews.

You will apply these techniques to messy CSV exports, API logs, and survey data before any dashboard ships.

Read the narrative, run the Python snippets in the playground (standard library only for now; install Jupyter, pandas, and scikit-learn locally for full notebooks), and complete the multiple-choice questions to lock in vocabulary. Then change the input values and re-run to see how the mean and the median shift.
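To see the mean versus median shift for yourself, a minimal sketch along these lines may help. The ages are made up for illustration, and the value 340 stands in for a hypothetical data-entry typo.

```python
from statistics import mean, median

# Illustrative ages only; not taken from a real dataset.
ages = [22, 25, 27, 31, 34]
print("clean:", mean(ages), median(ages))

# Add one extreme value (a hypothetical typo for 34) and compare again.
ages_with_outlier = ages + [340]
print("with outlier:", mean(ages_with_outlier), median(ages_with_outlier))
```

The mean jumps from 27.8 to roughly 79.8, while the median only moves from 27 to 29.0. That stability is the usual reason a median is preferred as an imputation default.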

Start this lesson once you can explain the previous lesson's ideas in your own words.

Filter invalid rows, impute missing numeric values with a median, and keep a cleaning summary, all in standard-library Python on a list of dicts, before you graduate to pandas pipelines.

Scenario

You have a batch of user signup records. Drop rows without a country, impute missing ages with the training-set median (here, the median of the valid ages in the batch), and normalize country codes to uppercase.
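As a small illustration of the normalization step, the sketch below counts country codes before and after cleaning. The sample values are invented, and stripping surrounding whitespace is an extra assumption beyond the uppercase rule stated in the scenario.

```python
from collections import Counter

# Invented raw values showing how one country can appear under several keys.
raw_countries = ["IN", "in", " In ", "us", "US"]
print(Counter(raw_countries))  # five distinct keys for two countries

# Uppercase as in the scenario; strip() is an added assumption.
normalized = [code.strip().upper() for code in raw_countries]
print(Counter(normalized))  # two keys: IN and US
```

Before normalization every spelling is its own category; afterwards the counts collapse to 3 for IN and 2 for US.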

Pipeline steps in code

Filter rows missing required fields
Compute median age from remaining valid ages
Fill missing ages with that median
Print before/after counts
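The four steps above can be sketched in standard-library Python as follows. This is one possible version, not the lesson's official playground code: the sample records, the field names (user_id, country, age), and the summary keys are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical signup batch; field names and values are illustrative.
signups = [
    {"user_id": 1, "country": "in", "age": 34},
    {"user_id": 2, "country": "US", "age": None},
    {"user_id": 3, "country": None, "age": 29},
    {"user_id": 4, "country": "de", "age": 41},
    {"user_id": 5, "country": "", "age": 22},
    {"user_id": 6, "country": "us", "age": 25},
]

# Step 1: keep only rows with a non-empty country (copy so the input is untouched).
kept = [dict(row) for row in signups if row.get("country")]

# Scenario rule: normalize country codes to uppercase.
for row in kept:
    row["country"] = row["country"].upper()

# Step 2: median of the valid ages that remain after filtering.
valid_ages = [row["age"] for row in kept if row["age"] is not None]
median_age = median(valid_ages)

# Step 3: fill missing ages with that median and count how many were filled.
imputed = 0
for row in kept:
    if row["age"] is None:
        row["age"] = median_age
        imputed += 1

# Step 4: a small cleaning summary with before/after counts.
summary = {
    "rows_before": len(signups),
    "rows_after": len(kept),
    "rows_dropped": len(signups) - len(kept),
    "ages_imputed": imputed,
    "median_age": median_age,
}
print(summary)
```

With this sample batch, two rows are dropped (users 3 and 5), one age is imputed with the median 34, and four rows remain. Note that statistics.median raises StatisticsError on an empty list, so a real job would need a rule for batches with no valid ages.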

Production note

In production jobs, persist cleaning rules in SQL views or Python transforms that are tested in CI; notebooks alone are not a pipeline.
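One way to make a cleaning rule testable is to write it as a pure function with a plain assert-based test that a CI job can run. The sketch below assumes a hypothetical clean_signup function and test name; neither comes from the lesson.

```python
def clean_signup(row, median_age):
    """Return a cleaned copy of row, or None if the row must be dropped.

    Hypothetical transform: drops rows without a country, uppercases the
    country code, and fills a missing age with the supplied median.
    """
    country = row.get("country")
    if not country:
        return None
    age = row.get("age")
    return {
        **row,
        "country": country.upper(),
        "age": median_age if age is None else age,
    }


def test_clean_signup():
    # Rows without a country are dropped.
    assert clean_signup({"country": None, "age": 30}, 28) is None
    # Missing ages take the supplied median; country is uppercased.
    assert clean_signup({"country": "in", "age": None}, 28) == {"country": "IN", "age": 28}
    # Valid rows pass through with only the country normalized.
    assert clean_signup({"country": "US", "age": 40}, 28) == {"country": "US", "age": 40}


if __name__ == "__main__":
    test_clean_signup()
    print("all checks passed")
```

Because the function takes the median as an argument instead of computing it, the same rule can be reused with a median stored from a training split.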

Important interview questions and answers

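Q: Why convert country codes to uppercase?
A: Consistent keys prevent duplicate categories such as "IN" versus "in".
Q: Why impute with the batch median in this preview?
A: It demonstrates a robust default; in production, compute the median from the training split only, store it, and reuse it on new data.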
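The second answer can be illustrated with a short sketch: the median is computed once from the training split and then reused on a later batch, instead of being recomputed from that batch. The split names and values below are invented for illustration.

```python
from statistics import median

# Invented data: a training split and a later batch, each with missing ages.
train_ages = [22, 25, None, 31, 34, 41]
new_batch_ages = [None, 19, None, 52]

# Compute the median from the training split only, then store it.
train_median = median([age for age in train_ages if age is not None])

# Reuse the stored value on new data; do not recompute from the new batch.
filled = [train_median if age is None else age for age in new_batch_ages]
print(train_median, filled)
```

Here the training median is 31, so both missing ages in the new batch become 31. Recomputing on the new batch would give a different value (35.5) and let later data change the rule.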

Self-check

What rows does the filter remove?
Which statistic imputes missing age?
Why normalize country strings?

Tip: Compare row counts before and after filters.

Interview prep

How do you filter rows? List comprehensions remove invalid records in small examples.
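A minimal sketch of that answer is shown below, with made-up rows and a hypothetical is_valid helper. Keeping the rejected rows as well as the valid ones makes it easy to compare counts before and after the filter.

```python
# Made-up rows; the second and third lack a usable country.
rows = [
    {"email": "a@example.com", "country": "IN"},
    {"email": "b@example.com", "country": None},
    {"email": "c@example.com"},
]


def is_valid(row):
    """Hypothetical rule: a row is valid when it has a non-empty country."""
    return bool(row.get("country"))


# One comprehension keeps valid rows, the other keeps the rejects for review.
valid = [row for row in rows if is_valid(row)]
rejected = [row for row in rows if not is_valid(row)]

print(len(rows), len(valid), len(rejected))
```

For this sample the counts are 3 rows in, 1 valid, and 2 rejected.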

Interview tip

Lesson completion confidence

Can you explain this lesson in 30 seconds without reading notes?
