Data science workflow

Last reviewed May 28, 2026 Content v20260528

Reading: ~1 min
Level: beginner

This lesson

This lesson covers the data science workflow: the mindset, methods, and communication habits behind evidence-based decisions.

Teams apply this workflow throughout serious data science projects. Skipping it leaves blind spots in both the analysis and its reviews.

You will use this workflow on analytics teams, in product experimentation, in research labs, and in ML-adjacent engineering at data-driven companies.

Read the narrative, run Python in the playground (stdlib snippets now; install Jupyter, pandas, and scikit-learn locally for full notebooks), and complete MCQs to lock in vocabulary. Also write one measurable question for a dataset you care about.

Take this lesson at the start of the track, before lessons that assume workflow and statistics vocabulary.

A repeatable loop: define question → audit data dictionary → explore → clean → split train/test → baseline → iterate → document → ship recommendation.
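The "split train/test → baseline" steps of the loop can be sketched with the standard library alone. This is a minimal sketch, assuming toy numeric data, an 80/20 split, and a baseline that always predicts the mean training target, scored with mean absolute error (MAE). `train_test_split` here is a hypothetical helper, not scikit-learn's function of the same name.

```python
import random
from statistics import mean

# Hypothetical toy data: (feature, target) pairs standing in for a real dataset.
rows = [(x, 2 * x + 1) for x in range(20)]

def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy with a dedicated, seeded RNG, then cut off the test slice."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(rows)

# Baseline: always predict the mean target seen in training.
baseline_prediction = mean(y for _, y in train)

# MAE of the baseline on held-out rows; any later model should beat this number.
baseline_mae = mean(abs(y - baseline_prediction) for _, y in test)

print(f"train={len(train)} test={len(test)}")
print(f"baseline prediction={baseline_prediction:.2f}, test MAE={baseline_mae:.2f}")
```

The baseline is deliberately dumb: it gives the "iterate" step a concrete number to beat, so a complex model that fails to improve on it is caught early.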

Document the question

Write: Who decides? What action changes? How will we measure success? Vague goals produce vague analysis.
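One way to keep the question from staying vague is to record it as structured fields. This is a minimal sketch; the `AnalysisQuestion` class and its example values are hypothetical, not part of any library.

```python
from dataclasses import dataclass

@dataclass
class AnalysisQuestion:
    """Hypothetical template for writing the question down before touching data."""
    decision_maker: str   # Who decides?
    action: str           # What action changes depending on the answer?
    success_metric: str   # How will we measure success?
    question: str         # The measurable question itself

    def is_complete(self) -> bool:
        # A blank field is a sign the goal is still vague.
        fields = (self.decision_maker, self.action, self.success_metric, self.question)
        return all(f.strip() for f in fields)

q = AnalysisQuestion(
    decision_maker="Checkout product manager",
    action="Keep or roll back the new checkout flow",
    success_metric="Order conversion rate per session over 14 days",
    question="Did the new checkout flow change conversion rate vs. the old flow?",
)
print(q.is_complete())  # True
```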

Audit before plotting

Row count, column types, missing %
Unit of observation (user vs session vs order)
Time range and known collection bugs
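The audit checks above can be sketched in plain Python before reaching for pandas. This is a minimal sketch, assuming rows arrive as dicts with `None` for missing values and dates stored as ISO `YYYY-MM-DD` strings; the column names and the `audit` helper are hypothetical. Known collection bugs still have to be checked by hand against the team's notes.

```python
from collections import Counter

# Hypothetical rows, as if loaded from a CSV; None marks a missing value.
rows = [
    {"user_id": 1, "session_id": "a", "order_total": 20.0, "date": "2026-01-03"},
    {"user_id": 1, "session_id": "b", "order_total": None, "date": "2026-01-05"},
    {"user_id": 2, "session_id": "c", "order_total": 35.5, "date": "2026-02-10"},
    {"user_id": 3, "session_id": "d", "order_total": None, "date": None},
]

def audit(rows, key_candidates):
    """Return row count, per-column types and missing %, key uniqueness, and date range."""
    columns = sorted({col for row in rows for col in row})
    report = {"row_count": len(rows), "columns": {}, "unique_keys": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        missing_pct = 100 * sum(v is None for v in values) / len(rows)
        types = Counter(type(v).__name__ for v in values if v is not None)
        report["columns"][col] = {"types": dict(types), "missing_pct": missing_pct}
    # Unit of observation: a column unique per row is a candidate for the grain.
    for key in key_candidates:
        report["unique_keys"][key] = len({row.get(key) for row in rows}) == len(rows)
    # Time range: ISO dates sort correctly as strings.
    dates = sorted(row["date"] for row in rows if row.get("date"))
    report["date_range"] = (dates[0], dates[-1]) if dates else None
    return report

report = audit(rows, key_candidates=["user_id", "session_id"])
for name, value in report.items():
    print(name, value)
```

Here `session_id` is unique per row and `user_id` is not, which suggests the grain is one session, not one user.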

Reproducibility habits

import random

random.seed(42)  # fix the module-level RNG so shuffles and samples repeat across reruns
print('Seed set: same random split in reruns')
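To see why the seed matters, compare splits made with the same and with different seeds. This sketch uses a dedicated `random.Random(seed)` instance instead of the module-level generator, so other code calling `random` cannot shift the split; `split_ids` is a hypothetical helper. Libraries such as NumPy and scikit-learn keep their own random state and are seeded separately (for example via scikit-learn's `random_state` parameter).

```python
import random

def split_ids(ids, seed):
    """Shuffle a copy of ids with its own RNG and return a 75/25 train/test split."""
    rng = random.Random(seed)  # local RNG: unaffected by other uses of the random module
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]

ids = list(range(12))
first = split_ids(ids, seed=42)
rerun = split_ids(ids, seed=42)
other = split_ids(ids, seed=7)

print("same seed, same split:", first == rerun)  # True
print("different seed, same split:", first == other)
```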

Important interview questions and answers

Q: Unit of observation?
A: Grain of one row—mixing users and sessions causes wrong aggregates.
Q: Why set random seed?
A: Makes train/test splits repeatable for debugging and audits.
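The unit-of-observation answer can be made concrete with a small aggregate. This is a minimal sketch on hypothetical session-level data: averaging raw rows answers a per-session question, while a per-user question requires rolling up to one row per user first.

```python
from statistics import mean

# Hypothetical session-level rows: one row per session, not per user.
sessions = [
    {"user_id": "u1", "minutes": 2},
    {"user_id": "u1", "minutes": 2},
    {"user_id": "u1", "minutes": 2},
    {"user_id": "u1", "minutes": 2},
    {"user_id": "u2", "minutes": 30},
]

# Averaging raw rows answers "minutes per session", not "minutes per user".
per_session_avg = mean(row["minutes"] for row in sessions)

# Roll up to one row per user (total minutes), then average across users.
totals = {}
for row in sessions:
    totals[row["user_id"]] = totals.get(row["user_id"], 0) + row["minutes"]
per_user_avg = mean(totals.values())

print("avg minutes per session:", per_session_avg)
print("avg total minutes per user:", per_user_avg)
```

The two numbers differ because user `u1` contributes four rows at the session grain but only one at the user grain.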

Self-check

List three audit checks before modeling.
Why document the business decision maker?

Challenge

Set a random seed

Run the workflow lesson code.
Change the value passed to random.seed and observe how the splits differ in later labs.

Done when: you see seed output and understand why reproducibility matters.

Interview prep

Random seed?: Makes stochastic steps reproducible for audits.
Unit of observation?: Grain of one row—must match the business question.

Interview tip

Can you explain this lesson in 30 seconds without reading notes?
