Pandas workflow

Last reviewed May 28, 2026. Content version: v20260528.

Track mode: server_script (server runner)
Reading: ~2 min
Level: beginner

This lesson

This lesson teaches a Pandas workflow for tabular manipulation (indexing, dtypes, reshaping) and the analysis habits that real-world tables demand.

This track orients you to the overall workflow; the NumPy and Pandas tracks teach the tools you will use daily in notebooks.

You will apply this workflow in contexts such as CSV/Parquet analysis, ETL notebooks, and ad hoc reporting.

Read the narrative, run the `import pandas as pd` snippets against in-memory DataFrames (install pandas and numpy with pip if needed), and complete the MCQs. After every transform, print `df.shape`, `df.dtypes`, and `df.head()` to confirm the result.
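A minimal sketch of the print-after-every-transform habit. The helper name `checkpoint` and the sample data are hypothetical; the helper returns the frame unchanged so it can sit inside a `.pipe()` chain.

```python
import pandas as pd


def checkpoint(df: pd.DataFrame, label: str = "") -> pd.DataFrame:
    """Hypothetical helper: print shape, dtypes, and first rows, then pass df through."""
    print(f"--- {label} ---")
    print(df.shape)
    print(df.dtypes)
    print(df.head())
    return df  # returning df unchanged lets this sit inside a .pipe() chain


# Hypothetical data: 'temp' arrives as text, as it often does from CSV exports
raw = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "temp": ["12", "19", "11"]})

result = (
    raw
    .pipe(checkpoint, "raw")  # dtypes show 'temp' is not numeric yet
    .assign(temp=lambda d: pd.to_numeric(d["temp"]))
    .pipe(checkpoint, "after to_numeric")  # 'temp' is now an integer column
)
```

Because each checkpoint prints the same three views, a dtype change or a dropped row shows up at the exact step that caused it.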

Take this lesson at the start of the track, before any lesson that assumes Series, DataFrame, and dtype vocabulary.

A repeatable Pandas workflow: load → inspect (head, info, describe) → clean (dtypes, missing) → transform → aggregate → export or hand off to ML.
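The workflow above can be sketched end to end on a tiny in-memory CSV. The data, the column names, and the decision to treat a missing amount as zero are assumptions for illustration; a real pipeline would read from a file and might export Parquet instead of CSV text.

```python
import io

import pandas as pd

# Stand-in for a real file: hypothetical in-memory CSV with a gap and a duplicate row
csv_text = """order_id,region,amount
1,north,10.5
2,south,
3,north,7.25
3,north,7.25
"""

# 1. Load
df = pd.read_csv(io.StringIO(csv_text))

# 2. Inspect
print(df.shape)
df.info()

# 3. Clean: drop exact duplicate rows and make the missing-value decision explicit
df = df.drop_duplicates().assign(
    amount=lambda d: d["amount"].fillna(0.0)  # assumption: missing amount means no sale
)

# 4. Transform: integer cents avoid float rounding in later sums
df = df.assign(amount_cents=lambda d: (d["amount"] * 100).round().astype("int64"))

# 5. Aggregate
summary = df.groupby("region", as_index=False)["amount_cents"].sum()
print(summary)

# 6. Export or hand off (CSV text here; a real pipeline might write Parquet)
out = summary.to_csv(index=False)
print(out)
```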

Inspect first

df.shape — rows × columns
df.head() / df.tail() — sample rows
df.info() — dtypes and non-null counts
df.describe() — numeric summary stats
df.isna().sum() — missing value counts per column
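Applied to a small hypothetical table, each inspect call surfaces a different problem: `info()` shows that `price` is stored as text, `describe()` silently skips it, and `isna().sum()` counts the gaps. The table and its columns are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical messy table: 'price' is numeric text with a gap, 'qty' has a NaN
df = pd.DataFrame(
    {
        "sku": ["A1", "B2", "C3", "D4"],
        "price": ["9.99", "14.50", None, "3.25"],
        "qty": [3, np.nan, 5, 1],
    }
)

print(df.shape)         # rows x columns
print(df.head(2))       # first two rows
print(df.tail(2))       # last two rows
df.info()               # 'price' is text, not a number; 'qty' is float64 because of NaN
print(df.describe())    # only 'qty' appears: 'price' is not numeric yet
print(df.isna().sum())  # missing values per column
```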

Clean before you analyze

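Fix dtypes (for example, numbers stored as strings), handle missing values explicitly, and deduplicate before joins. Silent dtype bugs produce wrong aggregates: summing an object column of numeric strings concatenates them ("10" + "5" gives "105") instead of adding them.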
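A sketch of the three cleaning steps in order, on hypothetical `orders` and `customers` tables. `errors="coerce"` and `validate="many_to_one"` are real pandas parameters; keeping the missing total as NaN (rather than filling it) is an assumed policy chosen for the example.

```python
import pandas as pd

# Hypothetical tables: 'total' arrived as text, and one order row is duplicated
orders = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 3],
        "total": ["20.00", "n/a", "15.50", "15.50"],
    }
)
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Bo", "Cy"]})

# 1. Fix dtypes: unparseable strings such as "n/a" become NaN instead of raising
orders["total"] = pd.to_numeric(orders["total"], errors="coerce")

# 2. Handle missing values explicitly (assumed policy: keep NaN, but count it)
n_missing = orders["total"].isna().sum()
print(f"missing totals: {n_missing}")

# 3. Deduplicate before joining so repeated rows are not carried into the result
orders = orders.drop_duplicates()

# validate="many_to_one" raises MergeError if customer_id repeats in customers
joined = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
print(joined)
```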

Reproducible patterns

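import pandas as pd
import numpy as np

# 'a' becomes float64 because of the NaN; 'b' holds strings
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': ['x', 'y', 'z']})
df.info()               # info() prints directly and returns None, so don't wrap it in print()
print(df.describe())    # numeric columns only by default, so 'b' is omitted
print(df.isna().sum())  # missing values per column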

Next steps in this track

Modules 02–05 cover the basics through advanced reshaping. Module 06 previews integration with NumPy, Matplotlib, sklearn, and SciPy; module 07 covers interview preparation and production habits before you move on to SciPy and deeper SQL.

Important interview questions and answers

Q: Why run info() before groupby?
A: It reveals object vs. numeric dtypes and non-null counts, which prevents silent aggregation errors.
Q: What are the limits of describe()?
A: By default it summarizes only numeric columns; for categorical or text columns, use value_counts() (or describe(include='all')).
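The two answers above can be checked on a hypothetical sales table. The columns are built with `dtype=object` on purpose so the string behavior is the same across pandas versions; with real CSV data, `info()` is how you would notice the problem.

```python
import pandas as pd

# Hypothetical sales table where 'amount' arrived as text
sales = pd.DataFrame(
    {"region": ["north", "south", "north"], "amount": ["10", "5", "7"]},
    dtype=object,
)

sales.info()  # 'amount' shows as object, not int64: a red flag before any groupby

# Summing text concatenates instead of adding
print(sales["amount"].sum())  # -> '1057'

# Fix the dtype first, then aggregate
sales["amount"] = pd.to_numeric(sales["amount"])
totals = sales.groupby("region")["amount"].sum()
print(totals)

# describe() skips the non-numeric 'region'; value_counts() summarizes it instead
print(sales.describe())
print(sales["region"].value_counts())
```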

Self-check

List four inspect methods for any new DataFrame.
What is the recommended first step after loading data?

Challenge

Inspect a new DataFrame

Run the workflow lesson code.
Predict the df.info() output before running it: note each column's dtype and non-null count.

Done when: you can describe shape, dtypes, and missing data before transforming.
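One way to capture the challenge's "done when" criteria is a small report built from the lesson's DataFrame. `profile` is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Hypothetical helper: the facts to state before transforming anything."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # column -> dtype name
        "missing": df.isna().sum().to_dict(),        # column -> missing count
    }


df = pd.DataFrame({"a": [1, 2, np.nan], "b": ["x", "y", "z"]})
report = profile(df)
print(report)
```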

Interview prep

Inspect first?: Run head, info, describe, and isna().sum() before heavy transforms.
Why dtypes?: Wrong dtypes cause silent math errors; for example, numeric strings in an object column concatenate under sum() instead of adding.
