Categorical data

Last reviewed May 28, 2026 Content v20260528

Track mode: server_script
Means: Server runner
Reading: ~1 min
Level: intermediate

This lesson

This lesson teaches Categorical data: Pandas tabular manipulation—indexing, dtypes, reshaping, and analysis habits for real-world tables.

Teams apply Categorical data in every serious Pandas project—skipping it leaves blind spots in analysis and reviews.

You will apply Categorical data in contexts like: CSV/Parquet analysis, ETL notebooks, and ad hoc reporting.

Read the narrative, run `import pandas as pd` snippets with in-memory DataFrames (install pandas and numpy with pip if needed), inspect `.head()`, `.dtypes`, and complete MCQs.

When you can explain the previous lesson's ideas in your own words.

The category dtype stores low-cardinality strings efficiently and enables ordered comparisons. Convert with astype('category') or pd.Categorical when you have fixed label sets.

Why categorical?

Lower memory vs object strings on repeated values
Faster groupby and sort on known categories
Enforce valid values (typos become NaN if not in categories)
Ordered categories for ranking (small < medium < large)

Creating ordered categories

import pandas as pd
size_order = ['S', 'M', 'L']
cat = pd.Categorical(['M', 'S', 'L'], categories=size_order, ordered=True)
print(cat.sort_values())

In DataFrames

df = pd.DataFrame({'size': ['M', 'S', 'M', 'L']})
df['size'] = df['size'].astype('category')
print(df['size'].cat.categories)

Important interview questions and answers

Q: When not to use?
A: High-cardinality unique strings (user IDs, timestamps)—object or string dtype better.
Q: cat accessor?
A: .cat.categories, .cat.codes, .cat.reorder_categories for ordered ops.

Self-check

Convert a column to category dtype.
Create an ordered categorical for sizes.

Tip: Convert low-cardinality strings to category before large groupby operations.

Interview prep

When category?: Low-cardinality repeated strings—memory and groupby speed.
Ordered?: Enables sort/compare by business order (S < M < L).

Interview tip Lesson completion confidence

Can you explain this lesson in 30 seconds without reading notes?

Self-reflection (saved on this device)

Not saved yet.

Playground

Runs on the configured server runner (dev: npm run runner with LEARNING_RUNNER_ENABLED=true). Output appears below the editor.

Code runner not available

Server runner is disabled. Set LEARNING_RUNNER_ENABLED=true and LEARNING_RUNNER_URL in .env (see .env.example).

Check yourself

Multiple choice — immediate feedback.

Discussion

Past discussion is visible to everyone. Only logged-in users can post comments and replies.

Starter discussion topics

Memory win?
ordered category?

No discussion yet. Be the first to ask a question.