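The category dtype stores low-cardinality strings efficiently by keeping each distinct label once and representing the values as small integer codes. It also enables ordered comparisons. Convert with astype('category'), or build with pd.Categorical when the label set is fixed.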
Why categorical?
- Lower memory vs object strings on repeated values
- Faster groupby and sort on known categories
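- Enforce valid values: with an explicit category list, values outside it (such as typos) become NaN on conversion. astype('category') alone infers the categories from the data, so a typo simply becomes another category.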
- Ordered categories for ranking (small < medium < large)
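The memory point above can be checked with a rough sketch like the following. The data is hypothetical (three labels repeated many times), and the exact byte counts depend on the pandas version and platform, so only the relative sizes matter.

```python
import pandas as pd

# Hypothetical data: 3 distinct labels repeated 10,000 times each.
labels = pd.Series(['small', 'medium', 'large'] * 10_000, dtype=object)

# deep=True counts the Python string objects, not just the pointers.
as_object = labels.memory_usage(deep=True)
as_category = labels.astype('category').memory_usage(deep=True)

print(as_object, as_category)
print(as_category < as_object)
```

The saving comes from storing one small integer code per row instead of one string per row, so it shrinks as the number of distinct labels approaches the number of rows.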
Creating ordered categories
import pandas as pd
size_order = ['S', 'M', 'L']
cat = pd.Categorical(['M', 'S', 'L'], categories=size_order, ordered=True)
print(cat.sort_values())
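Beyond sorting, an ordered categorical also supports comparisons and min/max. A minimal sketch, reusing the size_order list from above and wrapping the categorical in a Series (the variable name sizes is only for illustration):

```python
import pandas as pd

size_order = ['S', 'M', 'L']
sizes = pd.Series(
    pd.Categorical(['M', 'S', 'L'], categories=size_order, ordered=True)
)

# Element-wise comparison against a label uses the category order, not alphabetical order.
smaller_than_large = sizes < 'L'
print(smaller_than_large.tolist())

# min and max also follow the declared order.
print(sizes.min(), sizes.max())
```

These comparisons are only defined when ordered=True; on an unordered categorical, the < comparison raises a TypeError.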
In DataFrames
df = pd.DataFrame({'size': ['M', 'S', 'M', 'L']})
df['size'] = df['size'].astype('category')
print(df['size'].cat.categories)
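Note that astype('category') on its own produces an unordered categorical whose categories are sorted lexically (L, M, S here). One way to get the business order in a DataFrame column is to pass a CategoricalDtype instead. A sketch, where the name size_dtype is an illustrative choice:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Declare the category order once and reuse it for any column that needs it.
size_dtype = CategoricalDtype(categories=['S', 'M', 'L'], ordered=True)

df = pd.DataFrame({'size': ['M', 'S', 'M', 'L']})
df['size'] = df['size'].astype(size_dtype)

# Sorting now follows S < M < L rather than alphabetical order.
sorted_sizes = df.sort_values('size')['size'].tolist()
print(sorted_sizes)
```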
Important interview questions and answers
- Q: When not to use?
A: High-cardinality, mostly unique strings (user IDs, timestamps); object or string dtype is better.
- Q: cat accessor?
A: .cat.categories (the labels), .cat.codes (the integer code for each value), and .cat.reorder_categories (to set a new category order for ordered ops).
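A short sketch of the three accessor members mentioned in the answer, using a small hypothetical Series:

```python
import pandas as pd

s = pd.Series(['M', 'S', 'M', 'L'], dtype='category')

# Categories are inferred from the data and sorted lexically.
print(list(s.cat.categories))

# Each value is stored as the integer position of its category.
print(s.cat.codes.tolist())

# reorder_categories keeps the same labels but changes their order (and the codes).
reordered = s.cat.reorder_categories(['S', 'M', 'L'], ordered=True)
print(reordered.cat.codes.tolist())
```

reorder_categories expects exactly the existing set of labels; adding or dropping labels is done with other accessor methods.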
Self-check
- Convert a column to category dtype.
- Create an ordered categorical for sizes.
Tip: Convert low-cardinality strings to category before large groupby operations.
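The tip might look like this in practice. The qty column and its values are made up for illustration, and observed=True is passed explicitly so that only categories present in the data appear in the result.

```python
import pandas as pd

# Hypothetical data: a size label and a quantity per row.
df = pd.DataFrame({'size': ['M', 'S', 'M', 'L'], 'qty': [2, 1, 3, 5]})

# Convert the grouping key to category before grouping.
df['size'] = df['size'].astype('category')

totals = df.groupby('size', observed=True)['qty'].sum()
print(totals.to_dict())
```

On a frame this small the conversion makes no measurable difference; the benefit shows up when the key column has many rows and few distinct values.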
Interview prep
- Q: When category?
A: Low-cardinality repeated strings, for memory savings and groupby speed.
- Q: Ordered?
A: Enables sort/compare by business order (S < M < L).