groupby splits a DataFrame into groups by one or more key columns, applies a function to each group, and combines the results. It is the pandas equivalent of SQL GROUP BY.
Split-apply-combine
import pandas as pd
df = pd.DataFrame({'dept': ['S','S','E'], 'sales': [100, 150, 200]})
totals = df.groupby('dept')['sales'].sum()
print(totals)
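To make the three steps visible, the same total can be built by hand. This is only an illustrative sketch: the names pieces, applied, manual and builtin are made up here, and in practice the one-line groupby call is what you would write.

```python
import pandas as pd

df = pd.DataFrame({'dept': ['S', 'S', 'E'], 'sales': [100, 150, 200]})

# Split: iterating over a groupby yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby('dept')}

# Apply: reduce each piece to a single number.
applied = {key: group['sales'].sum() for key, group in pieces.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(applied, name='sales').rename_axis('dept')

# The built-in path does all three steps in one call.
builtin = df.groupby('dept')['sales'].sum()

print(manual)   # both hold E: 200, S: 250
print(builtin)
```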
Multiple keys
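# Assumes df also has a 'region' column; the frame defined earlier does not, so it would raise KeyError.
df.groupby(['region', 'dept'])['sales'].mean()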
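A runnable version needs a frame that actually has both key columns. The sketch below uses invented sample data (the region values and extra rows are assumptions, not from the text above) and shows that grouping by two keys produces a MultiIndex, which can be flattened with reset_index.

```python
import pandas as pd

# Hypothetical sample data with both key columns present.
df = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south'],
    'dept':   ['S', 'E', 'S', 'S'],
    'sales':  [100, 200, 150, 50],
})

# One mean per (region, dept) pair; the result is indexed by both keys.
means = df.groupby(['region', 'dept'])['sales'].mean()
print(means.index.names)

# Look up a single group with a tuple of key values.
north_e = means.loc[('north', 'E')]

# Turn the index levels back into ordinary columns.
flat = means.reset_index()
print(flat)
```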
as_index=False
df.groupby('dept', as_index=False)['sales'].sum() keeps the group keys as regular columns instead of moving them into the index, which makes the result easier to merge and plot.
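A side-by-side comparison may help. In this sketch the names lookup table and its dept_name column are hypothetical, added only to show a merge on the key column.

```python
import pandas as pd

df = pd.DataFrame({'dept': ['S', 'S', 'E'], 'sales': [100, 150, 200]})

# Default: 'dept' becomes the index of a Series.
indexed = df.groupby('dept')['sales'].sum()

# as_index=False: 'dept' stays a column of a DataFrame.
flat = df.groupby('dept', as_index=False)['sales'].sum()
print(list(flat.columns))

# Hypothetical lookup table, merged directly on the 'dept' column.
names = pd.DataFrame({'dept': ['S', 'E'],
                      'dept_name': ['Sales', 'Engineering']})
report = flat.merge(names, on='dept')
print(report)
```

With the default result you would first call reset_index(), or merge against the index, to get the same report.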
Important interview questions and answers
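- Q: What is a groupby object?
A: A lazy split: no aggregation runs until you call one. Inspect the grouping with the .groups dict, which maps each key to its row labels.
- Q: How do you apply multiple aggregations?
A: Use .agg(['sum', 'mean']), or pass a dict that names the functions for each column.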
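Both answers can be checked on the small frame from earlier. This is a minimal sketch; the output column names total and average in the named-aggregation call are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'dept': ['S', 'S', 'E'], 'sales': [100, 150, 200]})

# Creating the groupby object does not aggregate anything yet.
grouped = df.groupby('dept')

# .groups maps each key to the row labels that belong to it.
group_rows = {key: idx.tolist() for key, idx in grouped.groups.items()}
print(group_rows)   # {'E': [2], 'S': [0, 1]}

# A list of function names gives one output column per function.
stats = grouped['sales'].agg(['sum', 'mean'])
print(stats)

# Named aggregation: output_name=(column, function).
named = grouped.agg(total=('sales', 'sum'), average=('sales', 'mean'))
print(named)
```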
Self-check
- Group by department and sum sales.
- Why use as_index=False?
Tip: Use as_index=False so group keys stay columns for merges and plots.
Interview prep
- Split-apply-combine?
Split by key, apply aggregation, combine into result.
- as_index=False?
Keeps group keys as columns, which is easier for downstream merges and plots.