Go beyond a single sum or mean: use agg to apply multiple functions, custom lambdas, and column-specific aggregations in one call.
agg syntax
```python
import pandas as pd

df = pd.DataFrame({'g': ['A', 'A', 'B'], 'x': [1, 2, 3], 'y': [10, 20, 30]})
result = df.groupby('g').agg(
    x_sum=('x', 'sum'),
    y_mean=('y', 'mean'),
)
print(result)
```
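The intro mentions custom lambdas. Below is a minimal sketch of a lambda in named-aggregation syntax, reusing the example DataFrame. The output name x_range is a hypothetical label chosen for illustration. The lambda receives each group's 'x' values as a Series.

```python
import pandas as pd

df = pd.DataFrame({'g': ['A', 'A', 'B'], 'x': [1, 2, 3], 'y': [10, 20, 30]})

result = df.groupby('g').agg(
    x_sum=('x', 'sum'),
    # Any callable can replace a string name; x_range is a hypothetical label.
    x_range=('x', lambda s: s.max() - s.min()),
)
print(result)
```

Group A has x values 1 and 2, so its range is 1. Group B has a single row, so its range is 0.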
Built-in aggregations
sum, mean, median, std, count, min, max, first, last, nunique, size (size counts every row in the group, including rows where the value is NaN)
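Below is a short sketch that passes several of these built-ins by string name. It assumes a small hypothetical DataFrame in which group A has one missing value, so that count and size give different results. The output column names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group A has one NaN in 'x'.
df = pd.DataFrame({'g': ['A', 'A', 'A', 'B'], 'x': [1.0, np.nan, 4.0, 3.0]})

result = df.groupby('g').agg(
    x_count=('x', 'count'),      # non-NaN values per group
    x_size=('x', 'size'),        # every row per group, NaN included
    x_median=('x', 'median'),    # NaN is skipped
    x_nunique=('x', 'nunique'),  # distinct non-NaN values
    x_last=('x', 'last'),        # last non-NaN value
)
print(result)
```

For group A, count is 2 and size is 3. The median of the non-missing values 1.0 and 4.0 is 2.5.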
Named aggregation (pandas ≥ 0.25)
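Keyword arguments of the form new_name=('column', 'func'), or the equivalent pd.NamedAgg(column=..., aggfunc=...), produce flat, readable column names. Prefer them in production over the list-of-functions style, which produces MultiIndex columns.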
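To illustrate the difference, this sketch aggregates the example DataFrame both ways. The list-of-functions call returns two-level column labels. The named version returns one flat label per keyword, and it also shows the explicit pd.NamedAgg spelling. The names x_sum and x_mean are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'g': ['A', 'A', 'B'], 'x': [1, 2, 3], 'y': [10, 20, 30]})

# List-of-functions style: columns become a two-level MultiIndex.
multi = df.groupby('g').agg(['sum', 'mean'])
print(multi.columns.tolist())  # [('x', 'sum'), ('x', 'mean'), ('y', 'sum'), ('y', 'mean')]

# Named aggregation: one flat, self-describing column per keyword.
flat = df.groupby('g').agg(
    x_sum=('x', 'sum'),
    x_mean=pd.NamedAgg(column='x', aggfunc='mean'),  # explicit NamedAgg form
)
print(flat.columns.tolist())  # ['x_sum', 'x_mean']
```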
Important interview questions and answers
- Q: count vs size?
A: count excludes NaN per column; size counts all rows in the group, including rows with NaN.
- Q: How do you apply different functions to different columns?
A: Pass a dict: {'col1': 'sum', 'col2': ['min', 'max']}.
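Below is a minimal sketch of the dict form, using hypothetical columns col1 and col2. When any value in the dict is a list, every output column gets a two-level label, such as ('col1', 'sum'). The last line shows one common way to flatten these labels into strings. It is one option among several.

```python
import pandas as pd

# Hypothetical data with the column names used in the answer above.
df = pd.DataFrame({'g': ['A', 'A', 'B'], 'col1': [1, 2, 3], 'col2': [5, 7, 6]})

# Each key is a column; each value is one function name or a list of names.
result = df.groupby('g').agg({'col1': 'sum', 'col2': ['min', 'max']})

# Flatten MultiIndex labels like ('col2', 'min') into 'col2_min'.
result.columns = ['_'.join(c) for c in result.columns]
print(result)
```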
Self-check
- Compute sum and mean of one column by group.
- Use named aggregation syntax.
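One possible answer to both self-check items is sketched below on a hypothetical one-column DataFrame. The first call uses the DataFrame form, new_name=(column, func). On a single selected column, named aggregation takes keywords that map directly to function names, with no column in a tuple. Both calls return the same two columns. The output names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'g': ['A', 'A', 'B'], 'x': [1, 2, 3]})

# DataFrame form: new_name=(column, func)
by_frame = df.groupby('g').agg(x_sum=('x', 'sum'), x_mean=('x', 'mean'))

# Series form: the column is already selected, so keywords map to funcs.
by_series = df.groupby('g')['x'].agg(x_sum='sum', x_mean='mean')

print(by_series)
```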
Tip: Named aggregation such as col_sum=('col', 'sum') produces readable output column names.
Interview prep
- Named agg?
col_sum=('col', 'sum') produces readable column names.
- count vs size?
count excludes NaN per column; size counts all rows in the group, including NaN.